Darling, I shrunk the workbook

I was digging through older Excel projects recently, and realized that while all my recent projects haven’t become smaller in terms of data (if anything, they have become more data-intensive), they all had very small Excel files. Which got me wondering – did my workbooks really shrink, and why? One of the characteristics of my recent projects has been that I have progressively removed most of the calculations from the spreadsheets. Excel is used to store inputs and to display outputs, and computations happen outside, using either VSTO add-ins, or Interop (I realize it is a somewhat devious use of Excel, but I have my reasons for doing this). As a result, the Excel files contain raw data (input data, or output results), some formatting, and no formulas. This made me curious: does using formulas affect the file size significantly? Here are the 2 key conclusions from my quick & dirty experiment:

Cells used and file size

Before looking at the impact of formulas, I wanted to see how file size reacted to “non-formula” cells. For that purpose, I created multiple Excel 2003 workbooks:

CellsVsFileSize

Formulas

Next I started playing with formulas; rather than having 100 x 100 cells filled with the letter “A”, I filled A1 with A, and had all other cells equal the cell on its left.
SimpleFormula

This one surprised me: the file size nearly tripled.

Just for fun, I tried a few variations:

Conclusion

I admit it, I was surprised to see that entering a formula in a cell did matter, while entering more data in a cell doesn’t. The reason I was surprised is that intuitively, I expected workbooks with formulas to take more space in memory (because they have to load both the formula and its value), but not on disk, because I though the worksheets would store and save only the formula (which shouldn’t take more space than a straight string content), and would dynamically refresh the value of the function after the file has been opened. That doesn’t seem to be the case.

Excel Cells have 2 different properties, Cell.Text and Cell.Value2, which return respectively the text contained in the cell, and what that text evaluates too. One possible interpretation of what is going on is that for static cells (no formula), by definition the value equals the text, whereas for formula cells, the 2 differ; Excel probably stores each differently, saving the function AND its value when saving a formula cell into a file.

In any case, this is of very limited practical use. I have seen my share of humongous workbooks, but the file size was never the issue – maintainability and speed of computation were. This and testability are the key reasons why I tend to extract the calculations from the workbook – and I take it as a pleasant side-effect that, as a cherry on the cake, file size will typically be smaller, too!

Comments

Have a comment or a question? Ping me on Twitter, or use the comments section!