Monday, January 18, 2010

Working with Massive Files

I see this quite often: A very large worksheet laid out in a massive table, containing thousands of records and multiple, complex formulas.  The miracle of Excel makes this sort of number-crunching possible, but when going large, things can get sketchy.

A recent finance project I worked on tracked 2,500 corporations and calculated a number of statistics about them using several different estimation models in concert. There were 140 data columns, of which about 100 were calculated using lengthy, nested formulas.

As I constructed the solution, I noticed my file size growing rapidly.  Soon the file was up to 24MB. This is much too large to send by email, which complicates my delivery and backup schedules slightly. But more importantly, it makes Excel slow and "sandy", which is what I call it when the hourglass starts showing up during routine operations.

On my model I set the column width for a range of cells, and thought I had missed the button because for several seconds nothing happened. As I added to the model, edits to formulas started taking more and more time to recalculate, and save time for the workbook became a factor.  A few seconds here and there may not seem important, but the impact on your daily productivity can be serious. Its like trying to drive a truck that has a loose steering wheel.

The following graph shows how productivity can start to fall off steeply as soon as recalculation time starts to exceed 1 second, (based on an average of 4 seconds between each user-performed operation.)



You can throw hardware at the problem (ie: get a faster computer), but there are cheaper and more immediate options available to keep your workbooks snappy and your deliveries on schedule.

1. Turn off Automatic Calculation.
Sort of obvious - this can make a huge difference, and is achieved via the Tools / Options dialog -> Calculation tab.  You will need to hit F9 frequently to recalculate, but at least you get to choose when it happens.  This might not help if your bottleneck is caused by volume rather than formula complexity.

2. Commit Formulas
In situations where the required "result" of a model involves a number of interim calculations, one generally uses calculated columns in the data to represent the various steps.  If your source data is not likely to change, you can convert formulas into actuals using copy / paste special -> values.  Only do this when you're satisfied that the formulas are correct and the inputs won't change.

If those formulas were being fed by a bunch of data columns, those too can be deleted, further lightening the file.  Hint: keep at least one row of data using the original formulas, so you can edit and re-copy them down the table if required.  If you delete data columns, be sure to keep back-up versions of your workbook.

3. Work in Abstract
If your data set consists of thousands of records, each fundamentally the same in structure, why not delete 99% of them and work on the sub-set?  Providing you design your formulas to be range-copyable, it shouldn't matter if your sheet has 20 rows or 20,000.  Keep it lean and snappy while you assemble the overall model and tweak the formatting.  When it's perfect, bring in the full data set, copy down your formulas, and save your final version.

4. Zip it up
When it comes time to send the final workbook to the end user, and the file is 2 MB or larger, the polite thing to do is compress the xls file and send it as a ZIP attachment.  In File Manager, right-click the workbook and choose Send To-> Compressed (zipped) Folder.  A 80% reduction in file size can usually be achieved, making it possible for the recipient to actually receive your work.  An added bonus for those who really care: ZIP files can be password-protected, which is more secure than the built-in workbook password scheme.

Simple tactics such as these will make you the star of the shop, and the one they call when those massive workbooks needs wrangling.  Less [hourglass] is more.

1 comment:

  1. Great suggestions for work in abstract and copy/paste special|values for all but one formula! I will keep this in mind for future use, thanks!

    Steve Wynne
    http://automatenow.blogspot.com/

    ReplyDelete