Wednesday, January 27, 2010

Setting Traps

I believe that Excel is the ultimate expression of software as canvas - infinitely configurable and adaptable to any problem.  When computers were first invented, those scientists could not have predicted how superbly we would come to manipulate the microchip with software such as Microsoft's Excel.  That said, it's all too easy to send the whole thing off the rails.  Your code must be able to handle the unexpected user interactions, data anomalies and logic bombs you allow in.  I'm talking about VBA run-time errors.

An error condition arises when the VBA run-time engine hits an impossible situation that stops execution of your code.  This often occurs when you attempt to put the wrong type of data in a variable, fail to declare a variable, or exceed a variable's inherent limits.  Trying to call a routine that doesn't exist, dividing by zero, and file system problems are also common causes of run-time errors.

Errors can also happen without VBA raising anything at all.  In other words, the code doesn't actually "choke" on anything.  It just runs and dutifully produces results that happen to be completely wrong.  These sorts of errors are much more difficult to detect, and are usually the result of programming errors, or bugs, in your code.  More on those in a future posting.  Today we'll stick to trappable errors.

For example, say you have a routine that reads through a table of numbers and adds each one to a running total.  If you are only expecting numbers, what happens when the next cell contains a word?  Or an Excel error value?  This may have worked fine when you wrote it, but as the users modified the underlying spreadsheets, things changed.  Now the run-time engine hits a snag and triggers an error condition.
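
Here's a minimal sketch of such a routine (the column and row count are just examples).  It hums along on clean numbers, but the moment a cell contains a word or an error value, VBA throws run-time error 13 - Type Mismatch - and execution halts:

Sub SumColumn()
  Dim r As Long
  Dim total As Double
  For r = 1 To 1000
     'raises error 13 (Type Mismatch) if the cell holds text or #N/A
     total = total + ActiveSheet.Cells(r, 1).Value
  Next
  Debug.Print "Total: " & total
End Sub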

The worst thing to do is have your code fail silently, with no hint that anything went wrong.  If it's going to crash, at least let us know that much.  A clue about which function took the hit, and how, goes a long way towards diagnosing the cause.  This information can be sent to the debug window, or shown in an alert box that your spreadsheet's operator will see.

Depending on who's running the macro, you might not want to spew out a bunch of technical detail in a dialog - that can be off-putting.  Just say "Sorry... there's a problem loading the file. Please try again or contact support." or something friendly like that.

To handle errors superbly, you must trap them using a combination of the On Error statement and an error-handling section in your code. The Err object is also helpful because it "knows" what happened.

Every complex piece of code I write has, more or less, the following structure - my template for a function or sub-routine:

Sub MyRoutine()
  On Error GoTo Fail ' the trap!
  ' - code -
  Exit Sub
Fail:
  ' - error handler -
End Sub

Here, the word Fail refers to a code label, which can be just about anything you like.  The On Error GoTo Fail statement tells VBA to jump down to the error handler and run whatever code it finds there whenever an error condition arises.


The Err object, at this stage, will contain much information about the error - most importantly the error code and description.  If you're anticipating certain types of errors, you can check the error code and handle each one differently.  For example, if you are trying to read a file from disk, a specific error code might indicate that the disk is not available, the file is not found, or the path is not found.  Each code could be tested for, and custom error messages generated to help the user pinpoint exactly what went wrong and how to fix it.
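
Here's a sketch of that idea (the file path is made up, but error numbers 53, 76 and 71 really are VBA's codes for "File not found", "Path not found" and "Disk not ready"):

Sub LoadFile()
  Dim f As Integer
  On Error GoTo Fail
  f = FreeFile
  Open "C:\Data\input.txt" For Input As #f 'hypothetical path
  ' - read the file here -
  Close #f
  Exit Sub
Fail:
  Select Case Err.Number
     Case 53: MsgBox "Sorry... the file could not be found. Check the name and try again."
     Case 76: MsgBox "Sorry... the path could not be found. Has the folder moved?"
     Case 71: MsgBox "Sorry... the disk is not ready. Is it inserted properly?"
     Case Else: MsgBox "Sorry... there's a problem loading the file. " & Err.Description
  End Select
End Sub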


Here's a routine to spit out the full set of VBA error codes.  They go up to 65,535, so be prepared for a long wait if you run this!  It appears that most codes are either unused, or the dreaded and meaningless "Application-defined or object-defined error" type.

Sub ErrorCodeDump()
  Dim idx As Long

  On Error GoTo Fail
  For idx = 1 To 65535 'watch out!
     ActiveSheet.Cells(idx, 1) = idx
     Err.Raise idx 'deliberately trigger error number idx
  Next
  Exit Sub
Fail:
  ActiveSheet.Cells(idx, 2) = Err.Description
  Resume Next 'jump back into the loop
End Sub

For lower-power applications, I usually just print out the Err.Description, along with the name of the function or procedure, to the debug window.  Then I can run the code, observe the debug window, and see what happened.  It all depends on what you're deploying and how it will be supported.
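
In my template, that amounts to a one-line handler - something like this (the procedure name is just an example):

Fail:
  Debug.Print "ImportData failed: #" & Err.Number & " - " & Err.Description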

Note that Error Trapping is controlled at the Excel level.  Check the setting in the options dialog - from the VBA Editor, Tools -> Options, General tab: select Break on All Errors to have Excel "barf" every time an error condition is raised (useful during early development), or, for a more controlled user experience, choose Break on Unhandled Errors.

An application that handles errors elegantly and professionally will go a long way towards winning your customer's undying loyalty, admiration and respect.  Next time: more error-trapping techniques, such as raising your own custom errors, intentionally causing errors, and other erroneous logic.

Monday, January 18, 2010

Working with Massive Files

I see this quite often: A very large worksheet laid out in a massive table, containing thousands of records and multiple, complex formulas.  The miracle of Excel makes this sort of number-crunching possible, but when going large, things can get sketchy.

A recent finance project I worked on tracked 2,500 corporations and calculated a number of statistics about them using several different estimation models in concert. There were 140 data columns, of which about 100 were calculated using lengthy, nested formulas.

As I constructed the solution, I noticed my file size growing rapidly.  Soon the file was up to 24MB. This is much too large to send by email, which complicates my delivery and backup schedules slightly. But more importantly, it makes Excel slow and "sandy", which is what I call it when the hourglass starts showing up during routine operations.

On my model, setting the column width for a range of cells took several seconds - at first I thought I had missed the button, because nothing happened. As I added to the model, edits to formulas took longer and longer to recalculate, and the workbook's save time became a factor.  A few seconds here and there may not seem important, but the impact on your daily productivity can be serious. It's like trying to drive a truck that has a loose steering wheel.

The following graph shows how productivity starts to fall off steeply once recalculation time exceeds 1 second (based on an average of 4 seconds between each user-performed operation).



You can throw hardware at the problem (i.e. get a faster computer), but there are cheaper and more immediate options available to keep your workbooks snappy and your deliveries on schedule.

1. Turn off Automatic Calculation.
Sort of obvious - this can make a huge difference, and is achieved via the Tools -> Options dialog, Calculation tab.  You will need to hit F9 frequently to recalculate, but at least you get to choose when it happens.  This might not help if your bottleneck is caused by volume rather than formula complexity.

2. Commit Formulas
In situations where the required "result" of a model involves a number of interim calculations, one generally uses calculated columns in the data to represent the various steps.  If your source data is not likely to change, you can convert formulas into actuals using Copy, then Paste Special -> Values.  Only do this when you're satisfied that the formulas are correct and the inputs won't change.

If those formulas were being fed by a bunch of data columns, those too can be deleted, further lightening the file.  Hint: keep at least one row of data using the original formulas, so you can edit and re-copy them down the table if required.  If you delete data columns, be sure to keep back-up versions of your workbook.
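
If you'd rather automate the conversion than copy and paste by hand, a short macro does the trick.  A minimal sketch, assuming the calculated results live in a named range called CalcColumn (adjust to your own layout):

Sub CommitFormulas()
  'replace the formulas with their current results, in one shot
  With Range("CalcColumn")
     .Value = .Value
  End With
End Sub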

3. Work in Abstract
If your data set consists of thousands of records, each fundamentally the same in structure, why not delete 99% of them and work on the sub-set?  Provided you design your formulas to be range-copyable, it shouldn't matter whether your sheet has 20 rows or 20,000.  Keep it lean and snappy while you assemble the overall model and tweak the formatting.  When it's perfect, bring in the full data set, copy down your formulas, and save your final version.

4. Zip it up
When it comes time to send the final workbook to the end user, and the file is 2 MB or larger, the polite thing to do is compress the xls file and send it as a ZIP attachment.  In Windows Explorer, right-click the workbook and choose Send To -> Compressed (zipped) Folder.  An 80% reduction in file size can usually be achieved, making it possible for the recipient to actually receive your work.  An added bonus for those who really care: ZIP files can be password-protected, which is more secure than the built-in workbook password scheme.

Simple tactics such as these will make you the star of the shop, and the one they call when those massive workbooks need wrangling.  Less [hourglass] is more.

Monday, January 11, 2010

Speed Optimization Tips

If you are writing (not recording) macros that contain more than a few dozen lines of code and take more than a few seconds to execute, you can consider yourself a power macro programmer.   In this day of incredibly fast processors and massive amounts of RAM, hardware speed can usually overcome the sluggishness of inefficient or poorly written code.  But competent developers are never satisfied with this attitude, and will invest a little extra time making sure no processor cycles are wasted.

As your coding skills advance, and you take on bigger and bigger automation tasks in VBA, you will eventually get to the point where you need to optimize your code for speed.  This is especially true when your code will be used by other people.  If your users are forced to wait an unreasonable amount of time for a process to run, they may get annoyed, or start clicking around on Excel's menus or buttons to see what's going on.  This usually results in your macro running even slower, or worse: an Excel crash.

Presented here are a few strategies you can use to make sure your code runs quickly and your users are happy with the performance of your macros.

Tracking Execution Time
Say you have a complex Excel model with many specific methods (macros) that need to be run over the course of a user's interaction with the file.  If things appear to be running slowly, the first thing you need to do is figure out where the bottlenecks are.

A simple way to do this is to annotate your code with some debug feedback so you, as developer, can track the performance of the individual macros and functions.  This is done using the Debug.Print statement along with the Timer function, as in the following code example:

Sub MyMethod()
  Dim tmr As Single
  tmr = Timer 'seconds since midnight
  '... your code here ...
  Debug.Print "MyMethod took " & Timer - tmr & " seconds to run"
End Sub


If you put this code in each of your subroutines, you will quickly be able to see which ones took the longest, and you can focus your optimization efforts there.

Caching
John Carmack, master programmer and one of the founders of id Software (creators of the game Quake), was an early pioneer in the field of 3D rendering.  In the 90s when he was writing Quake, computers were running Pentium 1 processors and had 16 MB of RAM on average.  In order for his game to be playable, John had to squeeze every last bit of performance from his code, and his most important strategy for this was data caching.  Caching refers to storing the data you need close at hand, so you don't have to go looking for it each time you need it.

Think of a worker high on a construction scaffold.  He has his tools and supplies up there with him, so he doesn't have to climb down to the truck and get the next piece each time he does a bit of work.  Your Excel logic can be set up to act the same way - instead of searching for the piece of data you need on every iteration of a loop, find it and store it locally outside the looped code.

For example, this would be slow:


  For X = 1 To 30000
     Cells(X, 1) = Range("MyVariable").Value + X
  Next


But this would be much faster, because we don't need to look up a Range variable 30,000 times:


  MyVar = Range("MyVariable").Value
  For X = 1 To 30000
     Cells(X, 1) = MyVar + X
  Next


Keeping local references to cell values can be a big performance win in VBA, because there is a lot of overhead involved in going to the spreadsheet and extracting a value.  Any time you can remove this sort of lookup from inside a loop it's worth doing.
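
You can take the caching idea one step further by pulling an entire block of results into a Variant array, doing all the work in memory, and writing it back to the sheet in a single operation.  Here's a sketch of the same 30,000-row fill done that way:

  Dim vals() As Variant
  Dim MyVar As Double, X As Long
  MyVar = Range("MyVariable").Value
  ReDim vals(1 To 30000, 1 To 1) 'build the results in memory
  For X = 1 To 30000
     vals(X, 1) = MyVar + X
  Next
  Range("A1").Resize(30000, 1).Value = vals 'one write instead of 30,000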

Screen Refresh and Recalculation
Another extremely helpful way to speed up your code is to turn off screen refresh and, if possible, workbook calculation.  Depending on what your code is doing, you may enjoy as much as a 1,000-fold speed improvement by suppressing screen updates while your macro executes.  Just be sure to turn everything back on again after your macro is done.


Sub MyMethod()
  'turn off refresh and recalc
  Application.ScreenUpdating = False
  Application.Calculation = xlCalculationManual
  '... your code runs ...
  'turn it back on!
  Application.ScreenUpdating = True
  Application.Calculation = xlCalculationAutomatic
End Sub
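
One caveat: if your macro crashes partway through, those settings stay off and Excel appears frozen to the user.  Here's a sketch of the same method with an On Error trap added, so the settings are always restored no matter what happens:

Sub MyMethod()
  On Error GoTo Fail
  Application.ScreenUpdating = False
  Application.Calculation = xlCalculationManual
  '... your code runs ...
Fail:
  'execution falls through to here on success, and jumps here on error
  Application.ScreenUpdating = True
  Application.Calculation = xlCalculationAutomatic
  If Err.Number <> 0 Then Debug.Print "MyMethod failed: " & Err.Description
End Sub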


Perceived Speed - providing progress feedback
Perhaps you have done everything possible to make sure your code is as snappy and efficient as possible, but it still takes several seconds to execute because of the sheer amount of data being manipulated.  To keep your users from dozing off or getting worried, you should provide some sort of feedback.  An easy way to do this is through the Excel status bar, and this works especially well when giving feedback on the progress of a loop, as in the following example:

  Dim idx As Long, iTotal As Long
  iTotal = 1000
  For idx = 1 To iTotal
     '... something happens ...
     Application.StatusBar = "Running... " & Int((idx / iTotal) * 100) & "% Complete"
  Next
  Application.StatusBar = False 'give the status bar back to Excel


Even though the speed of your code won't change, the users will actually think it is running faster because they can see that it's running, and how long it's going to take. Without this, they may wonder if your code has bogged down, and click the Launch button repeatedly - not good.

Conclusion
By taking some time to consider the experience your users are having while running your code, and following some of these strategies, you will be able to produce efficient macros that enhance the productivity of those who rely on your work.

Monday, January 4, 2010

Living Mac, Working Windows

I spent most of my career working on Microsoft Windows-based PCs - programming superb Excel macros, but enduring the endless maintenance, system crashes and sketchy security that all PC owners must put up with.  I envied those with Macintosh laptops and their six-hour batteries, complete lack of viruses, and beautiful hardware, with just about all the productivity software you could ever want pre-installed!  Unfortunately, the business world of which I was a part uses Excel on Windows.  I could not leave Windows because of the need to maintain file compatibility with my client base.

When Apple announced the Intel-based Macs, and the ability to run Windows through virtualization, I made the jump with enthusiasm.  Suddenly, my personal computing experience transformed from constant discomfort to effortless joy - no exaggeration!  For a while I thought I'd even be able to do my work in Excel for Mac, as the 2004 version supported VBA, but it was not really 100% compatible, and VBA has been removed from the latest Mac Office, to the dismay of switchers everywhere!  (Microsoft claimed that the new Mac Office 2008 required such a complete re-write that it would be impossible to update the VBA engine... Of course, if they had more than three programmers on the job it might be feasible, but we all know how much Mr. Ballmer and Co. care about supporting the Apple platform.)

But using Parallels (or VMware Fusion, which I hear is also excellent), one can run Windows XP alongside the usual Macintosh applications.  Now I can develop Excel macros in the latest version of Excel for Windows, knowing that my code will run on my clients' computers, while still enjoying the superior personal computing experience that only Macs can give.  Switching between the two OSes is instantaneous - no reboot is required.  If you run dual monitors, you can even show them both side-by-side.

My computer is the tool of my trade - it must be enjoyable to use every day, or else the trade becomes a painful chore.  Thanks Parallels (and Apple, and MS Office 2003 for Windows) for allowing me to have the best of both worlds.  Instead of quarterly OS re-installs, shoddy shareware, and Outlook viruses, I have GarageBand, Bluetooth iPhone tethering, a MagSafe power cord, and a lighted keyboard!  If you have the option, I strongly recommend porting your life (if not your business software) to Mac.