Wednesday, February 17, 2010

IF only SumIF could do this...

On a recent project I had the challenge of generating a summary for a large, unsorted, filtered data set.  The model was initially created using the SumIF formula to grab subtotals for a range of cells, based on certain criteria.  Unfortunately this wasn't working, because the summary included rows of data that were not part of the filtered results - in other words: data from hidden rows needed to be ignored.

The Excel veterans out there probably already know about the SubTotal function, which contains a parameter that allows you to include or exclude hidden rows.  This works well as a simple SUM (or count, average, max, min, etc) when working on filtered sets, but does not allow you to include any additional conditions.

Another option would be to add control columns to the data set - something off to the side that evaluates the IF condition, and returns a 1 or 0.  This can then be combined with the actual data using a function called SumProduct.  SumProduct allows you to specify two ranges of numbers, and will sum the product of the items in the first range multiplied by the items in the second range.  Rows that failed the condition would multiply the data by zero, effectively removing them from the total.  This approach, though feasible, required adding a large number of new formulas to the data sheet, and was rejected because the data sheet is being populated regularly by copying and pasting from an external source.  We wanted to keep that sheet lean and simple.

In the end we opted for the power-user's approach: writing a custom function to provide the missing logic.  This is the function: SumIF_X


Function SumIF_X(rngCriteria As Range, _
rngCompare As Range, _
rngValues As Range) As Variant

'new version of the SumIf function
'that ignores hidden rows!

Dim intRow As Long
Dim rng As Range
Dim cTemp As Variant
Dim intTop As Long
Dim strTemp As String

On Error Resume Next

intTop = rngCriteria.Row
strTemp = rngCompare
For Each rng In rngCriteria
  If rng = strTemp Then
    If rng.RowHeight > 0 Then
      cTemp = cTemp + rngValues.Cells(rng.Row - intTop + 1)
    End If
  End If
Next

Set rng = Nothing
SumIF_X = cTemp

End Function


It is designed to mimic the SumIF function, so the parameters are the same: A criteria range, a compare value and a data range.  Using the For Each construct, we cycle through all the cells in the criteria range, and for any that match the compare value (and who's rowheight is not zero,) we add the associated data element to our running total.

It works fine, but there's a potential danger in using this approach: if the data set is really large (thousands of rows), recalculation time for the worksheet will be impacted due to the looping nature of the code.  In our application it was not deemed a show-stopper, but for larger workbooks, or when the custom function is being used in thousands of cells, we would probably seek out a leaner approach (or bite the bullet and set up a pivot table!)

Tuesday, February 9, 2010

Wrangle Those Names

Excel names are a fine way to store and reference data in a workbook.  Naming a cell allows you to include a readable reference to that cell in your formulas and macros.   Just imagine opening a spreadsheet from two years ago and seeing the formula "=$AG$377*(1+Sheet3!$BB$2110)"... not too meaningful until you navigate to the cells in question and see what's in there.  If, instead, you saw "=Balance*(1+Inflation)", you would instantly know what the formula does.

Names can do much more than simply point to a cell.  You can have a name refer to a RANGE of cells, and then act upon each cell in the range, as in: "=SUM(March_Sales)."  Names can also be dynamically-defined - something I talked about in an earlier post.  Dynamic names aren't simply linked to a particular cell, but are evaluated when the workbook is calculated.  For example, a dynamic name might refer to a member of a list, who's position is determined by some other cell's value.

Names can (and sometimes do) get ugly - especially when they refer to cells that have been deleted.  In this case, you end up with a name that evaluates to an Error reference instead of a value.  Tracking down these errors can be very difficult, because Excel's name management controls are, unfortunately, a bit rudimentary.  You need to go to the Insert menu, choose Name, and then Define.  Then click on each name in the list to verify what it's reference is.  On massive models that have evolved over time, with thousand of defined names, it can be a multi-day task to clean these up.

Another major maintenance headache is tracking down cell formulas that refer to cells in other workbooks.  Sometimes, when you copy from one workbook to another, you don't get the data you want but rather some indirect reference to it's original source.  This is occasionally desirable, but in my experience it is usually a hassle that confounds users, causes unwanted popup dialogs, and burns programming time.

To better deal with these issues, I created the Name Wrangler.

Name wrangler is a simple macro that lists and cleans up names in my workbooks, and identifies all external references in formulas.  The code resides in my Personal macro workbook, and I can invoke it on the active workbook when it's time to clean it up.

The Name Wrangler inserts a blank worksheet in your workbook and generates two tables: all the defined names found, and all formulas that refer to external files.  Names that evaluate to errors are then flagged and deleted.  I use this all the time, and it has saved me a lot of wasted hours scanning through workbooks, tracking down annoying errors and references I don't want.  The first thing I do when a client sends me a large model is run the Name Wrangler to get a handle on what I'm dealing with.

I have packaged the Name Wrangler in a workbook, which you may download here.  Feel free to use this code any way you like.  I welcome your suggestions and comments.