Saturday, December 26, 2009

Stories from the Jungle

Some years ago I worked on a project where a huge corporation (my employer) hired another huge corporation to develop a custom desktop application.  There would be about 5,000 users, and the app was built to handle contact, workflow and scheduling information in a client/server environment - typical early '90s stuff.  I was the junior guy, prepping data and helping my boss understand the new "PC" lingo. A couple of hot-shot programmers sat in a room for a month and wrote the application - in Visual Basic 2.0!  I suppose there were a few months of specification and GUI design work ahead of this, but I wasn't involved in that work, having recently joined the company.   Anyway, the application was a failure.  We spent several million dollars, but the user community didn't like the app and refused to adopt it.

I believe the reason is that the users would have had to change their business processes to make the software useful to them - they had to adapt to the app.  Since these were senior people - high achievers and highly trained - adapting to anything just wasn't going to happen.  Did the software fail to meet their needs?  Yes, it did.  But the designers failed to consider that it was doomed from the start.  Every user, having worked their way up in the business, had a distinct working style.  The software, in trying to satisfy everyone, satisfied no one.   The huge corporation went on to attempt this project at least two more times that I know of, but the adoption rate never broke 40%.

One funny thing that happened involved a routine that loaded names and addresses from a mainframe data feed.  The names and addresses were all in capital letters, which looked like crap on the application's output.  My boss was asking the consultants to quote the work of modifying the application to convert the ALL CAPS into Proper Case.  The programmers hummed and hawed a bit and then said this particular item would add 3 days to the bill.  These guys were costing $1,500 per day each, so that's nine grand!  My boss leaned out his office door and asked me if I thought this was a reasonable estimate.  In the 30 seconds it took him to describe the problem to me, I had fired up Excel and written a macro, using Excel's Proper function, to do the job.  "3 days?  Well, actually... just a second... it's done - I just did it with an Excel macro!"  

The consultants gave me a stinky look, but my boss smiled, and told them to skip that item.  I think I got a decent bonus that year.
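
For the record, the whole trick amounts to something along these lines - a rough sketch, not the exact macro I banged out that day:

Sub ProperCaseSelection()
'Convert ALL CAPS text in the selected cells to Proper Case
    Dim rngCell As Range
    For Each rngCell In Selection
        If Not rngCell.HasFormula And VarType(rngCell.Value) = vbString Then
            rngCell.Value = Application.WorksheetFunction.Proper(rngCell.Value)
        End If
    Next rngCell
End Sub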

Wednesday, December 16, 2009

Office 07 - Upgrade or Not?

Many Excel developers assert that Office 2003 (SP 3) is the high-water mark of the Microsoft Office product, mainly due to resistance against the radical interface changes that arrived with Office 2007.  The infamous Ribbon, which replaces a host of "standard" toolbars and menus which users have been interacting with for decades, has been criticized for being a giant leap backward in usability, among other things.  But more than the interface has changed with Excel '07.  The question is: are the changes worth upgrading for?

Besides the user interface, another major difference is the size of the worksheet - up to 1,048,576 rows and 16,384 columns may now be used, up from 65,536 rows and 256 columns in '03. This is a significant improvement if data storage is any measure.

Under the hood, not all that much has changed.  There are a few new worksheet functions which can simplify data summary calculations, and a few extensions to the Office object model, but for the majority of business applications, there is very little difference in functionality between Excel '03 and '07.

From a programmer's perspective, weighing the Pros and Cons leads me to the conclusion that upgrading to '07 is a BAD IDEA.  Here's why:

1. Unless you really need the space, insanely large worksheets are a huge liability.  Try copying a row or column in '03.  The same task in '07 might need to move anywhere from 16 to 64 times more cells - using that many times more system resources to do so.  Accidentally paste something into a whole column?  Go get a coffee while your screen repaints, it'll take a while before the hourglass goes away and you can click that Undo button.

2. Macro performance is way down.  One model I created in '03 takes 10.8 seconds to execute the main refresh process.  This same process, in the same file, on the same computer, on the same day, in Excel '07 takes over 23 seconds!  Weren't upgrades supposed to buy you speed improvements?

3. The new SUMIFS and COUNTIFS functions are nice, but the same results can be achieved with SUMPRODUCT or array formulas in '03, so even if you desperately need these functions, you don't actually need '07 (see the example after this list).

4. The new file formats used by '07 are not fully backwards compatible with '03, so if your customers or anyone you need to share files with hasn't upgraded, there will be blood.   If you haven't upgraded yet but they have, no problem: '07 reads and writes the old '03 formats, and '03 macros run just fine in '07.  Yes, I know that Open XML is a good idea, I'm just sayin...

5. Excel '07 takes more than twice as long to load as '03.  Use it every day and those seconds start adding up to some real lost productivity, or at best, mild aggravation.  The new version also uses more system resources.  My CPU cooling fan seems to rev at a higher (louder) speed whenever I have '07 loaded.

6. Office '07 costs money.  If the boss is paying, that's one thing, but for those of us who work independently, the prospect of spending several hundred dollars for no good reason is a significant downer.
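
To illustrate point 3, the '03-friendly equivalents look something like this (the ranges and criteria are made up for the example):

In '07:  =SUMIFS(C2:C100, A2:A100, "East", B2:B100, "Widget")
In '03:  =SUMPRODUCT((A2:A100="East")*(B2:B100="Widget")*C2:C100)

In '07:  =COUNTIFS(A2:A100, "East", B2:B100, "Widget")
In '03:  =SUMPRODUCT((A2:A100="East")*(B2:B100="Widget"))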

There are many other, minor reasons why I'm sticking with Excel '03 for development.  (I do run both versions on my system, which you can do without trouble in Windows XP.  I'm not sure about Vista, though.)  Microsoft seems to think that we should all just upgrade because it's there.  But if upgrading is a step backward, what's the sense of that?  I'll be sticking with Excel '03 until the reasons to upgrade are more compelling.

Sunday, December 13, 2009

Controlling Page Breaks Automagically

When it comes time to generate printed output from your Excel application, one sometimes finds that the standard toolset falls a bit short, and additional work must be done to get the pages looking just right. This is especially true in situations where the shape and size of the output is not predictable. For example: when the report is the output of a query or process, rather than a standard page template, it may not be possible to know how many rows, columns or pages may be involved.

Excel does give you access to many powerful printing control features, such as forcing the output to fit within a set number of printed pages (either width or height.) Most of these settings can be applied at design time, but one particularly tricky thing to manage is automatic page breaking.
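
Those settings can also be applied from code. A minimal sketch, assuming ws is the report worksheet:

With ws.PageSetup
    .Zoom = False            'Zoom must be off for the FitToPages settings to apply
    .FitToPagesWide = 1      'squeeze all columns onto one page width
    .FitToPagesTall = False  'let the length run to as many pages as needed
End With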

Consider an example: an Excel model containing a set of financials - an Income Statement, a Balance Sheet, and a Cashflow analysis. Each statement is set up for several hundred potential line items, many of which will be blank or zero, depending on the data being summarized. To avoid a huge, ugly report that wastes reams of paper and toner every time it is printed, logic was added to hide blank rows, resulting in a nice, compact set of reports. Unfortunately, there's no way to know ahead of time how many pages will be needed, and Excel will insert automatic page breaks with no regard for logical placement.

When you're working on a static worksheet, it's relatively simple to manually adjust these page breaks so they happen in logical places, but in an automated environment, this is not an option. So, the following routine was developed to ensure page breaks only fall where allowed.


Public Sub Paginate(ByRef ws As Worksheet)

 Dim intPages As Integer
 Dim blnBadBreak As Boolean
 Dim intCol As Integer, intRow As Long
 Dim idx As Integer

'must be in page break preview for all page
'breaks to be visible to the code
 ws.Activate
 ActiveWindow.View = xlPageBreakPreview

 intCol = 6 'control column

'clear all manual page breaks
 ws.ResetAllPageBreaks

'forced hard breaks (those with xx in control col)
 For intRow = 1 To ws.UsedRange.Rows.Count
   If ws.Cells(intRow, intCol) = "xx" Then
     ws.HPageBreaks.Add Before:=ws.Cells(intRow, 1)
   End If
 Next

'move arbitrary breaks (chosen by Excel) up to
'the next viable row

TopOfLoop:

 intPages = ws.HPageBreaks.Count
 If intPages = 0 Then GoTo Done 'no automatic breaks to adjust

 For idx = 1 To intPages
   intRow = ws.HPageBreaks(idx).Location.Row
   blnBadBreak = False
   'walk up until we hit a row marked as an allowable break
   While ws.Cells(intRow, intCol) = ""
     blnBadBreak = True
     intRow = intRow - 1
     If intRow = 1 Then GoTo Done
   Wend
   If blnBadBreak Then
     ws.HPageBreaks.Add Before:=ws.Cells(intRow, 1)
     'setting a manual break shifts the automatic breaks below it,
     'so start the scan over
     GoTo TopOfLoop
   End If
 Next

Done:
'restore normal view
 ActiveWindow.View = xlNormalView

End Sub


This code requires a control column on the worksheet - this can be a hidden column outside the print area.  Use this column to indicate where page breaks are acceptable (with a single "x" - between sub-sections or at spacing rows) or where they are absolutely required (with "xx" - the top of each report section, for example.)  When the routine runs, it will first force page breaks where the xx's have been placed.  Then it will scan the automatic page breaks that Excel has "suggested", and if they don't fall where allowed, the code will move them up to the next allowable position.  The main program loop must be iterated multiple times, because every time you set a manual page break, this will cause Excel to recalculate the positions of the automatic breaks below that.

Just call this routine before printing the sheet, and you will be sure that the page breaks won't split a graph in half or bisect a section that needs to be continuous.  Your reports will look superb every time, no matter what the data set is.
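
For example, a print routine might look like this (the sheet name is just for illustration):

Sub PrintFinancials()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Income Statement") 'whatever sheet you're printing
    Paginate ws                                          'fix up the page breaks first
    ws.PrintOut Preview:=True                            'then preview or print the report
End Sub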

Why Excel?

When asked why I develop in Excel, the answer usually contains one or more of the following points:

1. It's the most powerful platform available
The spreadsheet as a design paradigm is incredibly powerful, which is why Excel has become THE standard business application - used throughout the world for the past 18 years with no signs of ever fading away. The cells, rows and columns provide a great scaffold for your data, charts and text. The spreadsheet provides an elegant data entry, data storage, calculation and transformation infrastructure, and contains everything you need to support just about any complex business functionality imaginable.

2. It's high-level
One of Excel's greatest features as a development environment is all the built-in functionality you get. The Excel object model is comprehensive and well-documented. Integration with other Office applications is seamless (well, since 2000 it's been pretty stable.) Want to generate a report? Just lay out the worksheet the way you want up front, drop the data in, and you're done. 10 lines of code in VBA, using Excel's standard interface, can take the place of several hundred lines of standard Visual Basic code (or several thousand lines of C++.)

3. Everybody's using it
There aren't very many Windows PCs in the business community that don't have Office installed. Kids are learning to use Excel in schools, and most people use it in some aspect of their work, from data entry clerks to CFOs. The XLS file format is viewable on every operating system that supports e-mail attachments, including most cell phones. In short, Excel is, without a doubt, the most popular application in the business world. It was Lotus 1-2-3 on the IBM PC that enabled the business PC revolution, and Microsoft created Excel in Lotus' image.

4. It's a known entity
Along the lines of number 3 above, Excel is already on just about every business desktop, so you don't need to install anything on the corporate network when you deliver a solution to a company. It's easy to send an XLS file attachment to a customer by email, but asking them to download and install an executable file is another matter - it probably won't be allowed at all in a major corporate environment like a bank.

5. It's well supported in the community
As a developer, you can be reasonably sure someone else has already done what you're trying to do, and has worked out the tricky parts for you. Bugs in Excel are well known and workarounds abound. Forums, support groups and blogs such as this one provide an incredible wealth of knowledge for beginning and seasoned developers alike.

There are many other good reasons to develop in Excel - what are yours?

Thursday, December 3, 2009

Insulate for Winter

When your VBA method (macro) needs to get data from a section of the workbook, there are many ways to achieve this, and most of them will perform quite well. But what happens when somebody tweaks the spreadsheet? When a user adds sub-totals or a new column to the worksheet, will this require a macro fix?

Developers who would rather avoid this type of distraction (preferring instead to invent new and interesting applications) attempt to Insulate their Code from the Interface.

In the case of Excel applications, the interface refers to everything that Excel is: the columns and rows that users tend to move around and delete, and the Excel command set - filters, sorts, and so on.

One strategy is to declare a Range variable and assign it to a known position - the data table's first column heading is a good one. (Use a named range, which will follow the actual cell around.) Your table scan happens relative to this anchor point, made easy using the Offset method:



Dim rng As Range
Dim tbl As Range
Dim intRow As Integer
Dim AmountCol As Integer

Set rng = Range("data_corner")     'anchor: the table's first column heading
Set tbl = rng.CurrentRegion        'the whole table, wherever it lives
AmountCol = Range("AmountCol").Column - rng.Column  'offset of the Amount column from the anchor

For intRow = 1 To 20
   If rng.Offset(intRow, AmountCol).Value > 99.99 Then
      '... something happens!
   End If
Next intRow



To make this really bullet-proof, don't hard-code any assumptions about the table layout. Gold-level protection requires naming a Range on the worksheet for each column heading you need to work with (the AmountCol in the above example). Now the users can move the individual columns, insert new ones, or relocate the whole table, and your code continues to work.

Named ranges not feasible? Then you should have a routine to search for the actual column headings and cache those positions in variables before working on the table. This is reasonably reliable, until the headings get changed.
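
A sketch of what I mean - find a heading in the table's header row and remember its column (this assumes the headings live in row 1; adjust to suit your layout):

Function HeadingColumn(ws As Worksheet, sHeading As String) As Long
'Return the column number of a heading in row 1 of ws, or 0 if it isn't there
   Dim rngFound As Range
   Set rngFound = ws.Rows(1).Find(What:=sHeading, LookIn:=xlValues, _
       LookAt:=xlWhole, SearchOrder:=xlByColumns, SearchDirection:=xlNext, _
       MatchCase:=False, SearchFormat:=False)
   If rngFound Is Nothing Then
      HeadingColumn = 0
   Else
      HeadingColumn = rngFound.Column
   End If
End Function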

The quick and dirty method is to assume the positions of the columns will never change, set up constants for those so the code is at least readable, and hope for the best.

DNF to the fool who hard-codes an actual cell address in their VBA code!


Monday, November 30, 2009

Winning Strategies for Success

This post is written mostly for Excel developers who work as consultants in the Freelance realm. When you engage a new client, there are a few things you can do right to ensure success, which comes in the form of repeat business. Here are my recommendations:

1. Contain the Scope
The client has a specific business problem, and needs your help solving it. Obtain crystal clear agreement on what the problem is and what the solution must contain - in writing if possible. Don't throw on a bunch of extras because you think they are needed or interesting - that's another project.

Scope Creep occurs when the client (or you) ask for extras and you include them in a project that is already underway - suddenly the finish line you all thought was there has moved. Out the window goes your estimate and the client's acceptance of it. Save up any add-ons for a phase 2 deliverable.

2. Provide Good Value
The client is taking a considerable risk hiring you, both in money and time. Aim to break even on the first job if you think there's a chance of follow-on work. Come in too high and you either won't get the job, or if you do, the client won't feel like they got their money's worth. The client's capacity to pay is a factor here: Citigroup won't blink at that $100/hour rate, but Joe's Accounting probably can't swallow it. Once you have established your value and gained the client's respect and trust, you can safely pad your quote with a reasonable margin - we all have to make a living!

3. Support Your Work
Is the solution correct? On your machine, it is, but who knows what will happen on the client's network. Don't deliver your code to the client the day before you go on vacation - until they sign off, consider it untested. Answer questions and address issues the instant they are reported to you. No matter how busy you are, code that has shipped but not been paid for yet is your biggest priority. Give unquestioning support at this stage, and you cement yourself as a committed partner to the client's business.

Watch out! What the client calls a bug may be something they forgot to ask for up front... tread lightly as you educate them on what is "in scope" and what is a phase 2 change request.

4. Design for Scalability
Why write an import routine when, with a small amount of extra effort, you can write an import engine? The data requirements or inputs of any project will change over time. If you need to dig into your old code and update a bunch of cell references every time this happens, your solution becomes a liability to both you and the client.

All layout information used by your code should be read from a setup table. When the particulars change, you simply update the setup values and you're done - or get the client to do it themselves. You can focus on real work and the client doesn't get nickel-and-dimed on maintenance. Even better, the likelihood of your code being reusable goes way up, and reusing code is all margin, baby!
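
A sketch of the idea - the sheet and range names here are invented, but the point is that the code reads the layout instead of hard-coding it:

Dim lngKeyColumn As Long
Dim lngFirstDataRow As Long

'read the layout particulars from named cells on a (hidden) Setup sheet
With ThisWorkbook.Worksheets("Setup")
    lngKeyColumn = .Range("setup_key_column").Value
    lngFirstDataRow = .Range("setup_first_row").Value
End With

'...the import code then uses lngKeyColumn and lngFirstDataRow instead of magic numbers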

Friday, November 27, 2009

Finding Fault

Using the FIND method in your VBA macro is a good way to locate a specific piece of data in your workbook, and can be much more efficient than looping through an array of cells. For example, say you have a massive table of data, but each record is identified with a unique key of some kind. The fastest way to get a reference to a specific record is with FIND:

Set rngSearch = sht.Range("B1:B5000")
Set rngFind = Nothing
Set rngFind = rngSearch.Find(strKey)
If Not rngFind Is Nothing Then
   Result = rngFind.Row
End If

Unfortunately, this method can let you down if you're not careful, due to its association with Excel's Find dialog. The problem is that the function has a bunch of optional parameters whose default values are not predictable. As the VBA help file states:
The settings for LookIn, LookAt, SearchOrder, and MatchByte are saved each time you use this method. If you don't specify values for these arguments the next time you call the method, the saved values are used. Setting these arguments changes the settings in the Find dialog box, and changing the settings in the Find dialog box changes the saved values that are used if you omit the arguments. To avoid problems, set these arguments explicitly each time you use this method.
The above code might work perfectly for years, but then the user of the spreadsheet, on some totally unrelated project, uses the Find dialog to look for a number format or something. Now the Macro stops working - key values aren't being found. Depending on the data and what you're searching for, you might not even notice the problem, and simply think that the data isn't there... Maddening!

So, to avoid surprises, be sure to explicitly define ALL of the method's parameters EVERY TIME. The above code, rephrased, should look like this:

Set rngSearch = sht.Range("B1:B5000")
Set rngFind = Nothing
Set rngFind = rngSearch.Find(What:=strKey, _
                             LookIn:=xlFormulas, _
                             LookAt:=xlWhole, _
                             SearchOrder:=xlByRows, _
                             SearchDirection:=xlNext, _
                             MatchCase:=False, _
                             SearchFormat:=False)
If Not rngFind Is Nothing Then
  Result = rngFind.Row
End If

Monday, November 23, 2009

Keep 'em Separated

If you develop Excel macros that will be used by others, then you can be pretty sure those users will ask for fixes or additional features at some point in the future. If your users have done anything to modify the Excel file you give them, then you run the risk of clobbering their work when you send the next update.

Consider this example: A Purchase Order system developed for a retail store uses some sophisticated lookup formulas - pick Company A as the Supplier, and only products from Company A show up as choices. It's a great time-saver for the end users, and helps reduce data errors, but the formulas rely on lookup tables embedded in the workbook. These lookup tables need to be updated occasionally as suppliers change their lines.

If there's a bug fix or feature update, the programmer must either: A) get the latest copy of the user's workbook before making the change, during which time the user can't do any edits, or B) give the user the updated workbook and ask them to reproduce all their edits since the last update, or C) sync the data manually yourself (egad!). All three options are problematic. The accepted solution is to keep the data (anything the user controls) and the macro code (your stuff) completely separate.

There are many ways to achieve this: you could get the workbook to connect to a corporate SQL database, link the workbook to an external file, grab the data from a Web Service, or provide an import routine of some kind. There are pros and cons for each option, but in situations where an enterprise SQL Server is not available, an external data file is usually employed.

Excel provides a built-in mechanism to actively link workbooks together. This usually works... but to minimize complexity (a guiding principle) I try to avoid live file links - I've seen too many broken or corrupt links over the years, and nobody likes those popup messages.



I generally opt for passive linking. This is where a macro is used to import the external lookup data when the work process is initiated.  The needed data is copied to the local working file (your macro workbook.) All calculations and macros work on the local copy of the data, which avoids external formula references and file links, and keeps the application file portable.
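
A bare-bones version of such an import macro might look like this. The sheet names are placeholders, and Data_File is the cached path cell mentioned in the tip below:

Sub ImportLookupData()
    Dim wbData As Workbook
    Application.ScreenUpdating = False
    'open the external data file read-only, using the cached path
    Set wbData = Workbooks.Open(Filename:=Range("Data_File").Value, ReadOnly:=True)
    'copy the lookup table into the local working file, values only
    wbData.Worksheets("Suppliers").Range("A1").CurrentRegion.Copy
    ThisWorkbook.Worksheets("Lookups").Range("A1").PasteSpecial Paste:=xlPasteValues
    Application.CutCopyMode = False
    wbData.Close SaveChanges:=False
    Application.ScreenUpdating = True
End Sub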

Usability Tip! You should cache the path to the data, but give the users an easy way to find the file if it moves, using Excel's standard file dialog:

Dim sFile As Variant 'GetOpenFilename returns False (a Boolean) if the user cancels
...
sFile = Application.GetOpenFilename("Excel Files (*.xls),*.xls", _
        1, "Select the Data File", , False)
If sFile = False Then Exit Sub 'user clicked Cancel
Range("Data_File") = sFile     'cache the chosen path on the worksheet
...

Data, by its nature, changes all the time - it could be the output from some other system, or part of a growing transaction log. Wherever it resides, that's no place for your VBA code or complex, inter-locking formulas.

Passive linking will allow you, the programmer, to maintain ownership of the code while allowing the users to control the data. Improvements to the application are painless for both parties, allowing it to stay in use (and useful) for a long time.

Friday, November 20, 2009

Dynamic Names

One of Excel's most powerful features is the ability to define Names. A Name can refer to a single cell, a range, or a calculated value. You can even define a dynamic Name that refers to a changing range of cells.

You might need to refer to a list of items in your code, but you don't know how big the list will get or where someone might move it. To automatically sync the named range to the data, define a dynamic Name using the Insert -> Name -> Define dialog.

Instead of a plain cell reference, the Refers To: field contains a formula:
=OFFSET($B$2,1,0,COUNTA($B$2:$B$1000)-1,1)

The OFFSET function takes a corner cell, how many rows and columns to move away from it, and how many rows and columns to include.

For the corner cell use the table header, not the first data item, because somebody might delete that cell. The COUNTA function in the how-many-rows parameter counts non-blank cells in the column.  The heading cell is included in the count, so subtract 1 from its result to give the actual number of rows in the range.  This example sets an arbitrary limit of 998 items - your knowledge of the likely dataset and any use of the worksheet below the data table are factors in specifying this value.

Blank rows within the data would be a problem, so if that's a possibility, you can nest a few more functions to detect the range of cells you want... but it does get messy and there's no function help when editing names in the dialog, so be warned.

Anyway, now you can use that Named Range in code...

Dim rng As Range
For Each rng In Range("Customer_List")
    '... some code
Next
And since you're not putting any actual cell addresses in there, the code stays clean and safe.

An important factor when defining a Name this way is that your reference formula needs to be safeguarded against possible edits to the worksheet. If a user deletes a cell which is part of the Name's RefersTo property, that Name could break, resolving to a #REF! error forevermore. Good Excel programmers anticipate this possibility and set up Names (and formula references) for maximum durability, helping workbooks live longer.
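
One belt-and-suspenders option: have your startup code (re)define the dynamic Name itself, so a broken RefersTo gets repaired automatically. A sketch, using the example above ("Customers" is just an illustrative sheet name):

're-create the dynamic name each time the workbook opens
ThisWorkbook.Names.Add Name:="Customer_List", _
    RefersTo:="=OFFSET(Customers!$B$2,1,0,COUNTA(Customers!$B$2:$B$1000)-1,1)"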

Wednesday, November 18, 2009

Object Non-Existence Check - Trap

When you're writing code in VBA, you should stiffen up and prepare for trouble when checking for the NON-existence of an object. Why? Because this might actually trigger an error you weren't expecting or handling.


Consider this code which operates on Excel's menus:

On Error Resume Next
If Not MenuBars(xlWorksheet).Menus("My App") Is Nothing Then
   '...some code runs
End If


So what happens when this code executes and the menu we're looking for doesn't exist? Simply trying to reference a non-existent object triggers an error, and the code within the IF block will still run, believe it or not. The "On Error Resume Next" line makes that happen - any error moves execution to the next line, regardless of whether the IF condition was met. A proper error handler would avoid this problem.
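
The safer pattern is to confine the risky reference to a little helper function, let the error happen in there under controlled conditions, and hand back a clean True/False. A sketch, using worksheets rather than menus just to illustrate:

Function SheetExists(wb As Workbook, sName As String) As Boolean
    Dim ws As Worksheet
    On Error Resume Next
    Set ws = wb.Worksheets(sName) 'fails silently if the sheet isn't there
    On Error GoTo 0
    SheetExists = Not ws Is Nothing
End Function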


Sometimes we can use this situation to our advantage. Let's say we want a set of unique values from a list. I like to use VBA's Collection mechanism with its built-in unique-key constraint to make this easy. You'll need an object Class to act as the data container:

'Class Module called "ItemClass" with a single Property
Public Key As String


Then create a collection and try adding everything. Only the ones that are new will be allowed in.

Dim colSet As New Collection
Dim clsItem As ItemClass
Dim rngCell As Range

On Error Resume Next
For Each rngCell In Range("A1:A999")

   Set clsItem = New ItemClass
   clsItem.Key = rngCell.Text

   'this line will simply fail if the key is already in use
   colSet.Add clsItem, clsItem.Key
Next
On Error GoTo 0 'restore normal error handling once we're done



In this case we are deliberately ignoring the error that is triggered when we try to re-use an item's value as the collection key.

To Sum Up: watch out when your IF condition triggers an error, because all bets are off. You've got to know when to handle 'em, and know when to let 'em Resume Next.

Tuesday, November 10, 2009

Welcome

Welcome to the blog. This is the beginning. Here I will document some of the techniques I have discovered while programming advanced macros in Microsoft Excel. I hope to add at least one new posting each week. These aren't going to be tips or tricks; I'm talking about design strategies, data manipulation algorithms, perhaps hacks... okay, probably some tips, but low-level coding tips, not keyboard short-cuts, okay? And I'll even provide some example files so you can try this stuff without a whole lot of typing.