Saturday, December 26, 2009

Stories from the Jungle

Some years ago I worked on a project where a huge corporation (my employer) hired another huge corporation to develop a custom desktop application.  There would be about 5,000 users, and the app was built to handle contact, workflow and scheduling information in a client/server environment - typical early 90's stuff.  I was the junior guy, prepping data and helping my boss understand the new "PC" lingo. A couple of hot-shot programmers sat in a room for a month and wrote the application - in Visual Basic 2.0!  I suppose there were a few months of specification and gui design work ahead of this, but I wasn't involved in that work, having recently joined the company.   Anyway, the application was a failure.  We spent several million dollars, but the user community didn't like the app and refused to adopt it.

I believe the reason why is because the users would have had to change their business processes to make the software useful to them - they had to adapt to the app.  Since these were senior people - high achievers and highly trained - adapting to anything just wasn't going to happen.  Did the software fail to meet their needs?  Yes, it did.  But what the designers failed to consider was that it was doomed from the start.  Every user, having worked their way up in the business, had a distinct working style.  The software, in trying to satisfy everyone, satisfied no one.   The huge corporation went on to attempt this project at least two more times that I know of, but the adoption rate never broke 40%.

One funny thing that happened involved a routine that loaded names and addresses from a main-frame data feed.  The names and addresses were all in capital letters, which looked like crap on the application's output.  My boss was asking the consultants to quote the work of modifying the application to convert the ALL CAPS into Proper Case.  The programmers hummed and hawed a bit and then said this particular item would add 3 days to the bill.  These guys were costing $1,500 per day, so that's nine grand!  My boss leaned out his office door and asked me if I thought this was a reasonable estimate.  in the 30 seconds it took him to described the problem to me, I had fired up Excel and written a macro, using Excel's Proper function, to do the job.  "3 days?  Well, actually... just a second... it's done - I just did it with an Excel macro!"  

The consultants gave me a stinky look, but my boss smiled, and told them to skip that item.  I think I got a decent bonus that year.

Wednesday, December 16, 2009

Office 07 - Upgrade or Not?

Many Excel developers assert that Office 2003 (SP 3) is the high-water mark of the Microsoft Office product, mainly due to resistance against the radical interface changes that arrived with Office 2007.  The infamous Ribbon, which replaces a host of "standard" toolbars and menus which users have been interacting with for decades, has been criticized for being a giant leap backward in usability, among other things.  But more than the interface has changed with Excel '07.  The question is: are the changes worth upgrading for?

Besides the user interface, another major difference is the size of the worksheet - up to 1,000,000 rows and 16,300 columns may now be used, up from around 65,500 and 256 respectively in 03. This is a significant improvement if data storage is any measure.

Under the hood, not all that much has changed.  There are a few new worksheet functions which can simplify data summary calculations, and a few extensions to the Office object model, but for the majority of business applications, there is very little difference in functionality between Excel '03 and '07.

From a programmer's perspective, weighing the Pros and Cons leads me to the conclusion that upgrading to '07 is a BAD IDEA.  Here's why:

1. Unless you really need the space, insanely large worksheets are a huge liability.  Try copying a row or column in '03.  The same task in '07 might need to move up to 50 times more cells - using 50 times more system resources to do so.  Accidently paste something into a whole column?  Go get a coffee while your screen repaints, it'll take a while before the hourglass goes away and you can click that Undo button.

2. Macro performance is way down.  One model I created in '03 takes 10.8 seconds to execute the main refresh process.  This same process, in the same file, on the same computer, on the same day, in Excel '07 takes over 23 seconds!  Weren't upgrades supposed to buy you speed improvements?

3. The new SUMIFS and COUNTIFS functions are nice, but the same functionality can be achieved using array formulas in '03, so even if you desperately need these functions, you don't actually need '07.

4. The new file formats used by 07 are not fully backwards compatible with '03, so if your customers or anyone you need to share files with hasn't upgraded, there will be blood.   If you haven't upgraded yet but they have, no problem. '03 Macros run just fine in '07.  Yes, I know that Open XML is a good idea, I'm just sayin...

5. Excel '07 takes more than twice as long to load as '03.  Use it every day and those seconds start adding up to some real lost productivity, or at best, mild aggrivation.  The new version also uses more system resources.  My CPU cooling fan seems to rev at a higher (louder) speed whenever I have '07 loaded.

6. Office '07 costs money.  If the boss is paying, that's one thing, but for those of us who work independantly, the prospect of spending several hundred dollars for no good reason is a significant downer.

There are many other, minor reasons why I'm sticking with Excel '03 for development.  (I do run both on my system, which you can do without trouble in Windows XP.  I'm not sure about vista though.)  Microsoft seems to think that we should all just upgrade because it's there.  But if upgrading is a step backward, what's the sense of that?  I'll be sticking with Excel '03 until the reasons to upgrade are more compelling.

Sunday, December 13, 2009

Controlling Page Breaks Automagically

When it comes time to generate printed output from your Excel application, one sometimes finds that the standard toolset falls a bit short, and additional work must be done to get the pages looking just right. This is especially true in situations where the shape and size of the output is not predictable. For example: when the report is the output of a query or process, rather than a standard page template, it may not be possible to know how many rows, columns or pages may be involved.

Excel does give you access to many powerful printing control features, such as forcing the output to fit within a set number of printed pages (either width or height.) Most of these settings can be applied at design time, but one particularly tricky thing to manage is automatic page breaking.

An example Excel model contained a set of financials: an Income Statement, Balance Sheet, and Cashflow analysis. Each statement is set up for several hundred potential line items, many of which will be blank or zero, according to the data being summarized. To avoid a huge, ugly report that wastes reams of paper and toner every time it is printed, logic was added to hide blank rows, resulting in a nice, compact set of reports. Unfortunately, there's no way to know ahead of time how many pages will be needed, and Excel will insert automatic page breaks with no regard for logical placement.

When you're working on a static worksheet, it's relatively simple to manually adjust these page breaks so they happen in logical places, but in an automated environment, this is not an option. So, the following routine was developed to ensure page breaks only fall where allowed.


Public Sub Paginate(ByRef ws As Worksheet)

 Dim intPages As Integer
 Dim blnBadBreak As Boolean
 Dim intCol As Integer, intRow As Long

'must be in pagebreakpreview for all page
'breaks to be visible to the code
 ws.Activate
 ActiveWindow.View = xlPageBreakPreview

 intCol = 6 'control column

'clear all manual pagebreaks
 ws.ResetAllPageBreaks

'forced hard breaks (those with xx in control col)
 For intRow = 1 To ws.UsedRange.Rows.Count
   If ws.Cells(intRow, intCol) = "xx" Then
     ws.HPageBreaks.Add Before:=ws.Cells(intRow, 1)
   End If
 Next

'move arbritrary breaks (chosen by Excel) up to
'next viable row.

TopOfLoop:

 intPages = ws.HPageBreaks.Count
 If intPages <= 1 Then GoTo Done 

 For idx = 1 To intPages    
    intRow = ws.HPageBreaks(idx).Location.Row    
    blnBadBreak = False    
    While Cells(intRow, intCol) = ""      
       blnBadBreak = True      
       intRow = intRow - 1
       If intRow = 1 Then Goto Done    
    Wend    
    If blnBadBreak Then      
       ws.HPageBreaks.Add Before:=ws.Cells(intRow, 1)      
       GoTo TopOfLoop    
    End If 
 Next 


Done: 
'restore normal view 
 ActiveWindow.View = xlNormalView 


End Sub


This code requires a control column on the worksheet - this can be a hidden column outside the print area.  Use this column to indicate where page breaks are acceptable (with a single "x" - between sub-sections or at spacing rows) or where they are absolutely required (with "xx" - the top of each report section, for example.)  When the routine runs, it will first force page breaks where the xx's have been placed.  Then it will scan the automatic page breaks that Excel has "suggested", and if they don't fall where allowed, the code will move them up to the next allowable position.  The main program loop must be iterated multiple times, because every time you set a manual page break, this will cause Excel to recalculate the positions of the automatic breaks below that.

Just call this routine before printing the sheet, and you will be sure that the page breaks won't split a graph in half or bisect a section that needs to be continuous.  Your reports will look superb every time, no matter what the data set is.

Why Excel?

When asked why I develop in Excel, the answer usually contains one or more of the following points:

1. It's the most powerful platform available
The spreadsheet as a design paradigm is incredibly powerful, which is why Excel has become THE standard business application - used throughout the world for the past 18 years with no signs of ever fading away. The cells, rows and columns provide a great scaffold for your data, charts and text. The spreadsheet provides an elegant data entry, data storage, calculation and transformation infrastructure, and contains everything you need to support just about any complex business functionality imaginable.

2. It's high-level
One of Excel's greatest features as a development environment is all the built-in functionality you get. The Excel object model is comprehensive and well-documented. Integration with other Office applications is seamless (well, since 2000 it's been pretty stable.) Want to generate a report? Just lay out the worksheet the way you want up front, drop the data in, and you're done. 10 lines of code in VBA, using Excel's standard interface, can take the place of several hundred lines of standard Visual Basic code (or several thousand lines of C++.)

3. Everybody's using it
There aren't very many Windows PC's in the business community that don't have Office installed. Kids are learning to use Excel in schools, and most people use it in some aspect of their work, from data entry clerks to CFOs. The XLS file format is viewable on every operating system that supports e-mail attachments, including most cell phones. In short, Excel is, without a doubt, the most popular application in the business world. It was Lotus 1-2-3 on the IBM PC which enabled the Business PC revolution, and Microsoft created Excel in Lotus' image.

4. It's a known entity
Along the lines of number 3 above, Excel is already on just about every business desktop, so you don't need to install anything on the corporate network when you deliver a solution to a company. It's easy to send an XLS file attachment to a customer by email, but asking them to download and install an executible file is another matter - probably won't be allowed at all in a major corporate environment like a bank.

5. It's well supported in the community
For developers, you can be reasonably sure someone else has already done what you're trying to do, and has worked out the tricky parts for you. Bugs in Excel are well known and workarounds abound. Forums, support groups and blogs such as this one provide an incredible wealth of knowledge for beginning and seasoned developers alike.

There are many other good reasons to develop in Excel - what are yours?

Thursday, December 3, 2009

Insulate for Winter

When your VBA method (macro) needs to get data from a section of the workbook, there are many ways to achieve this, and most of them will perform quite well. But what happens when somebody tweaks the spreadsheet? When a user adds sub-totals or a new column to the worksheet, will this require a macro fix?

Developers who would rather avoid this type of distraction, (preferring instead to invent new and interesting applications,) attempt to Insulate their Code from the Interface.

In the case of Excel applications, the interface refers to everything that Excel is - the columns and rows which users tend to move around and delete, the Excel command set: filters, sorts, etc.

One strategy is to declare a Range variable and assign it to an known position - the data table's first column heading is a good one. (Use a named range, which will follow the actual cell around.) Your table scan happens relative to this anchor point; made easy using the Offset method:



dim rng as Range
dim tbl as Range
dim intRow as Integer
dim AmtCol as Integer

Set rng = Range("data_corner")
Set tbl = rng.CurrentRegion
AmountCol = Range("AmountCol").Column - rng.Column

For intRow = 1 to 20
   If rng.Offset(intRow,AmountCol) > 99.99 Then
     ... something happens!



To make this really bullet-proof, don't hard-code any assumptions about the table layout. Gold-level protection requires naming a Range on the worksheet for each column heading you need to work with (the AmountCol in the above example). Now the users can move the individual columns, insert new ones, or relocate the whole table, and your code continues to work.

Named ranges not feasible? Then you should have a routine to search for the actual column headings and cache those positions in variables before working on the table. This is reasonably reliable, until the headings get changed.

The quick and dirty method is to assume the positions of the columns will never change, set up constants for those so the code is at least readable, and hope for the best.

DNF to the fool who hard-codes an actual cell address in their VBA code!