Tuesday, November 8, 2011

Maintenance-Free Solutions

As a software creator, I like to spend as much of my working time as possible inventing cool stuff and solving tricky problems using my creativity and skill.  But in order to get paid to do this as an independent consultant, one must also handle such tasks as Sales and Marketing (doing that now), Bookkeeping, Documentation, and the dreaded System Maintenance.

By maintenance, I'm not really talking about bug fixes.  Most of the time it's about changing business logic, connecting to other systems, and accommodating new user requirements.

Some developers, and in my experience, most consulting "Firms", put a high value on the downstream Maintenance hours that come with any major software rollout.  To me, this Vendor-centric attitude sounds a bit like planned obsolescence, or selling cheap printers so you can lock 'em into a lifetime of overpriced ink.  I believe in craftsmanship, which means taking a more client-centric approach.  And like I said, I'd rather be inventing new stuff than fixing old stuff.

I learned long ago that a bit of planning up front can minimize or even eliminate the maintenance burden.  Now, whenever I design a system for a client, I make sure to include some Maintenance-mitigation strategies in my design.

1. Keep it simple - this is obvious, but basically, the less code you have, the easier it will be to fix, change, extend, etc.  Do you really need a set of routines to populate, show and validate a dialog box, or could the user update the spreadsheet directly?  Bells and whistles are fun and exciting to build, but aren't always necessary.  A terse design will go farther between oil changes.

2. Build an engine - Reports, File Imports, Calculations... if you can build a data-driven structure to handle each of these, it becomes much easier to modify and extend your application.  Here's an example: my app needs to import data from a 3rd party file.  What if the data provider changes the file layout without warning?  If I hard-coded the import logic, it breaks.  But if I build an import engine that tracks the position of each import column, along with a simple screen where this configuration info can be updated, then the user (or I) can just update the config and re-run - job done.
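
To make that concrete, here's a bare-bones sketch of what such an engine can look like.  (The "ImportConfig" and "ImportedFile" sheet names, and the field names, are made up for illustration.)

Sub ImportFromConfiguredLayout()

    'Config-driven import: the ImportConfig sheet maps each field name
    'to its column position in the 3rd party file, so a layout change
    'means editing the config sheet, not the code.
    Dim wsConfig As Worksheet, wsData As Worksheet
    Dim fieldCol As Object      'field name -> column number
    Dim r As Long, lastRow As Long

    Set wsConfig = ThisWorkbook.Worksheets("ImportConfig")
    Set wsData = ThisWorkbook.Worksheets("ImportedFile")    'raw 3rd party data lands here
    Set fieldCol = CreateObject("Scripting.Dictionary")

    'Load the field-to-column map from the config sheet
    r = 2
    Do While wsConfig.Cells(r, 1).Value <> ""
        fieldCol(wsConfig.Cells(r, 1).Value) = wsConfig.Cells(r, 2).Value
        r = r + 1
    Loop

    'Read the data using the map instead of hard-coded column numbers
    lastRow = wsData.Cells(wsData.Rows.Count, 1).End(xlUp).Row
    For r = 2 To lastRow
        Debug.Print wsData.Cells(r, fieldCol("Ticker")).Value, _
                    wsData.Cells(r, fieldCol("ClosingPrice")).Value
    Next r

End Sub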

3. Let the User Design it - This is the cornerstone of my software design strategy: I never want to hard-code the look, feel, or even workflow of a program.  If the app warrants it, I set up all the views as blank canvases which the users can modify to suit their needs.  This requires some planning:

  • You need to model all of your data entities as object classes
  • All lists and reports must use a data-driven layout - nothing hard-coded on worksheets
  • In a multi-user environment, Login accounts let individual users configure the app to suit their personal needs

This is a lot more work up front, but the goal is to avoid those calls asking you to move a field or rename a heading - now the users can do that themselves.  As you start to build a few solutions this way, you amass a library of ideas and code that you can re-use and improve from project to project.
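
To give a flavour of it, here's a stripped-down sketch of a routine that draws report headings from a layout table the users maintain themselves.  (The "ReportLayout" and "Report" sheet names are made up for illustration.)

Sub DrawConfiguredHeadings()

    'The report's headings and column widths come from a layout table
    'the users maintain, so renaming or re-ordering a column is a data
    'change, not a code change.
    Dim wsLayout As Worksheet, wsReport As Worksheet
    Dim r As Long, c As Long

    Set wsLayout = ThisWorkbook.Worksheets("ReportLayout")
    Set wsReport = ThisWorkbook.Worksheets("Report")

    r = 2       'layout rows start below the header
    c = 1
    Do While wsLayout.Cells(r, 1).Value <> ""
        wsReport.Cells(1, c).Value = wsLayout.Cells(r, 1).Value        'heading text
        wsReport.Columns(c).ColumnWidth = wsLayout.Cells(r, 2).Value   'column width
        c = c + 1
        r = r + 1
    Loop

End Sub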

The biggest challenge with this approach can be selling it.  Small-budget projects might not warrant this level of future-proofing, and it will take time to educate the customer/boss on the potential benefits. You don't need to mention that the biggest benefit is not being interrupted by minor change requests when you're deep in the creative zone on the next cool project.

Don't give them a fish, make them anglers.

Thursday, September 8, 2011

Hard Lessons

In the last post I described the new hardware setup I recommended for a client, to improve a slow and memory-hungry Excel app.  The hardware is in place and the client is happy, but the transition didn't go smoothly.

I discovered that the hardware upgrade in itself was not enough to cure our ills.  After coming up against some disappointing results, despite the bump to Windows 7 and Excel 2010, I was obligated to re-write a big chunk of the data handling logic, something I was hoping to avoid.

My initial research led me to believe that 64 bit Excel has access to as much ram as you can slot into your machine, but this is not my experience.  The spreadsheet part of Excel might be bigger, faster, stronger and fresher smelling, but the VBA object space is not.

Upgrading my VBA code to work in Excel 2010 was actually a non-event.  I only found one line (related, ironically, to checking the amount of available ram) that choked.  Otherwise the macros and formulas all behaved exactly as designed.  Excel 2010 saves files in the Office 2007 file format, so that was my first clue that this isn't really that much of an Excel upgrade.

Then, after seeing Excel freeze, just like before, when my code tried to pre-load six million+ data points into the application's memory structures, I began to think the whole ordeal and expense may have been for nothing.  Further reading online turned up comments from a Microsoft engineer who wrote that they achieved significant performance gains on Excel 2010's built-in functions, but basically didn't do a thing to optimize or beef up the VBA compiler... that will be in the next release, supposedly.

Perhaps the recent push towards Cloud-based applications is drawing some of the programming talent away from the Excel group at Microsoft.  I'm a developer, so I can't really hold a grudge.  Excel is still a miracle of engineering in my opinion, and the best software ever to run on any computer anywhere. (Yes, I like it.)

Anyway, I had no choice but to re-work the data logic so the application fetches what it needs at report time, instead of pre-loading on start-up.  I made this change, and the app became usable again - even improved, from the client's perspective, because it "starts up a lot faster."  Reports now require additional processing, but the fast new chip and fast ram more than make up for it.  Also, thanks to the solid state hdd (and possibly Windows 7's superior design), the memory swapping issue has totally gone away.
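
For the curious, the reworked logic amounts to something like this - scope the query to the report at hand instead of loading everything up front.  (Table and field names are made up for illustration; the connection is a late-bound ADO object.)

Sub FetchPricesForReport(cn As Object, ByVal sectorName As String)

    'Fetch only the rows this report needs, at report time, instead of
    'pre-loading every price when the workbook opens.
    Dim rs As Object

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT c.Ticker, p.PriceDate, p.ClosePrice " & _
            "FROM tblPrices AS p INNER JOIN tblCompanies AS c " & _
            "ON p.CompanyID = c.CompanyID " & _
            "WHERE c.Sector = '" & Replace(sectorName, "'", "''") & "'", cn

    Do While Not rs.EOF
        'load the row into the report's in-memory structures here
        rs.MoveNext
    Loop

    rs.Close

End Sub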

We didn't get the 800% performance boost that Moore's Law predicted, but our app's data ceiling is now a clear blue sky, and one busy brokerage is crunching numbers with it on a daily basis.  The client is armed with an unbelievably powerful stock analysis tool, and now I know what to expect (and what expectations to instil) the next time a client wants to level up.

Tuesday, July 12, 2011

Throw More Hardware At It

A very large data set can be taxing on Excel.  On older systems, Excel is limited to 1 GB of ram.  Office 2007 enables up to 2GB.  But opening a workbook that is hundreds of megabytes in size can be an exercise in patience.


A recent project is tracking statistics for 5,000 companies going back for several years.  Some numbers, like Earnings and Dividends, are reported quarterly.  Stock Price, which ties into a host of downstream reporting and industry averaging logic, is tracked daily.


Rolling all these numbers together to figure out potential investment strategies requires juggling several million data values in memory.  The data is being housed in an Access database for now, with plans to bump up to SQL Server.  But the data interchange is not the bottleneck; it is Excel's ram limit.
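
Some rough arithmetic shows the scale: 5,000 companies x roughly 250 trading days per year x a few years of history already works out to several million daily price records, before the quarterly figures are even counted.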


My client is running Windows XP and Excel 2003.  I have optimized and re-factored the object model to squeeze every last drop of performance from the platform, but we have hit the proverbial wall.  


As more and more data was captured and the record count approached 4.8 million, some of the long reporting macros began to get bogged down by Windows memory swapping - this is when the hard drive is used as the spillover for physical memory, and it is 100 times slower than normal Ram operations.  To make the application usable, we needed to either reduce the data universe (and thus, the value of the analysis), or implement a hardware and software upgrade.  We're taking the upgrade path.


The new machine is a Dell Latitude E5520, billed as a portable workstation - pricier than a back-to-school netbook, but with the horsepower to churn through these advanced computations.  (This is, after all, the stuff computers were invented for.)  The laptop upgrade will run Windows 7 64 bit, and Office 2010 Pro, leaving nothing on the table.  Let's see what's in the package:
  • Intel Core i7-2620M, 2.70GHz - two hyper-threaded cores, so Windows sees 4 logical processors!
  • 8.0GB, DDR3-1333MHz SDRAM - that's fast memory, and a lot of it.
  • 128GB Dell Mobility Solid State Drive - THIS is the money.
On the old hardware and platform, one rather large report that computes historical moving averages across industry sectors can take up to 4 minutes to complete.  If Windows decides it's time to start swapping ram to the hard drive, macro time jumps to 15 minutes.  


With a solid state drive this should be far less of a concern: swapping to flash storage is dramatically faster than swapping to a spinning disk, though still not as fast as real ram.  (This kind of drive also lets the computer boot up in seconds, so I'm told.)  Exploiting the newest 64 bit Windows OS and Office suite will allow Excel to access up to 1TB (that's terabyte) of memory.  


The Intel Core i7 packs multiple 64 bit cores onto a single chip, giving this laptop raw horsepower in the same league as a 90's era supercomputer that cost millions of dollars. Let's see if the performance gains pan out...

The hardware is on order.  The client is waiting expectantly, as am I.  Next post, I will outline the steps (if any) needed to get the Excel 03 VBA code running in Excel 2010, and benchmark the new system.  This is a $3,000 blind faith endeavour.  But as the client says, if he makes just one good decision because of the software, it's all paid for! 

Wednesday, April 20, 2011

Database integration using object classes

This month I am working on a quantitative analysis tool for a stock broker.  He wants to track stock data over time and rank stocks in a variety of ways.  Rather than try to figure out all the different reports and charts we will need to build, we agreed that the tool needs to be completely open-ended.  This means the customer can configure every aspect of the application, including the data fields being captured, the format of the reports, and the formulas used for ranking and classifying the data.

Capturing daily numbers for 5000+ companies requires a robust data storage and query component, so an Access database has been created.  This decoupling of the data from the program logic is essential for an application of this scale.  Since an Excel project is never finished, the steady stream of updates and fixes can be deployed simply by emailing an updated Excel workbook.  The macro then connects to the Access file on the client's system, loads all the settings and information the client has been inputting and collecting, and configures itself.
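
The start-up handshake itself is nothing exotic.  Here's a stripped-down sketch (the database path and table name are placeholders, and late binding means no extra project references are required):

Sub LoadSettingsFromDatabase()

    'Connect to the Access file on the client's machine and read the
    'application settings the client has been building up.
    Dim cn As Object, rs As Object

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=C:\Data\StockAnalysis.accdb"   'use the older Jet provider for an .mdb file

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT SettingName, SettingValue FROM tblSettings", cn

    Do While Not rs.EOF
        Debug.Print rs.Fields("SettingName").Value, rs.Fields("SettingValue").Value
        rs.MoveNext
    Loop

    rs.Close
    cn.Close

End Sub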

Logic was created to handle custom field definitions and custom formulas.  A report designer was also built, allowing the operator to lay out new reports and control the grouping, sorting, sub-totalling and formatting of the data.  All this setup data is stored in the Access database.

When the user wants to view a report, the macro loads the setup details from the database, queries the company data tables for the required data, and draws up the report on a worksheet.  To handle all these inter-related data entities, we need to define a set of object classes in VBA.

I created classes for Field, Report, Company, Sector, Industry, Price, and all the other entities that our database is tracking.  At the project level I define Collections which are loaded with the individual instances of each class when the workbook is opened.  So when the user asks for a report, the code iterates through the Reports collection and builds the list of available reports to choose from.
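
Loading one of those collections is mechanical work.  Here's a sketch, assuming a clsReport class module with ReportID and ReportName properties and a project-level Reports collection (all of these names are made up for illustration):

Public Reports As Collection        'declared in a standard module

Sub LoadReports(cn As Object)

    'Populate the project-level Reports collection from the database
    'when the workbook opens.
    Dim rs As Object
    Dim rpt As clsReport

    Set Reports = New Collection
    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT ReportID, ReportName FROM tblReports", cn

    Do While Not rs.EOF
        Set rpt = New clsReport
        rpt.ReportID = rs.Fields("ReportID").Value
        rpt.ReportName = rs.Fields("ReportName").Value
        Reports.Add rpt, CStr(rpt.ReportID)     'keyed for quick lookup
        rs.MoveNext
    Loop

    rs.Close

End Sub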

The classes contain the logic they need to interact with the database and with each other.  For example, each class has Load, Insert, Update and Delete methods.  When the user defines a new custom Field, the code creates an instance of the Field object, updates its properties with the settings the user has input, and calls its Insert method to write the actual record to the database and insert itself into the Fields collection.
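
In class-module form that pattern only takes a few lines.  Here's a minimal sketch of such a Field class, assuming a project-level Fields collection and an open ADO connection (names are illustrative):

'Class module: clsField
Public FieldName As String
Public FieldType As String

Public Sub Insert(cn As Object)

    'Write the new field definition to the database...
    cn.Execute "INSERT INTO tblFields (FieldName, FieldType) VALUES ('" & _
               Replace(FieldName, "'", "''") & "', '" & FieldType & "')"

    '...then register this object in the project-level Fields collection
    Fields.Add Me, FieldName

End Sub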

When the reports get generated, the Report object "knows" how to load a sub-collection of the fields that are on the report.  Each Field knows how to fetch the appropriate piece of data and format it for presentation.  If a Field is calculated based on other pre-defined fields, the logic will parse this out and use new Field objects to fetch the individual elements that make up the calculation.

By carefully modelling your objects this way, your code becomes incredibly powerful, because the "work" is distributed across the object model.  Your core routines can be quite lean and efficient, and much easier to read and maintain.  Instead of 20 different Reporting macros, you have one generic macro that produces all possible reports.  And since you're manipulating massive data structures completely in RAM, your code can run very quickly compared to the basic approach of stacking tables of data on different spreadsheets and iterating through cells row by row.
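
That one generic macro can be surprisingly small.  Here's a sketch, assuming the Report object exposes its own Fields collection, a project-level Companies collection exists, and each Field has a ValueFor method that fetches and formats its own data (all illustrative names):

Sub DrawReport(rpt As clsReport, ws As Worksheet)

    'One routine produces every report: the layout comes entirely from
    'the Report object's own Fields collection.
    Dim fld As clsField, co As clsCompany
    Dim r As Long, c As Long

    'Headings
    c = 1
    For Each fld In rpt.Fields
        ws.Cells(1, c).Value = fld.FieldName
        c = c + 1
    Next fld

    'Data rows
    r = 2
    For Each co In Companies
        c = 1
        For Each fld In rpt.Fields
            ws.Cells(r, c).Value = fld.ValueFor(co)
            c = c + 1
        Next fld
        r = r + 1
    Next co

End Sub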

Learning how to model your data using object classes is not an easy skill to master.  I have many years of programming experience on a variety of platforms, including VB.Net, to call upon.  I have also made it a priority to master the design and programming of relational databases using SQL.

Yet this knowledge was gained by doing.  When you start experimenting with advanced techniques such as object classes and external databases, you build confidence, overcome challenges, and ultimately learn how to produce enterprise-level applications that are extensible, powerful, portable and maintainable.

For more information on these topics, please see some of my prior postings.

Wednesday, March 23, 2011

An Unlikely Speed Boost

I've been programming with Excel since version 1.0 and I've picked up a few tricks along the way.  Some are passed along from other power programmers, and some I just discover, like this one:

I have a reporting system that is generating a 5000 line report on a worksheet.  It runs in two modes.  The first mode fetches the data from an Access database into memory, and then loops through the memory collections to draw the output.  The second mode simply draws the output, assuming the data is already in memory.  Much to my surprise, mode 2 was running a lot slower than mode 1, which is unexpected because mode 1 is fetching and looping through a recordset, hitting twice as many lines of code as mode 2.

To figure out what was going on, I added a progress counter to both parts of the code.  This allows me to watch the macro's percentage of completion as it chugs along.  For no obvious reason, the macro running in mode 1 finished in about 6 seconds, whereas running it in mode 2 took a whopping 32 seconds.  No obvious bottleneck or data differences could be discovered.

I then had a look at how the macros were being called.  Mode 1 was being fired by a change event on a drop-down list control.  Mode 2 is called using a macro button.  I thought that might be the difference, so I tried calling both modes from a temporary macro, and suddenly I'm seeing slow performance in mode 1!  I tried a few variations, but found that it wasn't how the macro is being called, but a subtle difference in the code between the two routines.

My drop-down control's event code had the line Application.Cursor = xlWait at the top, followed by the call to the report macro, and finally Application.Cursor = xlDefault at the end.  I put this in originally because I thought the extra step of fetching data might cause a delay in the macro, and showing an hourglass reassures my users that something is happening.

When I added these lines to the button click event for mode 2 of the macro, I observed the dramatic speed improvement I was looking for.  I'm already suppressing calculation and screen updating, which makes a big difference on any code that draws to a spreadsheet, but I had no idea that changing the mouse cursor to an hourglass could give a further 400% speed boost!  I will now go back and use this technique on all the models I've created that have lengthy macro processes.
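
For anyone who wants to try it, here's roughly how I now wrap a long report run (BuildReport is a stand-in for your own lengthy macro):

Sub RunLongReport()

    'The usual speed settings, plus the hourglass cursor that produced
    'the surprise speed boost described above.
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Application.Cursor = xlWait

    On Error GoTo CleanUp
    Call BuildReport            'stand-in for the lengthy report macro

CleanUp:
    Application.Cursor = xlDefault
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True

End Sub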

Let me know if you have any other obscure tricks like this - I'm always on the lookout for more macro speed, and I know Excel has many more undiscovered secrets in store for us all.

Thursday, January 27, 2011

Check your References

When you are building advanced Excel applications, you occasionally need to call upon logic or features provided outside of Excel. This is done by adding a REFERENCE to an external application or library, by way of the VBA Editor's Tools menu.  Just about every application and code library registered on your computer shows up here, enabling your code to leverage highly specific functionality such as enhanced user interface elements, or internet data services.

In a previous post I explained how to get Outlook to send email from Excel. The code to do so requires a reference to the Microsoft Outlook 11.0 Object Library. A recent application I created does this, and also uses several other libraries, including an HTML browser control to preview PDF files, an Access database, and a set of custom UserForms - all requiring project references.  Each one of these represents an actual file - usually a DLL - installed and registered on the host system.


This makes Excel incredibly powerful.  But using this mechanism can complicate the deployment of your software, because you can't always control what is or isn't installed and registered on the end-user's computer. This can also be a problem if you are a stand-alone developer, but need to move your work between home and office machines.  If a referenced DLL or program is not available, you end up with a Broken Reference.

Another coder, Andrew Roberts, blogged about this very problem, and provided a technique to programmatically re-add broken references (for Outlook, in his example).  It's a good fix, but has limitations.  For one, it depends on a specific Excel security setting to work, which is probably not something you're going to want to ask 30 novice operators to change on 30 different computers.  And if your project code is password-protected, this approach may not work at all.


Protected Project and Protected References

In my project, I am using the Redemption library for Outlook.  This toolset allows VBA to manipulate Outlook objects quietly, in the background, instead of requiring the user to approve a series of popup warnings shown as part of Outlook's security protocol.

Not all users need this extra level of functionality, so I wrote some code to "ask for Redemption" - users who do not install this extra software should not be show-stopped by a broken reference.  The program should instead default to the standard Outlook experience.

I ran into a major snag: the VBA code that allows you to view and verify project references doesn't work when your VBA project is password-protected - unless the project is open in the VBA editor!

My code needs to be password-protected because of the nature of my programming work - I program for profit.  I don't want every employee at my client's location to freely view and copy the code I have written - this is my sweat and blood, and I try to maintain at least an illusion of copyright protection on it.  I also don't want operators to change or hack it.

Still, some code I share freely: here's a routine you can call from your workbook's open event to programmatically inspect the project reference collection.

Sub reftest()

    'which references are visible and/or broken?
    Dim chkRef As Variant

    Debug.Print vbCrLf & vbCrLf & " - Reference Check -"

    For Each chkRef In ThisWorkbook.VBProject.References
        Debug.Print chkRef.Description & _
            IIf(chkRef.IsBroken, "  - broken", "")
    Next chkRef

End Sub


In the VBA development environment, it all works wonderfully. All the references I added show up.  But if I close Excel, and then re-open the model (which calls this routine from its Workbook_Open event), the references I added to the project become invisible to the code.  All you see are the 4 standard references that all projects have:


VBA   Visual Basic For Applications
Excel   Microsoft Excel 11.0 Object Library
stdole  OLE Automation
Office  Microsoft Office 11.0 Object Library


My code can't tell if the Redemption reference is broken or not, and so can't handle the situation gracefully.

I believe Excel hides the extra References by design - these are part of your code's functionality and overall design strategy, which is the I.P. that you're trying to protect by putting a password on it.  However, since it's the code "asking", not a person via the IDE, I think it should be possible to enumerate the full collection of program references.  This would allow the kind of self-diagnostic logic I'm looking for.

I'm not sure what the work-around is for this at present - I'll probably end up using some sort of in-function error trap.  I'll report on my solution in a future post.  I'm open, as always, to suggestions via this blog's comment facility.
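
One candidate I'm weighing: skip the project reference entirely and create the Redemption objects through late binding, wrapped in an error trap.  Something along these lines (a sketch, not the final word):

Function RedemptionAvailable() As Boolean

    'Try to create a Redemption object via late binding.  If the library
    'isn't installed on this machine, the function returns False and the
    'program can fall back to plain Outlook automation.
    Dim obj As Object

    On Error Resume Next
    Set obj = CreateObject("Redemption.SafeMailItem")
    On Error GoTo 0

    RedemptionAvailable = Not (obj Is Nothing)
    Set obj = Nothing

End Function

Because a late-bound object never appears in the References list, this approach would also sidestep the hidden-reference problem altogether.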

One of the really good things about Microsoft Windows and the Office programs is the ability to interoperate and exchange data between applications.  If you want to ensure portability and longevity of your code, just don't get carried away adding 3rd party libraries.