Friday, November 26, 2010

SQuirreL away that data

The most useful language any programmer can learn is SQL - Structured Query Language.  When you have a solid understanding of relational database architecture and the tools to manipulate it, extracting handy information from your business data becomes almost effortless.

Relational DB systems abound: Access, SQL Server, Oracle, MySQL and PostgreSQL are a few examples.  You can even use SQL to intelligently compile data right out of your Excel worksheets.

As macro developers, we often rely on canned reports produced by corporate systems to feed us data.  This is usually in the form of .csv text files, web scrapings, or stacks of steaming, reeking workbooks that take hours to recalculate.  Sometimes, especially if data volumes are large, moving everything into Access and pulling it back into Excel using SQL Select statements can be the best way to carve out the specific information required.

In a recent project, a client was mining several years' worth of member profile data to compile statistics and spot geographic trends.  The prototype system, using a small subset of data and a few pivot tables, appeared to get the job done.  But the workbook was already topping 20MB in size - a real pain to shuttle around.  Then the full data set grew past 65,536 records - more rows than an Excel 2003 (client's spec) worksheet can hold... time to step up to MDB.

I massaged the data in pieces in Excel, imported it into an Access database, and was stunned to find the .mdb file to be only 3.2 MB in size!  That's sweet compression.  Smaller than my prototype workbook was, zipped, but holding 20 times the data.

Next, do away with pivot tables.  Nothing personal - I actually think PTs are a pretty good idea, but they can be unwieldy, and many end users fear them.  Programming them takes you into the outer wastelands of VBA - sketchy territory for most of us (though examples are out there) - and getting exactly the result you need might not be possible.  That has been my experience with most of Excel's special power features.

I think SQL is considerably more powerful, and a lot less complicated.  With a simple piece of code I can build a mini pivot table that doesn't pivot, because it's already pivoted to what the user wants to see.  Here's an example:

SELECT TOP 10 ZipCode, AVG(Revenue) 
FROM MemberView GROUP BY ZipCode ORDER BY 2 DESC

This gives us a nice little Answer, suitable for publication in this year's Annual Report - a breakdown of the top 10 zip codes by average revenue.

The key to this is setting up a View, which is another more complex SELECT query that joins data from different tables.  This View can also pre-compute some answers, convert ugly binary 1's and 0's to pleasant things like "Yes" and "No", and limit the data set as needed.  It sits in the DB and looks just like another table.  This brings all the data we need to a single point of light.
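If you prefer to set the View up in code, DAO's CreateQueryDef will save a query (the .mdb equivalent of a view) straight from VBA.  Here's a rough sketch - the file path, table and field names are all hypothetical:

 'requires a reference to the Microsoft DAO 3.6 Object Library
  Dim db As DAO.Database

  Set db = DBEngine.OpenDatabase("C:\Data\Members.mdb")
  db.CreateQueryDef "MemberView", _
      "SELECT ZipCode, Revenue, " & _
      "IIf(Active = 1, 'Yes', 'No') AS ActiveFlag " & _
      "FROM Members WHERE Revenue > 0"
  db.Close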

I wanted to let users pick the field to zip-rank.  For this I linked a drop-down control to a list of cells containing the fields in my View, and the report macro simply subs the user choice into the Select statement.  With this dead-simple, pre-pivoted table maker, users can't mess up.  Instead, they grind out report after report, all day long, grinning like madmen.
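The substitution itself can be a one-liner or two.  A rough sketch - the cell name RankField is hypothetical, standing in for whatever cell holds the drop-down's choice:

  Dim strField As String, strSQL As String

  strField = Range("RankField")  'the user's drop-down choice
  strSQL = "SELECT TOP 10 ZipCode, AVG(" & strField & ") " & _
           "FROM MemberView GROUP BY ZipCode ORDER BY 2 DESC"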

Doing some of the heavy lifting in the database itself means the macros and formulas can be much simpler, and the JET database engine can crunch the numbers a lot faster than Excel anyway.  I love that my Business Reporting System workbook is only 100 KB.  Not having data to lug around means macro updates and bug fixes don't impact the client, which helps keep me off-site.  When the data set scales up even further, I'm well positioned to migrate to an enterprise system like SQL Server, with minimal impact on the Excel code.

In a Previous Post I showed you some VBA code to support Access DB connectivity and pulling data using SQL Select statements.  This should be enough to get you started.  I'd love to hear about any nifty tricks or nasty pitfalls you encounter.

Wednesday, October 20, 2010

Shrapnel

I've begun migrating, along with some of my clients, to the three-year-old Excel 2007.  I'm well past the whole Ribbon controversy - yet I still get the sense that the new ship is a bit leaky.  Office has thrown a few curve balls at me, so here are some cheap and effective tidbits of advice I can pass along:

Don't... 

...create pivot tables in Excel 2007 and then save the file as a 2003 compatible .xls - it all goes to hell.


...try to draw a flow chart in a Word 2007 document using AutoShapes.  Visio is better if you have it.


...run untested Excel VBA code that references the MS Word object model without setting up error handlers first.


...insert a PDF object into a document that may end up being printed to PDF.


...despair.  No matter how low the economy goes, businesses will always have spreadsheets.


Cheap and Effective

I've been working on a B2B application - a variation on the standard "request for quote" problem.  In this case, a request is assembled from data on the spreadsheet, dropped into an email, and sent to a slew of individuals somewhere who may not even have Excel (poor sods.) They hit reply, fill in some sections of the copied message, and send.  Another macro connects to Outlook, scans a particular in-box, and pulls the details from any messages found as they come back.

This is, like my advice above, cheap and effective.  It's a way to do email forms with no workgroup software or Exchange programming.  It doesn't involve sending or sharing files, and is not limited to the corporate LAN.  Data can even be collected electronically from people who aren't sitting at their desks, working, like I always am.

Hey, isn't this what websites are for?  Um, yeah, but this approach bypasses the need for website programming - no offence to web programmers (since I used to be one,) but power macro artistes deal in turn-around times of hours, not weeks, and the data needs to get back to Excel in the end anyway, right?  
 
Excel talks quite nicely to Outlook.  Here's some code that sends an email message - it's really dead easy as long as you set up the appropriate references.  Perhaps the next post can explore the response capture side of things.

Public Function SendAMessage(strSubject As String, _
         strBody As String, _
         strAddress As String, _
         Optional strAttachment As String = "") _
                As Boolean

 'strAttachment, if used, will be the full 
 'file path of a file
 'requires a reference to the Microsoft 
 'Outlook 12.0 Object Library
 'see Tools Menu - References
  
  Dim olApp As Outlook.Application
  Dim email As Outlook.MailItem
  
  On Error GoTo Fail
  SendAMessage = False

  Set olApp = New Outlook.Application
  Set email = olApp.CreateItem(olMailItem)
     
 'make sure we've been passed sufficient info
  If Len(strAddress) = 0 Or Len(strSubject) = 0 _
  Or Len(strBody) = 0 Then
  
    'trigger an error
     Err.Raise 999, , "Missing Information"
     
  Else

    'assemble the message
     With email
        .Subject = strSubject
        .To = strAddress
        .Body = strBody
        
        'attachment if required
         If Len(strAttachment) > 0 Then
           .Attachments.Add strAttachment
         End If
        
        .Send
        
     End With
     
    'it worked
     SendAMessage = True
  
  End If

Cleanup:
  On Error Resume Next
  Set email = Nothing
  Set olApp = Nothing
  Exit Function

Fail:
 'it didn't work
 '(gGlobalError is a Public String declared elsewhere)
  gGlobalError = Err.Description
  Debug.Print "SendAMessage Failed. " & _
               Err.Description
  Resume Cleanup
  
End Function
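
In the meantime, here's a bare-bones sketch of the receiving side - the subject-line tag is hypothetical, and the actual parsing is left as an exercise:

Public Sub ScanReplies()

 'requires the Microsoft Outlook 12.0 Object Library
  Dim olApp As Outlook.Application
  Dim fldr As Outlook.MAPIFolder
  Dim itm As Object

  Set olApp = New Outlook.Application
  Set fldr = olApp.GetNamespace("MAPI") _
                  .GetDefaultFolder(olFolderInbox)

  For Each itm In fldr.Items
     If TypeOf itm Is Outlook.MailItem Then
        If InStr(itm.Subject, "RFQ Reply") > 0 Then
          'parse itm.Body here and write the
          'details back to the worksheet
           Debug.Print itm.SenderName, itm.ReceivedTime
        End If
     End If
  Next

  Set fldr = Nothing
  Set olApp = Nothing

End Sub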

Thursday, September 9, 2010

Back to Business

This post is so long overdue I'm almost ashamed. If I'd had the entire summer off, that would be one excuse, but the real reason for the delay was that I've been too busy writing code to write anything else.  Okay, I will admit to enjoying a reduced workload in July - 20 hours instead of my usual target of 120!  In any other month this would be a catastrophe, but I try to keep July work-free, as a personal reward for the year's efforts.  That's one of the principal advantages of being self-employed, after all.

As mentioned in a previous post, I landed a return engagement with the company for whom I created an Excel front end to an Access database.  Another department within the company saw how effective the first application was, and hired me to build something similar, to handle their Purchase Order processing workflow.  This has kept me busy for all of August.

The business logic was a lot more complex for this project, but I was able to re-use quite a bit of the architecture from the previous application: User Account Management, database connectivity code, list filtering and sorting... There's nothing like the efficiency and confidence you get from writing code around a proven framework: instead of messing with the fiddly bits, I was able to focus on adding business functionality and making the software friendly and usable.  Clients don't generally get excited to learn that the TCP/IP bottleneck has been optimized, but they are always thrilled to hear about the one-click report engine you've designed!  This project is currently in testing, and so far, the reception has been 100% positive.  I aims to please.

As a challenge to myself, I decided to do this development entirely in Excel 2007.  I usually do my work in 2003, since I find that version of Excel more responsive and pleasant to use.  If the client will be running 2007, the "xls" file from Excel 03 usually just works.  But this time I opted to skip 2003 altogether, to see what would happen.  To my surprise, I found I was able to make the transition with very little pain and suffering.  

It took me a few days to really get the hang of using the Ribbon - I still find myself hunting around for some Excel features, but the learning curve was not as steep as I had predicted.  The benefit is that the file is running "native" in Excel 2007 - no "compatibility mode."  I'm not sure if this is a real advantage (other than looking cleaner on the Excel title bar), but at least I'll know that, if some strange bug or behaviour pops up, it's not an artifact of the file conversion process.  More importantly, I am now more efficient using and interacting with Excel 2007, and feel comfortable porting my personal spreadsheets and Client work to this platform.

Looking forward to the Fall, it appears I will continue to be very busy for the foreseeable future.  Since starting out as an independent consultant 4 years ago, I have been slowly building my roster of clients.  Many projects are one-off, where I go in, figure out what needs doing, write the app, and that's more-or-less the end of the engagement.  But several of my clients are power Excel users who are constantly looking for new short-cuts, utilities, or techniques to apply to their daily business activities.  

As I add more and more of these types of clients to my roster, I'm given an increasing load of "casual" hours - short fixes, mini projects, and extensions of existing work.  This ad-hoc stuff is taking up enough of my time that it effectively fills in the gaps between bigger projects.  It has taken a few years, but now I have the business humming along perfectly - working enough to pay the bills, but not so much that I'm overwhelmed.  If that starts happening, I'll just start raising my hourly rate.  Everything is going according to plan.

Tuesday, June 29, 2010

Applying the Application Object

VBA coders who want to do more than automate the basic Select, Copy and Paste functions of Excel eventually discover the object model, and the world of possibilities it brings.  One object that is sometimes overlooked is Application.  Here are a few examples of how we can use the Excel Application object in our VBA code.

One of my personal favourites is Application.StatusBar.  The status bar is the strip of real estate at the bottom of the Excel window where you see system messages and so forth.  Programmers can write to this area, which can be an intuitive and logical way to keep your users informed about what your macro is doing.  As explained in a previous post on Speed Optimization, if your process is long and involves a loop, periodic updates to the status bar can tell the user how far along the process has advanced.  I usually use a percent complete message, as exemplified in the following code:


For idx = 1 To intNumRows
   '... code does something here ...
   Application.StatusBar = "Running... " & _
      CInt((idx / intNumRows) * 100) & "% complete"
Next


Given this feedback, the user can predict how long the macro will take to run, which actually makes it appear to run faster.  More importantly, users can see that SOMETHING is happening.  It's also nice to finish up with a parting word...


Application.StatusBar = "Calculation Complete."
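
One caution: when your macro is completely finished with the status bar, set Application.StatusBar = False to hand control back to Excel - otherwise your parting message hangs around indefinitely.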


Another really important use of the Application object is the WorksheetFunction method.  This allows you to execute any function that would normally only work when placed directly in a worksheet cell.  For example, say you have a VBA process that needs to take an average of a bunch of cells.  You could loop through them and add each cell's value to a running total, then divide by the number of items - or you could use the built-in AVERAGE function, as in:


varResult = Application.WorksheetFunction.Average(rngRange)


I often use this approach when trying to find the position of a value in a list, because the MATCH worksheet function has no equivalent in VBA.


intPos = Application.WorksheetFunction.Match(varVal, rngRange, 0)


What if you have written a custom function and you need to know what cell that function was entered into? This is important, for example, if the function must refer to other values on the same row or in the same column.  For this, we use Application.Caller, which returns a RANGE reference to the cell where the function was "called" from.


Function MyFunction(N As Integer) As Integer
 'use whatever is in the Nth column of the row in which
 'this function is placed as an argument for another function
  Dim intRow As Long
  intRow = Application.Caller.Row
  MyFunction = MyOtherFunction(Cells(intRow, N))
End Function


Another very powerful method is Application.Run.  This allows you to specify the name of another macro as a text string, and execute it, along with up to 30 arguments.  Why do we need this?  Let's say we are working on a model that encodes a large set of business rules.  We might have dozens of macros that do specific things depending on what may be contained in the data.  Using the Run method, we can assemble the name of our macro in code based on changing data conditions, and then run the code.  I have used this in situations where macro names reside in a database, allowing me to change the behaviour of the application by modifying data.  This facility comes close to enabling Self-Modifying Code, which any old-school programmer will tell you is the holy grail of power in any language that allows it.
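A contrived sketch of the pattern - the macro naming convention and cell layout here are invented:

  Dim strMacro As String

 'pick a handler based on the data in the current row
  strMacro = "Process_" & Cells(ActiveCell.Row, 1).Value
  Application.Run strMacro, ActiveCell.Row

If column A of the current row holds "Refund", this runs a macro named Process_Refund and passes it the row number.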

There are many, many more powerful features available to the VBA programmer by way of the Application object.  Just open your code editor and type "Application.", and observe the myriad methods and properties that the IDE exposes for you.  With a bit of experimentation I guarantee you will discover a way to do something you never thought possible in VBA.  Random code exploration is the gateway to advancing the art.

Thursday, June 17, 2010

Excel in Front, part 2

Project Update: I'm happy to say that so far the Excel Front End project has been as smooth as glass.  Not only were the client's expectations exceeded in every regard, but my personal experience writing the application has been immensely satisfying.

If you recall from last post: I was hired to design and build a work-order tracking system for a manufacturing facility, using Excel as the front-end to an Access Database.  When I set out on this project, the main challenge was that the client's shared workbook approach was causing a lot of trouble, and undermining their confidence in the data.  They asked for my help, as an Excel consultant, to sort out the file sharing issues and build in some business logic and data validation rules.  

I proposed two solutions: keep the shared workbook but apply sharing permissions effectively and write the required macros to ensure data integrity, or the more involved, expensive option: move the data to Access and create an intelligent front-end in Excel so users can interact with it.  They opted for the Access option, understanding the scalability advantages, and the cost savings they will achieve when the time comes to build out some of the other data management applications they need.

I developed a vision for how the system should work: no shared workbooks, no linked files, no open connections, and especially: no Access, as in, no developing in Access (because, well, it's crap.  Sorry, but I hate it.)  I have linked Excel to Access before, so I knew how to do it, but I didn't know if this architecture would perform acceptably with this particular set of business logic and data until I actually built it.   

Typical client-server apps use a persistent database connection.  This is fine when using an Enterprise RDBMS such as SQL Server, but in a small-scale deployment using an .mdb file, I just knew that trying to simultaneously connect 20 or so users to a shared Access db would be a recipe for failure.  So I designed the system to interact with the database in instant-batch mode: need some data?  Connect to the db, fetch the data into memory, disconnect from the file, then display the formatted results.  This way, no single user has an open connection to the Access file for more than a fraction of a second at a time, virtually eliminating any chance of network traffic bottlenecks, record-locking stalemates, and (hopefully) database corruption.

The following are some key design points I employed:
  • Created a de-normalized view of the data, by way of an Access Query, which joins the related data tables to give a fully-qualified view of all Work Orders in the system.  This Master Query becomes the main Result Set that the Excel side queries from.
  • Rather than code separate reports for all the different needs of the client, coded a Report Engine that can be configured as needed, enabling all current and future reports to be designed without programmer intervention.
  • Matched the field names in the Master Query to the column headings used in Excel.  This allows the user to build their own filters and searches by selecting from known field-names.  
  • All user-definable settings are stored in the database, keyed to the user's login.  This way, when the Excel front end file is updated, it can reload all the user's settings and they can continue to work without any setup time.
  • Used generic database SELECT and EXECUTE routines, so all db-related code passes through a single gateway.  If we later decide to change our database server, we won't have to search through hundreds of routines to change all the affected code.
  • Captured cell-edits using the Worksheet_Change event (see the sketch just after this list).  Updates are validated, and if allowed, instantly written to the database.  Cell formatting and locking are applied as the data is displayed, so the users know (and are only allowed to edit) the editable parts of each record.
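
As a bare-bones sketch of that edit-capture pattern - the column numbers, table and field names are hypothetical, and SQLExecute is the routine from the listing below:

Private Sub Worksheet_Change(ByVal Target As Range)

  'in the worksheet's code module
   If Target.Cells.Count > 1 Then Exit Sub  'one cell at a time
   If Target.Column <> 5 Then Exit Sub      'only the editable column

   If Not IsNumeric(Target.Value) Then
      MsgBox "Numeric value required."
      Exit Sub
   End If

  'write the edit straight to the database
   Application.EnableEvents = False
   SQLExecute "UPDATE WorkOrders SET Qty = " & Target.Value & _
              " WHERE OrderID = " & Cells(Target.Row, 1).Value
   Application.EnableEvents = True

End Sub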

Amazingly, this project came together without a hitch.  The incredible flexibility and programmability of Excel enabled me to build every requirement the client asked for, and since I didn't get bogged down searching for bug workarounds and advice, I had time left in the budget to add some really sweet features to the application.  Next Tuesday we deploy the app to the users, and they are really looking forward to the improved efficiency and productivity advantages they will be getting.  And the client has asked for help with a second project for another division in the company, which is obviously very welcome: follow-on work is a sure sign you're doing a good job.

Unfortunately I can't share many more details about this application because I have signed a non-disclosure agreement, which binds me to keep any client-specific or business-specific information secret.  But here is some code which I have used in this and other projects: I keep this in a code module called "DBConnectivity".  It is offered without warranty, but it might come in handy for anyone attempting a project that connects Excel to an Access database file.  This code references the Microsoft DAO 3.6 Object Library, so you will need to create a reference to that library in your project.



Public DB As Database
Public gGlobalError As String  'last error message, shared by the routines below

Public Function OpenDBConn() As Database
     
  'Opens the database connection using global db object
     
   Dim strDBName As String
   
   On Error Resume Next
   Err.Clear
   
   Set OpenDBConn = Nothing
   
  'get DBName from cell on main worksheet
   strDBName = Range("DBPath")
   If Len(Dir$(strDBName, vbNormal)) = 0 Then
    gGlobalError = "The database could not be opened. " & _
          "Please select the database file on the " & _
          "Main Tab and try again."
      Exit Function
   End If

   Set OpenDBConn = DBEngine.OpenDatabase(strDBName)
   If Err.Number <> 0 Then
      Set OpenDBConn = Nothing
   End If

End Function


Public Function SQLSelect(ByVal strSQL As String, _
       ByRef sList() As String) As Integer

'Execute the passed SQL (strSQL) and return the 
'retrieved results in the passed array (sList)
'this method is coded for the Microsoft DAO 3.6 
'library.  modify this and SQLExecute
'routines if migrating to another data provider, 
'such as SQL Server.

  Dim rsList As Recordset
  Dim iColCnt As Integer, iRowCnt As Integer
  Dim idxRow As Integer, idxCol As Integer

  gGlobalError = ""
  On Error GoTo Err_SQLSelect

 'try db connection.
  Set DB = OpenDBConn
  If DB Is Nothing Then GoTo Cleanup

  Set rsList = DB.OpenRecordset(strSQL, _
                                dbOpenSnapshot, _
                                dbReadOnly)

 'check for no data
  If rsList.EOF Then
     SQLSelect = 0
     GoTo Cleanup
  End If

 'Size the array
  iColCnt = rsList.Fields.Count
  rsList.MoveLast: rsList.MoveFirst
  iRowCnt = rsList.RecordCount
  ReDim sList(iColCnt, iRowCnt) As String

 'store recordset rows in array
  Do Until rsList.EOF
    idxRow = idxRow + 1
   'Load values for the row
    For idxCol = 1 To iColCnt
      If IsNull(rsList.Fields(idxCol - 1)) Then
        sList(idxCol, idxRow) = ""
      Else
        sList(idxCol, idxRow) = rsList.Fields(idxCol - 1)
      End If
    Next
    rsList.MoveNext
  Loop

SQLSelectExit:
  SQLSelect = UBound(sList, 2)

Cleanup:
  On Error Resume Next
  rsList.Close
  Set rsList = Nothing
  DB.Close

  Exit Function

Err_SQLSelect:
 'On Error Resume Next
  gGlobalError = "Select from the database failed!  " & _
      Err.Description & vbCrLf & _
      "SQL: " & strSQL
  Debug.Print gGlobalError
  SQLSelect = -1

  Resume Cleanup

End Function


Public Function SQLExecute(ByRef strSQL As String) _
 As Integer

 'Execute the passed SQL statement contained in the 
 'string strSQL
  Dim strErr As String

  SQLExecute = -1
  gGlobalError = ""
  On Error GoTo Err_SQLExecute

 'try db connection.
  Set DB = OpenDBConn
  If DB Is Nothing Then GoTo Cleanup

  DB.Execute strSQL, dbFailOnError
  SQLExecute = 1

Cleanup:
  On Error Resume Next
  DB.Close

  Exit Function

Err_SQLExecute:
  strErr = Err.Description
  gGlobalError = "SQLExecute failed!" & _
       vbCrLf & "SQL: " & strSQL & vbCrLf & _
       "Err: " & strErr
  Debug.Print gGlobalError
  SQLExecute = -1

End Function
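
To round out the module, here's a hypothetical call - the query and sheet layout are invented for illustration:

  Dim sData() As String
  Dim iRows As Integer, r As Integer, c As Integer

  iRows = SQLSelect("SELECT ID, Item, Qty FROM WorkOrders", sData)
  If iRows > 0 Then
     For r = 1 To iRows
        For c = 1 To UBound(sData, 1)
           ActiveSheet.Cells(r + 1, c) = sData(c, r)
        Next
     Next
  ElseIf iRows = -1 Then
     MsgBox gGlobalError
  End If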


Monday, June 7, 2010

Excel in Front, part 1

This post is long overdue - I've been flat-out, coding a very interesting application for a manufacturing client, using Excel as the front end on a shared Access database.  The goal is to give a workgroup the ability to perform simultaneous edits to their work-in-progress tracking data, without any of the hassles, bugs, hideous file growth, sharing violations, lock-outs, etc. that you get with the built-in workbook Sharing mechanism.  There's a host of business logic involved, and they want the ability to design their own reports at will.

Again, I find myself reinventing the wheel.  "Why not develop your application in Access?" one might ask. My reasoning: Customer didn't have it, doesn't want it, can't support it, and more importantly, developing in Access is painful.  I prefer Excel.  The advantages weigh in: custom business logic is more easily coded in Excel.  It is.  And we needed in-grid editing, flexible data presentation, filtering, sorting, graphing... and whatever else we can dream up for versions 2 through N.  That's all in my Excel quiver.

"Oh, right. Excel lets you Link to a data source and populate tables..." yeah, I thought about it, but the users don't want to fart around with Data Connections, and my experience tells me it's a long road coding for those APIs. (if you know better, please tell me.)  And I wanted this to be an always-disconnected setup, where the workbook simply grabs data or sends updates to the db in quick bursts - no live links to network files that might be unavailable, renamed or changed.

The db was designed using an SQL Script to create the tables, populate the lookup data, and set up primary keys, defaults, and so on.  I ran the script to create the database in SQL Server, and then imported the tables into my .mdb file using Access.

I did not use the Access database design tools, though you certainly can.  I prefer to use a build script whenever designing a relational database; you drop and recreate the db often as you rework your design - that's Agile.  I also find that Access is a bit too user-friendly, in other words, not powerful enough.  I guess I can type SQL faster than I can ponder through menus.
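I can't share the client's script, but for flavour: JET SQL happily accepts DDL run directly against the .mdb through DAO, so a build script can be a simple list of statements like this (entirely hypothetical) one:

 'requires a reference to the Microsoft DAO 3.6 Object Library
  Dim db As DAO.Database

  Set db = DBEngine.OpenDatabase("C:\Data\Tracking.mdb")
  db.Execute "CREATE TABLE WorkOrders (" & _
      "ID COUNTER CONSTRAINT pkWO PRIMARY KEY, " & _
      "Item TEXT(50), Qty LONG, EnteredOn DATETIME)", _
      dbFailOnError
  db.Close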

In Excel, I set up a file-select control, and some text boxes for username and password.  The users on the network will each have a local copy of the application workbook - it's not linked, welded, cross-referenced or otherwise umbilical'd to any other file.  Can't you smell the freshness?  No pop-ups... ever.  As updates come out, they will be emailed to the users.  They will enter their credentials, file-pick the .mdb file, and work.  That's Data Separation.

Here, again, I'm not tying into any built-in user management.  In a secure enterprise situation you probably have to set up database user accounts linked to network IDs, but my client is looking for (and I strive to offer) the simplest solution wherever possible (SSWP), so we won't be contacting network support for domain credentials and such - the .mdb file is on a password-secured network drive.  Our application will maintain its own user list... VBA code does the rest.

The project is still underway - next post I'll explain how I set up my object classes, control cell-editing, filter and sort, and conclusions: will it be another home run, or a teetering, reeking, house-of-cards? Stay tuned.

Tuesday, May 18, 2010

Condition "L"

Back in 1989, while using Lotus 1-2-3 in college, printer quality was not measured in dots per inch, but in pins, as in: I just got a new 9-pin printer.  A Lotus add-in called Sideways let you print your spreadsheets in landscape mode by rendering the output as a picture, scaling it, rotating it, and printing it out 9 dots at a time.  This was a major breakthrough for me, because suddenly my spreadsheets were transformed from IBM, holes-on-the-side, six-page Utility Bills into one-page, big-picture Annual Reports.  Design meets Computation / Left meets Right - it all made sense to me, and was the beginning of what is shaping up to be a life-long mania for the idea of the spreadsheet.

Forward in time: it's post-2007 and Excel, in version 12.0, gives us complete control over every aspect of the look and layout of our reports.  But now the screen is where we consume the majority of our spreadsheet salad; what was a static page is now dynamic.  One way to exploit this trend is to let the data control its own appearance.  As new numbers stream in (conditions change), the visual appearance of the numbers also changes, allowing our eyes to zero in on the Information hidden amongst the Data.

Excel's Conditional Formatting feature helps us achieve this.  Though somewhat limited in 2003, the 2007 version of Excel has greatly expanded the ways conditional formatting can be applied, and the user interface has been improved dramatically.  (This is, perhaps, the only actual improvement in Excel 07, but that's another topic entirely.)  Regardless of what version you're using, it is the "Formula" option where the power of this really comes through.

For those who haven't tried it, conditional formatting lets us apply a format to a cell that is only visible if a certain condition is met.  The interesting thing is, the actual condition doesn't even have to be directly related to the cells in question.  It's one thing to say "If the value is negative, make it red."  It's another to say "If the Region is North East or North West, and the current month is between November and April, highlight the Salt expenditures in blue; otherwise show them in yellow."  In the second example, the salt expenditures are a dollar amount, but it's the intersection of several other factors that controls how the number will appear.  That right there is pure formatting power.
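For the curious, here's a hypothetical version of that blue rule - assuming the Region sits in column B and a date in column A, and remembering that a conditional format formula just needs to return TRUE or FALSE:

=AND(OR($B2="North East",$B2="North West"),OR(MONTH($A2)>=11,MONTH($A2)<=4))

The yellow look would simply be a second rule wrapping the same test in NOT().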

The secret to correctly applying a formula-based conditional format is knowing how relative vs. absolute cell references affect the condition, and this is only a factor if you're applying the format to a range of cells.  One tip I use is to plan my format formula as though I'm writing it for the first cell in the range, and intend to copy it to the rest of the cells - because effectively that's what Excel is doing.

For example, say I wanted to highlight every other row of cells, to give that retro alternating green & white striped computer printout look to my report.  In Excel 2003, select the entire report, choose Conditional Formatting from the Format menu, and select "Formula Is..." in the first condition's drop-down.  In the formula field, enter an IF function to evaluate the current cell's row number, like so:

=IF(MOD(ROW(A4),2)>0,TRUE,FALSE)

And then select the formatting you want - a light green cell background in this case.  Notice that we're not actually testing the data in the cells, but just the row numbers of the cells.  The reference to cell A4 in this case is a relative reference to the first cell in the table.  After saving the format, if I select any other cell in the table and inspect its conditional formatting, that cell's address should appear in place of the A4.  If not, Excel might have substituted a named range or something, which is equivalent to an absolute reference - no good, because this doesn't "move" with the formula.

Using this formula, we won't disturb the alternating row coloring if the data is re-sorted or we remove or add rows.  And using this approach, along with the boolean functions AND and OR, there's no limit to how many different conditions we can include in the formatting rules for a given cell.

In Excel 2003 you ARE limited to 3 distinct conditional formats, or looks, for a given cell, plus the default format when no conditions are true.  Unfortunately, the different visual effects are not cumulative, so you will only see the formatting for the first condition of the three that evaluates to True.  Not so in 2007 - you can display the first applicable format OR allow the rules to be combined.  For example: a value exceeding the Budget will be red, an item covering more than 10% of the overall total will be bold, and forecasts have a blue background while actuals are green.  One glance tells me where a given value lies on all three of these axes.

In Excel 2007 the old three-format cap is effectively gone - you can define as many distinct looks for a cell as you need - and you get a lot more control over the order and way the conditions are evaluated.  You also get expanded formatting options - specifically, the ability to create sliding color scales and to highlight unique or duplicate values are both very cool additions.

If you haven't yet mastered the conditional formatting power of Excel, I encourage you to dive in and start experimenting.  I would also love to hear about any interesting conditional formatting tricks you have gleaned.  Happy formatting.

Thursday, May 13, 2010

iPhone Prototyping in Excel?

I just completed what has to be the funniest project I've ever worked on.  Not a typo, I really mean Funny, as in I was laughing my ass off the whole time.  This is a departure for me, as I'm usually working on financial data (lately: not so funny), and like I've said before, there's no office banter around here.  But this particular gig amused me because I was "skinning" a workbook to appear on a Television Show, and the designers who supplied me with the graphics really nailed it.

The application is sort of secret so I can't say too much about it just yet, but the TV show, on Canada's CBC network, is called Dragons' Den.  On this "reality" program, entrepreneurs pitch their ideas to a panel of wizened business executives - Dragons - who show no mercy and pay no compliments.  Occasionally, someone impresses one or more of the Dragons, and they literally "buy in" to the idea.  Usually, they cut the hapless entrepreneur down and send him or her off, humiliated (or worse: try to gain full control - 51% ownership - of the start-up for a fraction of the asking amount.)

My Client will be appearing on Dragons' Den, which is taping next week.  He has invented an algorithm that he intends to evolve into an iPhone application, among other things.  For now it resides as an Excel model, involving some reasonably sophisticated statistical analysis and math.  My mandate was to make the model presentable on the show.  Specifically, I needed to make the spreadsheet's input screen look like an iPhone app, so my client can get the actual Dragons themselves to provide the input data.  This is a one-off re-skin, created just for this program, with the Dragon hosts depicted with oversized heads... it's going to be really funny when it airs.

My approach was to overlay a set of masking graphics over the input section of the spreadsheet, which contains a series of drop-down list boxes.  Anyone interacting with the file should be able to select values in the list boxes as if doing so on an iPhone.  This design proves the technical side (the model is running in Excel and it calculates... this!), while at the same time putting the idea into context - a portable application that you use... like this.  Brilliant!  I wish I'd thought of it.

It was easy enough to find the elements I needed: a hi-rez PNG file of a hand holding an iPhone does the trick.  You then need to slice it up into separate pieces so you're removing the phone's screen - that's where your spreadsheet's elements will show through.  A transparent section won't do, because you'll need to click on the controls under it.  This graphic looks really good on a spreadsheet whose cells all have a black background.

Some traps: the graphics move around and stretch as you adjust column widths and such, so you'll want to anchor those down to keep them nice.  This is done by right-clicking, selecting Format Picture... Properties tab... "Don't Move or Size with Cells".  Excel is a bit buggy in that it can lose this setting if you group, copy or otherwise modify the graphic, so I wrote a little routine to flip this on and off universally.  This came in handy as I was laying out the screens.

Sub float_All_Objects(blnFloat As Boolean)
  
   Dim ctl As OLEObject
   Dim shp As Shape
   Dim plcmt As Variant
   
   On Error Resume Next
   
   plcmt = xlMove
   If blnFloat Then plcmt = xlFreeFloating
   
  'ActiveX (ole) controls
   For Each ctl In ActiveSheet.OLEObjects
       ctl.Placement = plcmt
   Next
   
  'shapes, meaning graphical elements
   For Each shp In ActiveSheet.Shapes
       shp.Placement = plcmt
   Next
   
   Set ctl = Nothing
   Set shp = Nothing
   
End Sub


Another issue is that you cannot change the font size in the standard screen controls (list boxes, buttons, etc.), so if you need to scale your worksheet's zoom (to make it readable on TV, for example), the default font size is usually incorrect.  Instead, you must use oleControls - the difference is that these controls require VBA code to interact with, but they also give you full... well, control over their property set, including size, font, source data, 3-D appearance, and a bunch more.

Many Power Users get these control types confused.  It's all about which toolbar you get them from: basic form controls come from the Forms toolbar, but oleControls come from the Control Toolbox.

It's not obvious how to configure the ole controls after you have placed one on your spreadsheet.  When you right-click on one, all you can do is control basic properties such as placement, border, and caption.  For more advanced properties you need to use the Visual Basic IDE.  Your worksheet must be in Design Mode so you can select the control, and then you may edit its properties on the VB side.  One warning: these controls won't work at all if your users fail to enable macros, even if you haven't written any actual code.

I really enjoyed this challenge: make the model interactive, while completely removing any hint of "spreadsheet" so it doesn't beguile the television audience.  When the segment airs in a few months, you might not even realize that Excel is running the demo, which is just one (more) of the reasons why Excel is the ultimate prototyping framework.

Friday, April 30, 2010

Parallel Odyssey

As I have already mentioned in this blog, I'm a fan of virtualization.  Specifically, running Windows XP on my MacBook Pro.  I've had a relatively good experience using Parallels 3.0, but I have resisted upgrading it.  I went through quite a bit of pain during the early days because the software was buggy and the company was still ironing things out.  I stuck with it, reverting to a few snapshots and re-installing once, but eventually got a stable instance of Windows XP that I use for Excel development.

The company that makes Parallels bombarded me with offers to upgrade to version 4, but I resisted.  Like my steadfast loyalty to Excel 2003, I saw no reason to upgrade for upgrade's sake - the risks are just too high.  The net was full of postings from people who suffered through bitter difficulties getting their upgraded virtual machines to boot correctly; to be stuck without a usable Windows environment means I am unable to do billable work.

Finally, with version 5.0 a mature product, and 50% discounts being thrust at me, along with promises of huge performance increases and support for Windows 7, I decided to take the plunge.  The notion of having more speed - "up to 3 times faster than previous versions" - sealed the deal.

I downloaded the installer for $30, but parked it for a couple of days, because I had some deliverables to complete and had no idea how long the machine would be out of action.  Here's what happened:

Thursday, 2:30pm - finished my work for the day, time to start backing up.  My virtual machine's directory is 109GB.  Copying this to my network drive takes the better part of 2 hours.  I watch a movie on my other computer.

4:30pm - I use the Parallels VM manager to "shrink" my Windows disk.  First I need to get rid of the snapshots I've created over the past two years.  Deleting a snapshot is not a simple matter; each one is a "delta" over the ones that precede it.  I had 10 to delete, at around 10 minutes each!  What a pain.

6:30pm - Snapshots gone, I initiate the Shrink operation.  The progress bar moves at hour-hand speed, but no e.t.a. is given.

9:00pm - Kids are in bed, and the shrink operation seems to be hung up at around 80%.  I debate interrupting it, not sure if it's frozen or what.  I put my ear to the computer and detect some faint noises coming from the hard disk, so something must be happening.  I leave it overnight.

Friday 8:00am - To my relief, I'm greeted with a "process complete" message.  I check the folder size and it has gone from 109 down to 63 GB.  That's much better, but still too big considering that the actual data on the VM's HDD is only 20 GB.  What is up with the missing space?  Better read up on what others are doing about this.

9:45am - Educated, I launch the Compress utility from within the guest OS, which removes all the cruft that builds up in a typical Windows install: temporary files, logs, etc.  It also compresses the data.  This program displays a progress bar, which also creeps.  I go for a bike ride while it grinds away.

11:30 - still not done, but it's definitely doing something - the cooling fans are at maximum and the laptop is hot!

12:10pm - Compress has completed, and to my elation the VM's folder is now a paltry 18 GB!  That's an 83% reduction.  More importantly, that frees up 30% of the space on my MacBook's hard drive, which was nearly full.  Phew!  I make a backup of the folder again, which only takes 20 minutes this time.

12:30pm - I start the Parallels 5.0 upgrade.  I'm given a series of warnings about how it will need to update my virtual hard drive image, and once done, I won't be able to use this on older versions of Parallels, so basically it's a one-way trip.  Having fully backed up everything, I click Start.  The installation only takes a few minutes.

12:45 - Parallels 5 launches, and it actually boots my Windows XP machine!  But wait - this must now be updated.  I get the old progress bar again... slow as a stalagmite.  It gets stuck at around 50% for two hours, but I've come so far, I dare not intervene.

2:30 - I give up hope and start intervening.  I see a little padlock icon so I click it and am taken to manual mode - and I see that Windows is hung up on a retry-ignore-cancel dialog.  I click "Ignore" and the process resumes.  I thank my lucky charms.

4:18 - It's done!  Parallels 5.0 is installed, and Windows XP boots up just like always.  My directory structure is intact, and Excel launches - nice and snappy!  The whole deal ate up about 6 hours of my actual time, but my development environment was down for 26 hours.  Oh well, it will all be worth it, right?

The first thing I do is load up my biggest, most complex Excel model and initiate its comprehensive recalculation routine - a process I had previously benchmarked at roughly 58 seconds.  It runs, but apparently no faster... I check the output and am stunned to see the result: 58.23 seconds.  WTF?

I don't know what performance increases those guys were talking about, but apparently they didn't have anything to do with typical user functions, like running Microsoft Office in Windows XP!  Time will tell if I see any measurable benefits from this upgrade.  At least it wasn't slower!

One thing is for sure: when I replace my laptop, I will need this version of Parallels, because 3.0 doesn't run on the latest Mac OS, so this upgrade had to be done eventually.  But I'm still a bit disappointed - I was hoping for more of an immediate payback.  If anything, this experience reinforces one of my primary credos: if it ain't broke, don't fix it.

I'd love to hear about any of your experiences on this topic.

Tuesday, April 20, 2010

Custom Forms: File Chooser

If you are developing solutions using Excel and VBA, you will probably come across the situation where your user must navigate to a folder and select a file from the OS to feed some process.  One example might be a report generator that imports data from an existing workbook to produce a new report.  You can use this technique to provide a file picker for your users.

On a user form, place a text box (call it "txtFile") and a command button (call it "cmdPickFile").  Make sure the text box is large enough to display the directory path and file name your user will select, and put the command button "inside" the right edge of the text box, like the Import File field on a typical import dialog.

Set the command button's caption to be an ellipsis (3 dots), which is a standard and intuitive Windows construct indicating that clicking the button will allow you to select a file.

In the button's click event, put the following code:


Private Sub cmdPickFile_Click()
  Dim strFile As String

  On Error Resume Next

 'set default dir (ChDir changes the folder but not
 'the drive, so set the drive first)
  ChDrive ThisWorkbook.Path
  ChDir ThisWorkbook.Path


 'show file open dialog
  strFile = Application.GetOpenFilename( _
      "Excel Files (*.xls),*.xls", 1, "Select File", , False)


  If strFile = "False" Then Exit Sub


 'drop name in text box
  txtFile.Text = strFile

End Sub


This routine leverages Excel's GetOpenFilename method, which displays a standard File Open dialog, allows you to navigate through the directory tree, and select a specific file.  This doesn't actually open the file, it simply captures the selected file's name and path.

If the user clicks the Cancel button on the standard dialog, the GetOpenFilename method returns "False", so we test for that before capturing the selected file's info and putting it in our text box.

The advantage of this approach is that the selection is visible to the user, and could in fact be typed in, or pre-populated by our macros, depending on the situation.  The line that sets the default directory could be adjusted to make a pre-determined path the default.  Also, you can change the file filter ("Excel Files (*.xls),*.xls") to show other types of files, in case we need to process data that isn't in Workbook format.

In this example, the user must click the OK button on our user form to start the import process.  Our code simply needs to look in the txtFile control to get the full path and file name of the file we want.  This file could then be opened using code like this:


Dim WKBK as Workbook
Set WKBK = Workbooks.Open(txtFile.text) 


It may seem so simple, but unless you've done it, these sorts of details can be time-consuming to figure out.  It's worth doing, because intuitive, user-friendly controls such as this make your applications work the way users expect, and that gets you hired back.

Thursday, April 8, 2010

Home Spun Scenarios

Most Excel users are content with the incredible power that the basic spreadsheet model provides: named ranges, linked formulas, auto-fill, and so on.  When more in-depth analysis is required, power users often turn to one of the other significant Excel features that are often overlooked, such as Goal Seek, Pivot Tables, Outlining, and the Scenario Manager.

Today's post will discuss the Scenario Manager, and why it is better (and possibly easier) to avoid this feature and build your own scenario control logic.

The Scenario Manager provides a framework, allowing the average Excel user to define a set of input cells in a model, and specify a variety of possible inputs that can be substituted into those cells in order to perform "what-if?" analysis.  What if sales were 20% higher?  What if the price of Oil reaches $200 per barrel?  Any number of distinct scenarios can be defined, and the input data for each can be applied to your model to quickly compare different inputs.

Detailed instructions on using Excel's built-in scenario manager, with examples, can be found here on Microsoft's web site.  This feature is quite easy to use, but I personally believe that Excel's Scenario functionality is just too basic for most power users, and it has one glaring disadvantage: you cannot see what scenarios have been created without drilling down into the Scenario Manager dialog, clicking the Edit button, and inspecting the substitute values (and obscure $A$1-style cell references) that have been defined.

A better approach can be achieved easily with a bit of consideration to how your worksheet is laid out, some in-cell validation, and the Offset function.

The goal is to have an easy way to quickly substitute a new set of data for our model's inputs.  This example drives a real-estate calculator:


In this worksheet, the model's inputs are in column D.  Columns F through J contain alternate input sets - the scenarios.  These alternate sets of data can be substituted into the model by linking the input cells in column D to the scenario cells in F to J, using the Offset function.

All input cells in column D contain a formula like =OFFSET($E5,0,ScenarioOffset) (shown here for row 5), rather than actual data.  The ScenarioOffset reference is a named range pointing to a cell that evaluates which Scenario we have selected in the drop-down in cell D4, using the Match function like this: =MATCH(D4, ScenarioList, 0).  The ScenarioList reference points to the range of scenario names, which reside in cells F4 through J4.  To make this easy to use (and foolproof), I have added Data Validation to cell D4, so the user must select a value from a list, and the list choices come from... you guessed it!  ScenarioList - the names of our scenarios.

To use this, the operator selects a value from the drop-down in cell D4: Low, Medium, High, etc.  Our Match function converts this choice to a number, so if we pick "Medium", we get 2 in the cell called ScenarioOffset.  This causes the Offset functions in cells D5 through D17 to change, pulling data from 2 columns to the right of the empty column E - that is, from the Medium column, G.  Changes to the inputs cause an automatic recalculation of the rest of the model.

Depending on how complex things are, you could write code in the worksheet's Change event to perform additional calculations, regenerate reports, and so on.  The key here is: by changing a single cell (the selected scenario), this changes the data captured by the set of input cells.
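A minimal sketch of that idea, assuming the scenario picker sits in cell D4 as above:

Private Sub Worksheet_Change(ByVal Target As Range)

  'in the worksheet's code module
   If Not Intersect(Target, Range("D4")) Is Nothing Then
     'a new scenario was picked - refresh anything
     'that doesn't recalculate on its own
      Application.StatusBar = "Scenario: " & Range("D4").Value
   End If

End Sub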

Using this approach has some distinct advantages over Excel's Scenario Manager:  It is very easy to add new scenarios by copying values from existing ones and changing the details that are different.  Also, each scenario's data is clearly visible, and sits nicely in-line with the input names (in column B.)  Managing scenarios becomes a simple matter of managing worksheet cells, rather than jerking around with a set of Excel dialogs and modifying data buried in the Scenario Manager's settings.  And if your model has dozens or hundreds of inputs, you will quickly see that the Scenario Manager is simply not viable.

Adding Scenario analysis vastly extends the usefulness of your worksheet models, giving your operators a powerful tool to compare and analyze any number of different possible parameters.  By building it yourself, you maintain control over the input and scenario data, and achieve the exact functionality you need rather than settling for the "good enough" built-in framework that Excel provides.  That's how Power Users roll.

Thursday, April 1, 2010

Easy Dashboarding

You can add visual pizazz to any report with a simple bit of logic and the Wingdings 3 font.

In this example, an expense statement compares one family's household spending to the standard working family.  There are several items which could be lower, higher, or the same as the standard ratio.

Showing the numbers side-by-side is an effective way to highlight this information.  Adding a graphic goes even further, literally illustrating the data.  But linking an image to data in a cell is pretty tricky, which is why Excel makes charts so nicely.  Charts can be overkill sometimes, though, and I try to keep things as simple as possible whenever possible.  Which is precisely why Microsoft added the Wingdings fonts to Windows.  This formula...

         
=IF(c2>d2,"r",IF(c2<d2,"s","D"))


...evaluates to either an r, s or D, depending on a value being higher, lower or equal to another value.  These characters, rendered in the Wingdings 3 font, give an "up", "down" or "equality" pictogram which shows up nicely in the spreadsheet, in line with the data it illustrates.  Here's how it looks:


Depending on your data set, you can go further: applying conditional formatting to add colour based on some condition, or displaying a temperature gauge for "corporate heat" levels.

There are many hundreds of interesting symbols built into the standard font libraries on your computer.  Explore the choices visually by menuing to Insert...Symbol and paging through the lists in the dialog.  Just pick the symbols you think will best exemplify the "at-a-glance" impact of your findings, figure out what their letter-equivalent characters are, and sub those into your nested-IF formula.  Your reports will pop, your workbooks will stay light and lean, and accolades are sure to follow.

Tuesday, March 30, 2010

Programming with Class... Modules

A recent discussion on LinkedIn asked: who is using Class Modules, and for what purposes? Answers were either something like "Class what? Never heard of it." or "Couldn't live without 'em."  Rather than post my obscure personal opinions there, I post them here...

To put it very simply: a class is just another type of variable.  Variables, as you know, act as containers, designed to hold a piece of data of a certain type.  So in VBA, if I type Dim intX As Integer, I'm defining the variable intX and telling the computer to set aside some space for a number I'll be tracking.  Then, when I type intX = 32765, I'm putting a number in that container.  I can pull it out later and do something with it, as in: Cells(1, 1) = intX / 8.

A class takes this storage of data to the next level, allowing you to build a complex data structure to hold many different types of data.  A class can (and should) also contain logic pertaining to itself - code that resides in sub-routines, functions, and class properties.

To help visualize the distinction between a simple variable and a class, imagine we're trying to track a list of team members for a payroll application we're building.  Each team member has a variety of information that distinguishes him or her from the other people: Name, DOB, Job Title, etc.  By creating a "Person" class in VBA, with methods to handle the storage and retrieval of each of these personal attributes, we can keep each person's details together in memory, and move those details around in fun and beguilingly complicated ways.

Object Variables

The class module in Excel's code editor is where you define the attributes of a class - its name, methods, properties, and so on.  Elsewhere in your code you create Instances of the class using an object variable, as in: Dim objTM as clsTeamMember and Set objTM = New clsTeamMember. You can create many, many instances of a class, just like there can be many, many integer variables, all instances of the Integer data type.   
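
For illustration, a bare-bones clsTeamMember might look something like this (the attributes are invented):

'in a class module named clsTeamMember
 Public Name As String
 Public JobTitle As String
 Private mdtDOB As Date

 Public Property Let DOB(dtValue As Date)
    mdtDOB = dtValue
 End Property

 Public Property Get Age() As Integer
   'logic lives right beside the data it describes
    Age = Int((Date - mdtDOB) / 365.25)
 End Property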

Collecting Objects

Once packed up into an object variable, the class can join others like it in a Collection.  A collection is like a list or stack that can then be searched, summarized, or iterated using the For Each... construct, as in: For Each objTM in colMembers.  Collections are, for me, the best reason I can think of to use class modules in vba.

Collections can be used to generate a unique list (as I demonstrated in an earlier posting), or to fill pick-lists in user dialogs, among other things.  A class can even contain a collection as one of its properties, allowing you to create nested collections of collections, without limit.  [Tip: Always set your object variables to Nothing when done with them to avoid those pesky memory leaks!]
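
Putting the two together, here's a quick, hypothetical sketch of filling and walking a collection:

 Dim colMembers As New Collection
 Dim objTM As clsTeamMember

 Set objTM = New clsTeamMember
 objTM.Name = "Pat"
 objTM.JobTitle = "Analyst"
 objTM.DOB = #6/1/1975#
 colMembers.Add objTM, objTM.Name  'second argument is a lookup key

 For Each objTM In colMembers
    Debug.Print objTM.Name, objTM.Age
 Next

 Set objTM = Nothing
 Set colMembers = Nothing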

But why would you need this?  That really is the question.  Most of what you can do in Excel's memory can be done in a worksheet's cells.  But pulling your data into memory to manipulate it there is much faster and, I find, more interesting, than copying and pasting onto worksheets.  Knowing when to apply object-oriented programming techniques is largely a matter of gut feeling, which comes with experience.

Personally, when building something like a reporting engine, I often define in-memory collections containing the data I'm trying to summarize.  It's then quite easy to make new reports by writing code against the classes and collections I have set up.  Effectively, I think of classes and collections as a giant universe of infinite worksheets in the sky, to conjure or disperse as needed...  

Try to read up on what you can on this topic, and chip away at understanding until you begin to see the light.  I recommend diving in and adding a class module to your next VBA project.  This is, after all, an essential step on the road from Macro Recorder to Power Programmer.

Sunday, March 21, 2010

Staying Tethered

It is now very easy for the mobile worker to produce billable work while on a ski holiday.  I'm not referring to some sort of client schmooze, or a seminar in Aspen on behalf of some large corporate employer, though these are both highly respected and sought-after ways to get paid by just about anyone who skis.  No, what I mean is that, using the technology now available, a consultant can keep up to date on email, answer calls, work on code and deploy software from virtually anywhere in the world, thanks to 3G cellular data networks and powerful, portable computers.  In my case, a MacBook Pro and an iPhone.

Hands down, my favourite thing about the iPhone is the Tethering feature.  This allows the phone to act as a wireless hotspot for my laptop.  It's actually quite fast - averaging about 2 Mbps download, which is only slightly slower than my home DSL service.  Where I live (Toronto, Canada) my cell carrier (Rogers) requires that I subscribe to a 1GB data plan in order to use the Tethering functionality.  This runs me about $65 per month.

It's dead-easy to use.  On my MacBook, I just click on the Bluetooth icon, scroll down to "iPhone", and select "Connect to Network".  In about 10 seconds, I'm on-line and doing anything I can do in the office.  I'm using virtualization software to run Microsoft Windows XP on my Mac.  I believe it is possible to obtain the same connectivity using a Windows PC, but I can't speak to the effort of setting it up.

Independent consultants who are any good provide quick responses to phone calls - new business prospects move on if you don't follow up instantly.  Existing customers expect a certain level of support.  Installed code needs to work.  If your macros break down during report crunch time, and you can't get to them until next Monday... ugh.  It's a major failure, one that can impact other people, cost you business, and destroy your credibility and self-esteem... well, maybe not all that, but it depends on the standards you set for yourself.

With the right tools we can avoid touchy situations and still do the things we love: An email comes through on the chairlift at 10:30 am, asking for help.  You bomb back to the cabin for a 20 minute bug fix, uploading the 10MB file to your grateful client.  You then fire off a quote for another job that came in, and slide back to the slopes for a few more runs before lunch.  After the skiing (or wake-boarding, or golfing, or whatever sort of lifestyle you're trying to include in your life) is done for the day, you might put in a few more hours before dinner just to keep on top of ongoing projects.

In olden times, before mobile office technology, we had to work for companies and book annual vacation time, during which we forgot about work for two weeks and let other people take care of things.  But then for the rest of the year we had to show up every weekday, right on time, and stay there all day long.  Times are changing.  


Monday, March 8, 2010

Navigating the Model


On complex Excel models with multiple, large worksheets, your users often need to "jump" to a specific location. For example, you may have defined a set of financial statements - Balance Sheet, Income Statement, etc. To improve their users' experience, power programmers usually set up some sort of navigation buttons or menus, providing an easy way to find the useful content in the model.

Named ranges are certainly helpful in this situation. You can define a named range which points to the top-left corner of the indicated content. The user can then GoTo that range, either by selecting it in the Name box, or using the GoTo menu command. Unfortunately, Excel doesn't know that we want the named range to be the top-left cell in our view, so it only scrolls enough to get the name "in view", even if that means it lands at the bottom of our visible range.

To get around this particular annoyance, and to ensure my users have the most pleasant experience possible when using my Excel applications, I developed the GoScroll method. This re-usable routine accepts a named range as its only parameter. When called, it will activate the appropriate worksheet and jump to the indicated range, making sure to scroll the window such that the range ends up in the upper-left corner of the display. Here's the code:


Public Sub GoScroll(ByVal strRange As String)

    Dim intPane As Integer
    Dim rng As Range

    On Error GoTo Fail

    'resolve the named range and activate its parent sheet
    Set rng = Range(strRange)
    rng.Parent.Activate

    'if the window is split, use the last (bottom-right) pane
    intPane = ActiveWindow.Panes.Count

    'scroll so the target lands in the upper-left of the pane;
    'negative values scroll down/right, which is what we want
    'when the target sits below or right of the current view
    With ActiveWindow.Panes(intPane)
        .SmallScroll Up:=.VisibleRange.Row - rng.Row, _
                     ToLeft:=.VisibleRange.Column - rng.Column
    End With

    rng.Select

Fail:
    Set rng = Nothing

End Sub


This routine leverages the SmallScroll method of the ActiveWindow.Panes object. If you have split your window, it will have more than one "pane", so this code assumes it will be working with the last pane in the collection. Adjust this as required, depending on how your window splits are set up.

With this routine in place, I can now hook up navigation buttons or menus for any named range in the workbook. I call the code like this: GoScroll "Income_Statement" where "Income_Statement" is the named range pointing to the upper-left cell in my income statement section.
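
Hooking it up to a button is then a one-line wrapper macro per destination - for example:

Sub NavIncomeStatement()
    GoScroll "Income_Statement"
End Sub

Assign NavIncomeStatement to a button or menu item, and the user is one click away from the income statement.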

Note that this logic requires the range to be defined globally for the workbook, not locally on a specific worksheet. I'll leave it as an exercise for the reader to figure out how to handle local names. I hope you find this useful. As always, I welcome your feedback.

Monday, March 1, 2010

The many sides of self-employment

Over the years I have worked on many projects for a variety of employers. Living and working in Toronto, Canada's financial hub, the majority of this work has been for large corporations - which means spending the day in a cubicle. It seems so cliché to diss the cube farm, but diss it I must, because I have forsaken the corporate world and all its "perks" in favour of the more challenging, lower-paying, lonely life of the stay-at-home worker.

The main deciding factor in this transition was my preference for programming with Excel. Many corporate roles entail Excel use, and some even require macro programming skills, but I never managed to find a full-time position where I could create power macros and "genius" spreadsheets all day long, which is something I truly love to do. So I opted for the next best thing: becoming a consultant and taking on jobs for business at large.

Now I get to say exactly what I will work on, where and when it happens, and for whom. I can broaden or constrict my "sphere of acceptability" as necessary, according to how lean things get. I get to work on different projects all the time, and nobody tells me what to wear. The result: I never have to sit at a desk, staring at a clock, praying for the end of the working day. I also get to avoid commuting by subway, dull and insufferable co-workers, sick building syndrome, elevator news, galoshes, food courts, fluorescent lights, staff meetings, doorway smokers, kiosks, falling glass and noon-time birthday parties.

Sound like a dream? Not so fast... I think I already alluded to lower pay - when you bill by the hour, you really only get paid for hours worked, unless you're corrupt, in which case you won't keep your customers for long. I once had a friend who worked for a pretty cool software company. He was bragging about his unusually large salary and how he got paid every week, no matter what he did or when he showed up. His most startling claim was that he got paid more in the time it took him to go to the toilet than his lunch had cost! If all you care about is money, perhaps this is the way to go. But I learned some time ago that "quality of life" is really the goal, and since you spend a big chunk of your life working, "work" needs to be quality time too.

I've been working out of a home office for three years now, and I'm very satisfied with my decision. But I am coping with a few issues, such as having nobody to talk to, and no manager to enforce deadlines or check my output. As a programmer, some of the skills I now possess came from working and interacting with other programmers. As a one-man show, I really miss that peer review, and rely on web forums and email as a pale substitute. (I also catch myself talking to the cat a bit too often, but that may be a symptom of something else.) Another thing I miss: all the free software and hardware you get when you work for someone else's company. And don't forget the added burden of marketing yourself, record-keeping and tax filing. Then there's that start-up period when you're building your business, and you have long stretches of time with no billable hours to show for it. Those can be very discouraging indeed.

There are quite a few challenges with the work-from-home, work-for-yourself lifestyle - it's risky and sometimes frustrating, but I have no regrets. If you are able to do it, (perhaps with the help of a partner who is also employed,) it is amazing how much simpler and more pleasant life can be. Good luck to anyone who decides to make the leap. Now, if only I could find a way to move the family out to the country...

Wednesday, February 17, 2010

IF only SumIF could do this...

On a recent project I had the challenge of generating a summary for a large, unsorted, filtered data set.  The model was initially created using the SumIF formula to grab subtotals for a range of cells, based on certain criteria.  Unfortunately this wasn't working, because the summary included rows of data that were not part of the filtered results - in other words: data from hidden rows needed to be ignored.

The Excel veterans out there probably already know about the SubTotal function, which contains a parameter that allows you to include or exclude hidden rows.  This works well as a simple SUM (or count, average, max, min, etc) when working on filtered sets, but does not allow you to include any additional conditions.
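
For example, assuming a filtered list with values in B2:B500, a formula like this sums only the visible rows:

=SUBTOTAL(9, B2:B500)

(Using 109 instead of 9 as the first argument also excludes rows that were hidden manually, not just those hidden by the filter.)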

Another option would be to add control columns to the data set - something off to the side that evaluates the IF condition, and returns a 1 or 0.  This can then be combined with the actual data using a function called SumProduct.  SumProduct allows you to specify two ranges of numbers, and will sum the product of the items in the first range multiplied by the items in the second range.  Rows that failed the condition would multiply the data by zero, effectively removing them from the total.  This approach, though feasible, required adding a large number of new formulas to the data sheet, and was rejected because the data sheet is being populated regularly by copying and pasting from an external source.  We wanted to keep that sheet lean and simple.
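
To sketch it out with hypothetical ranges: a control column in C might hold =IF($A2="East",1,0) in each row, and the summary formula would then be:

=SUMPRODUCT(C2:C500, B2:B500)

Every row where column A isn't "East" contributes zero to the total.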

In the end we opted for the power-user's approach: writing a custom function to provide the missing logic.  This is the function: SumIF_X


Function SumIF_X(rngCriteria As Range, _
                 rngCompare As Range, _
                 rngValues As Range) As Variant

'new version of the SumIf function
'that ignores hidden rows!
'assumes rngCriteria and rngValues are single-column
'ranges of the same height

Dim rng As Range
Dim cTemp As Variant
Dim intTop As Long
Dim strTemp As String

On Error Resume Next

intTop = rngCriteria.Row
strTemp = rngCompare    'the value to match

For Each rng In rngCriteria
  If rng = strTemp Then
    'rows hidden by the filter have a height of zero
    If rng.RowHeight > 0 Then
      cTemp = cTemp + rngValues.Cells(rng.Row - intTop + 1)
    End If
  End If
Next

Set rng = Nothing
SumIF_X = cTemp

End Function


It is designed to mimic the SumIF function, so the parameters are the same: a criteria range, a compare value and a data range.  Using the For Each construct, we cycle through all the cells in the criteria range, and for any that match the compare value (and whose row height is not zero,) we add the associated data element to our running total.
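
On the worksheet it is called just like SumIF.  Assuming criteria values in column A, the compare value in F1, and the data in column B, the formula would be:

=SumIF_X($A$2:$A$500, $F$1, $B$2:$B$500)

One caveat: Excel won't necessarily recalculate a custom function when the filter changes, because its arguments haven't changed - pressing F9 after re-filtering (or marking the function with Application.Volatile) takes care of that.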

It works fine, but there's a potential danger in using this approach: if the data set is really large (thousands of rows), recalculation time for the worksheet will be impacted due to the looping nature of the code.  In our application it was not deemed a show-stopper, but for larger workbooks, or when the custom function is being used in thousands of cells, we would probably seek out a leaner approach (or bite the bullet and set up a pivot table!)

Tuesday, February 9, 2010

Wrangle Those Names

Excel names are a fine way to store and reference data in a workbook.  Naming a cell allows you to include a readable reference to that cell in your formulas and macros.   Just imagine opening a spreadsheet from two years ago and seeing the formula "=$AG$377*(1+Sheet3!$BB$2110)"... not too meaningful until you navigate to the cells in question and see what's in there.  If, instead, you saw "=Balance*(1+Inflation)", you would instantly know what the formula does.

Names can do much more than simply point to a cell.  You can have a name refer to a RANGE of cells, and then act upon each cell in the range, as in: "=SUM(March_Sales)".  Names can also be dynamically-defined - something I talked about in an earlier post.  Dynamic names aren't simply linked to a particular cell, but are evaluated when the workbook is calculated.  For example, a dynamic name might refer to a member of a list, whose position is determined by some other cell's value.
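
As a quick illustration, suppose A2:A13 on Sheet1 holds a list of month names, and C1 holds an index number.  A name defined with the formula below (instead of a fixed address) will always refer to the chosen month:

=INDEX(Sheet1!$A$2:$A$13, Sheet1!$C$1)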

Names can (and sometimes do) get ugly - especially when they refer to cells that have been deleted.  In this case, you end up with a name that evaluates to an Error reference instead of a value.  Tracking down these errors can be very difficult, because Excel's name management controls are, unfortunately, a bit rudimentary.  You need to go to the Insert menu, choose Name, and then Define.  Then click on each name in the list to verify what its reference is.  On massive models that have evolved over time, with thousands of defined names, it can be a multi-day task to clean these up.

Another major maintenance headache is tracking down cell formulas that refer to cells in other workbooks.  Sometimes, when you copy from one workbook to another, you don't get the data you want but rather some indirect reference to its original source.  This is occasionally desirable, but in my experience it is usually a hassle that confounds users, causes unwanted popup dialogs, and burns programming time.

To better deal with these issues, I created the Name Wrangler.

The Name Wrangler is a simple macro that lists and cleans up names in my workbooks, and identifies all external references in formulas.  The code resides in my Personal macro workbook, and I can invoke it on the active workbook when it's time to clean it up.

The Name Wrangler inserts a blank worksheet in your workbook and generates two tables: all the defined names found, and all formulas that refer to external files.  Names that evaluate to errors are then flagged and deleted.  I use this all the time, and it has saved me a lot of wasted hours scanning through workbooks, tracking down annoying errors and references I don't want.  The first thing I do when a client sends me a large model is run the Name Wrangler to get a handle on what I'm dealing with.
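
The packaged version is linked below, but the heart of the name-listing logic is just a loop over the workbook's Names collection - roughly like this bare-bones sketch:

Sub ListWorkbookNames()
    Dim nm As Name
    Dim ws As Worksheet
    Dim lngRow As Long

    Set ws = ActiveWorkbook.Worksheets.Add
    ws.Cells(1, 1) = "Name"
    ws.Cells(1, 2) = "Refers To"
    ws.Cells(1, 3) = "Error?"

    lngRow = 2
    For Each nm In ActiveWorkbook.Names
        ws.Cells(lngRow, 1) = nm.Name
        'prefix with an apostrophe so the reference
        'displays as text instead of being evaluated
        ws.Cells(lngRow, 2) = "'" & nm.RefersTo
        If InStr(nm.RefersTo, "#REF!") > 0 Then ws.Cells(lngRow, 3) = "Yes"
        lngRow = lngRow + 1
    Next nm

    Set ws = Nothing
End Sub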

I have packaged the Name Wrangler in a workbook, which you may download here.  Feel free to use this code any way you like.  I welcome your suggestions and comments.

Wednesday, January 27, 2010

Setting Traps

I believe that Excel is the ultimate expression of software as canvas - infinitely configurable and adaptable to any problem.  When computers were first invented, those scientists could not have predicted how superbly we would come to manipulate the microchip with software such as Microsoft's Excel.  That said, it's all too easy to send the whole thing off the rails.  Your code must be able to handle the unexpected user interactions, data anomalies and logic bombs you allow in.  I'm talking about VBA run-time errors.

An error condition arises in the VBA run-time engine when an impossible situation stops execution of your code.  This often occurs when you attempt to put the wrong type of data in a variable, fail to declare a variable, or exceed a variable's inherent limits.  Trying to call a routine that doesn't exist, dividing by zero, and file system problems are also common causes of run-time errors.

Errors can also happen outside the run-time engine.  In other words, VBA doesn't actually "choke" on anything.  The code just runs and dutifully produces results that happen to be completely wrong.  These sorts of errors are much more difficult to detect, and are often the result of programming errors or bugs in your code.  More on these in a future posting.  Today we'll stick to trappable errors.

For example, say you have a routine that reads through a table of numbers and adds each one to a running total.  If you are only expecting numbers, what happens when the next cell contains a word?  Or an Excel error value?  This may have worked fine when you wrote it, but as the users modified the underlying spreadsheets, things changed.  Now the code hits a snag and triggers an error condition.

The worst thing to do is have your code fail silently, with no hint that anything went wrong.  If it's going to crash, at least let us know that.  A clue about which function took the hit, and how, goes a long way towards diagnosing the cause.  This information can be sent to the debug window, or shown in an alert box that your spreadsheet's operator will see.

Depending on who's running the macro, you might not want to spew out a bunch of technical detail in a dialog - that can be off-putting.  Just say "Sorry... there's a problem loading the file. Try again or Contact support..." or something friendly like that.

To handle errors superbly, they must be trapped using the On Error statement along with an error handling section in your code.  The Err object is also helpful because it "knows" what happened.

Every complex piece of code I write has, more or less, the following structure - my template for a function or sub-routine:

Sub MyRoutine()
    On Error GoTo Fail    ' the trap!
    ' - code -
    Exit Sub
Fail:
    ' - error handler -
End Sub

Here, the word Fail refers to the code label, which can be just about anything you like.  The On Error GoTo Fail statement tells VBA to jump down to the error handler and run whatever code it finds there whenever an error condition arises.


The Err object, at this stage, will contain much information about the error - most importantly the error code and description.  If you're anticipating certain types of errors, you can check the error code and handle these differently.  For example, if you are trying to read a file from disc, a specific error code might indicate that the disc is not available, or the file is not found, or the path is not found, etc.  Each code could be tested for, and custom error messages generated to help the user pinpoint exactly what went wrong and how to fix it.
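
For instance, here's a sketch of a handler section that branches on the well-known file-system error codes (53 is "File not found", 76 is "Path not found", 71 is "Disk not ready") - the message wording is entirely up to you:

Fail:
    Select Case Err.Number
        Case 53
            MsgBox "Sorry... that file could not be found.  Check the name and try again."
        Case 76
            MsgBox "Sorry... that folder does not exist.  Check the path and try again."
        Case 71
            MsgBox "The disc is not ready.  Insert the disc and try again."
        Case Else
            MsgBox "Sorry... there's a problem loading the file: " & Err.Description
    End Select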


Here's a routine to spit out the full set of VBA error codes.  They go up to 65,535, so be prepared for a long wait if you run this!  It appears that most codes are either un-used, or the dreaded and meaningless "Object-Defined" type.

Sub ErrorCodeDump()
    Dim idx As Long

    On Error GoTo Fail
    For idx = 1 To 65535    'watch out!
        ActiveSheet.Cells(idx, 1) = idx
        Err.Raise idx       'trigger the error...
    Next
    Exit Sub
Fail:
    '...and record its description beside its code
    ActiveSheet.Cells(idx, 2) = Err.Description
    Resume Next
End Sub

For lower-power applications, I usually just print out the Err.Description, along with the name of the function or procedure, to the debug window.  Then I can run the code, observe the debug window, and see what happened.  It all depends on what you're deploying and how it will be supported.

Note that error trapping is controlled at the Excel level.  Check the setting in the options dialog - from the VBA Editor, Tools -> Options, General tab: select Break on All Errors to have Excel "barf" every time an error condition is raised (useful during early development), or, for a more controlled user experience, choose Break on Unhandled Errors.

An application that handles errors elegantly and professionally will go a long way towards winning your customer's undying loyalty, admiration and respect.  Next time: more error trapping techniques, such as raising your own custom errors, intentionally causing errors, and other erroneous logic.