Wednesday, April 20, 2011

Database integration using object classes

This month I am working on a quantitative analysis tool for a stock broker.  He wants to track stock data over time and rank stocks in a variety of ways.  Rather than try to figure out all the different reports and charts we will need to build, we agreed that the tool needs to be completely open-ended.  This means the customer can configure every aspect of the application, including the data fields being captured, the format of the reports, and the formulas used for ranking and classifying the data.

Capturing daily numbers for 5000+ companies requires a robust data storage and query component, so an Access database has been created.  This de-cupling of the data from the program logic is essential for an application of this scale.  Since an Excel project is never finished, the steady stream of updates and fixes can be deployed simply by emailing an updated Excel workbook.  The macro then connects to the Access file on the client's system, loads all the settings and information the client has been inputting and collecting, and configures itself.

Logic was created to handle custom field definitions and custom formulas.  A report designer was also built allowing the operator to layout new reports, control grouping and sorting of data, sub-totals, and formatting.  All this setup data is stored in the Access database.

When the user wants to view a report, the macro loads the setup details from the database, queries the company data tables for the required data, and draws up the report on a worksheet.  To handle all these inter-related data entities, we need to define a set of object classes in VBA.

I created classes for Field, Report, Company, Sector, Industry, Price, and all the other entities that our database is tracking.  At the project level I define Collections which are loaded with the individual instances of each class when the workbook is opened.  So when the user asks for a report, the code iterates through the Reports collection and builds the list of available reports to choose from.

The classes contain the logic they need to interact with the database and with each other.  For example, each class has Load, Insert, Update and Delete methods.  When the user defines a new custom Field, the code creates an instance of the Field object, updates it's properties with the settings the user has input, and calls it's Insert method to write the actual record to the database and insert itself into the Fields collection.

When the reports get generated, the Report object "knows" how to load a sub-collection of the fields that are on the report.  Each Field knows how to fetch the appropriate piece of data and format it for presentation.  If a Field is calculated based on other pre-defined fields, the logic will parse this out and use new Field objects to fetch the individual elements that make up the calculation.

By carefully modelling your objects this way, your code becomes incredibly powerful, because the "work" is distributed across the object model.  Your core routines can be quite lean and efficient, and much easier to read and maintain.  Instead of 20 different Reporting macros, you have one generic macro that produces all possible reports.  And since you're manipulating massive data structures completely in RAM, your code can run very quickly compared to the basic approach of stacking tables of data on different spreadsheets and iterating through cells row by row.

Learning how to model your data using object classes is not an easy skill to master.  I have many years of programming experience on a variety of platforms, including VB.Net, to call upon.  I have also made it a priority to master the design and programming relational databases using SQL.  

Yet gaining this knowledge was achieved by doing.  When you start experimenting with advanced techniques such as object classes and external databases, you build confidence, overcome challenges, and ultimately learn how to produce enterprise-level applications that are extendible, powerful, portable and maintainable.

For more information on these topics, please see some of my prior postings: