ETL origins

I landed my first programming job at Mitsubishi Motors in Australia in 1988. To my very good fortune, they were transitioning from VSAM to DB2 and SQL/COBOL. As a trainee programmer I showed an early proficiency in SQL, and after one year of programming I ended up in the newly formed Data Services Group as a data analyst.

In those days a data analyst was quite different from the generic ‘reporting’ connotation so frequently and erroneously attributed to that role. The senior DA, John Miliado, could talk (read: data analyse) the red button off an IBM mainframe. His technique was dubbed Sicilian bocca guerra (mouth karate) by our legendary data guru, Mike Roberts.

This is where I first heard about “the universe of discourse”, Codd & Date, 3NF and the 1st, 2nd & 3rd waves of computing. I will never forget the really droll CICS training video delivered by IBM computer techs in white coats.

My first dataset was handed to me by Mike. He had pulled the full BOM (bill of materials) into a dataset via Easytrieve. A few analysts in the Parts & Accessories admin department would spend days compiling reports for an associated demand planning task (I think this might have been in some Lotus package, as Excel hadn’t been born yet).

Within a few days I had a table loaded, with a cyclic refresh and a QMF-driven interface, enabling the analysts to derive parts breakdowns at any entry point of the BOM. Days of repetitive effort were reduced to on-demand five-minute queries. This was my first taste of ETL, but that term, like Excel, wasn’t yet around. Nor was business intelligence.
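For the curious, here is roughly what such a breakdown involves, sketched in modern standard SQL. It is only an illustration: recursive common table expressions did not exist in the QMF days, and the table and column names (bom, parent_part, child_part, qty_per) and the sample part number are hypothetical. The idea, though, is unchanged: walk the parent-child structure of the BOM from any chosen entry point and roll up the quantities.

    -- Hypothetical BOM table: bom(parent_part, child_part, qty_per)
    -- Explode the BOM downwards from any chosen entry-point part.
    WITH RECURSIVE explode (parent_part, child_part, qty, lvl) AS (
        SELECT parent_part, child_part, qty_per, 1
        FROM   bom
        WHERE  parent_part = 'MB123456'        -- entry point chosen by the analyst
        UNION ALL
        SELECT b.parent_part, b.child_part, e.qty * b.qty_per, e.lvl + 1
        FROM   explode e
        JOIN   bom b ON b.parent_part = e.child_part
        WHERE  e.lvl < 20                      -- guard against cyclic or runaway data
    )
    SELECT   child_part,
             SUM(qty) AS total_qty,
             MIN(lvl) AS first_level
    FROM     explode
    GROUP BY child_part
    ORDER BY first_level, child_part;

In 1990 the equivalent was a pre-exploded table refreshed on a cycle and queried through QMF, but the shape of the question the analysts asked was the same.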

Over the next 12 months we set about delivering presentations to user departments, persuading senior managers to embrace our new IT services, and creating simple, easy-to-use QMF interfaces that liberated data from the core databases.

That was 1990, and I am very grateful to my mentor Mike and to the boss of the DSG, Mark Ward. Thanks to those forward-thinking guys I had tasted the tremendous power and benefits of the data-centric IT world, and it set me on a path that has ultimately led me to SSIS.

There has been plenty of data flow under the bridge since that first step, but reassuringly, the basic premise remains the same: unlocking data from the core databases that capture the activities of any business and transforming it into useful information.

Whether it is making data available for insight or piping it around the various platforms, systems, databases and applications that constitute a 21st-century enterprise, the big difference is the marvellous new tools at our disposal.

ETL control flows replace job control languages (JCL, TCL), and data flow transforms replace 3GL programming languages such as COBOL. SSIS and its competitors provide a superior platform for the ETL tasks we now face in business, enabling huge productivity leaps.

However, this doesn’t mean things are any easier than back in the ’90s. Remember when you only needed one bank account? The newer generation clearly won’t, but the point is that the amount of data moving around business systems has increased proportionately, and these advanced ETL tools are essential to get the job done.

SSIS is now an enterprise-class ETL tool, but in itself it does not deliver enterprise-class solutions. Requirements and well-thought-out design are still key to a robust ETL suite that delivers every cycle. Programmer issues and challenges such as recruitment, retention, training and contingency, plus daily concerns such as syntax hassles, deadlines, standardisation, reusable modules and repeatable test processes, remain paramount and unchanged.

So, as visually tempting as SSIS is, in the same way that Photoshop may appear to be a starting point for creating great works of art, always start with a whiteboard or pencil and paper, think deeply, and understand where you are heading before getting your toolbox out.

 


One Comment

  1. Jamie Thomson 2012-04-11 at 11:03

     I’m looking forward to hearing more of this Eric. Subscribed.