I am currently working on a data warehouse project for Lush Fresh Handmade Cosmetics. They are based in Poole, Dorset in the United Kingdom and have around 900 stores worldwide.
However, the underlying data warehouse architecture is not fit for the expanding reporting and analytics requirements that Lush has; hence the project.
We are designing using classic Kimball dimensional modelling principles. In addition to the obvious Date dimension, the ability to measure sales by time chunks is very desirable; indeed, the ability to examine store performance in hourly slices is a major feature of the QlikView BI solution currently deployed. Therefore we also need a Time dimension.
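To illustrate the kind of hourly analysis a Time dimension enables, here is a sketch of a typical star-schema query. The table and column names (FACT_Sales, DIM_Store, the key columns) are illustrative assumptions for this example, not our production schema:

```sql
-- Illustrative only: sales by hour for a single store, joining the fact
-- table to the Time and Store dimensions. All names here are hypothetical.
SELECT   dt.HourOfDay,
         SUM(f.SalesAmount) AS TotalSales
FROM     FACT_Sales AS f
JOIN     DIM_Time   AS dt ON dt.TimeKey  = f.TimeKey
JOIN     DIM_Store  AS ds ON ds.StoreKey = f.StoreKey
WHERE    ds.StoreName = 'Poole'
GROUP BY dt.HourOfDay
ORDER BY dt.HourOfDay;
```

Because the hour attribute lives on the dimension rather than being computed from a timestamp at query time, slicing by any time bucket (hour, AM/PM, trading period) becomes a simple join and group-by.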
One of the compelling things about dimensional modelling is that the resulting models work across industry sectors. So if you have worked with an insurance star schema before, the model at other insurance companies will have plenty in common.
Sure, you may need a few extra attributes particular to your business, but mostly, after a bit of renaming, you are usually good to go for a basic target database. Of course, the 80/20 rule kicks in, and that 20% of customisation will keep you busy.
The book “The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling” by Ralph Kimball & Margy Ross contains quite detailed versions of most of these schemas, explored by industry sector.
I think this simplicity of design speaks volumes for the initial thinking behind the fact and dimension concept. It was originally conceived in the 1960s at Dartmouth College, New Hampshire, and later developed and exploited so well by Kimball at Red Brick Systems in the mid to late 80s. Red Brick are credited with achieving ten times the performance of other database vendors through the use of very clever indexing.
It also means that investing in Kimball books gives you access to thorough data models for most of the business sectors you will encounter. Don’t get me wrong; if you don’t understand the models and the method behind them, you stand little chance of loading them correctly or efficiently. However, the books do provide excellent best practice tips for getting you started or refining your existing experience and skills. I can also highly recommend the training courses run by the Kimball Group, held regularly at various locations around the planet.
So for Lush, we needed to create and populate a DIM_Time table. After defining the table schema, I hunted around on the net to see if anyone had published a script to load it. I found there are surprisingly few examples out there. There are plenty of schemas and scripts for getting you started with a basic DIM_Date table, but not so for a “time” dimension.
Interestingly, whilst Kimball & Ross discuss the function and usage of the time dimension in the latest revision (third edition) of “The Data Warehouse Toolkit”, they don’t describe the schema in any detail.
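For anyone who wants a starting point, a minimal time dimension at one-minute grain (1,440 rows) might look like the following. This is a simplified sketch rather than our exact production schema; the column names, the HHMM integer surrogate key and the hour-bucket format are all assumptions made for illustration:

```sql
-- Sketch of a one-row-per-minute time dimension. Names and the HHMM
-- key format are illustrative assumptions, not a published standard.
CREATE TABLE dbo.DIM_Time
(
    TimeKey       INT      NOT NULL PRIMARY KEY, -- HHMM as integer, e.g. 1435
    TimeOfDay     TIME(0)  NOT NULL,             -- 14:35:00
    HourOfDay     TINYINT  NOT NULL,             -- 0-23
    MinuteOfHour  TINYINT  NOT NULL,             -- 0-59
    AmPmIndicator CHAR(2)  NOT NULL,             -- 'AM' / 'PM'
    HourBucket    CHAR(13) NOT NULL              -- e.g. '09:00 - 09:59'
);

-- Loader: insert one row per minute of the day.
DECLARE @minute INT = 0, @h INT, @m INT;
WHILE @minute < 1440
BEGIN
    SET @h = @minute / 60;
    SET @m = @minute % 60;
    INSERT INTO dbo.DIM_Time
        (TimeKey, TimeOfDay, HourOfDay, MinuteOfHour, AmPmIndicator, HourBucket)
    VALUES
        (@h * 100 + @m,
         DATEADD(MINUTE, @minute, CAST('00:00' AS TIME(0))),
         @h,
         @m,
         CASE WHEN @h < 12 THEN 'AM' ELSE 'PM' END,
         RIGHT('0' + CAST(@h AS VARCHAR(2)), 2) + ':00 - '
             + RIGHT('0' + CAST(@h AS VARCHAR(2)), 2) + ':59');
    SET @minute += 1;
END;
```

Grain is the key design decision here: one-minute grain keeps the table tiny (1,440 rows) while still rolling up cleanly to hourly buckets; if you only ever report hourly, a 24-row dimension would do.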
The Date dimension is common across most star schema databases (most businesses will be interested in ‘when’), so it’s not surprising that examples for it are in good supply. It is great that the BI / ETL community has shared this and many other script examples, as it’s such a basic feature of a data warehouse solution; why re-invent the wheel?
Sharing is a growing economic model, especially within the information / programming industries. Recently Jamie Thomson shared his T-SQL code script library on GitHub and his SSIS Reporting Pack on CodePlex, and of course Andy Leonard’s SSIS framework has been available for some time now, helping SSIS shops wrangle the execution and logging of ETL packages. There are plenty of other excellent and prolific bloggers out there, and numerous user groups; I have just mentioned the two that I have found most helpful since I have been working with SQL Server.
At Lush we have benefited from the shared efforts of others, so we thought it would be helpful to publish our DIM_Time schema and its loader.
We are still building the ETL subsystems, so this entity hasn’t seen any live usage yet. It is therefore subject to change (I will update it should we revise it), and I am not a rockstar programmer, so I tend to go for ‘readable’, simple code (well, in my mind anyway). If you see ways to improve it then please don’t hesitate to let me know.
Whilst I was creating the downloads, I thought it would also make sense to include our DIM_Date schema and accompanying loader script. I downloaded the original script from the internet about a year ago and, unfortunately, I can’t locate the original author / link to credit them (if you recognise it as yours, please let me know and I will update the script with your details if you wish).
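A date dimension loader follows the same pattern as the time loader: loop over a date range and derive each attribute from the date. The sketch below is illustrative only; the column set, date range and YYYYMMDD key format are assumptions, and a real DIM_Date would carry many more attributes (fiscal periods, holiday flags, and so on):

```sql
-- Sketch of a minimal date dimension and loader. Names, range and the
-- YYYYMMDD key format are illustrative assumptions for this example.
CREATE TABLE dbo.DIM_Date
(
    DateKey       INT        NOT NULL PRIMARY KEY, -- YYYYMMDD, e.g. 20140131
    FullDate      DATE       NOT NULL,
    DayOfWeekName VARCHAR(9) NOT NULL,             -- 'Monday' .. 'Sunday'
    MonthName     VARCHAR(9) NOT NULL,             -- 'January' .. 'December'
    CalendarYear  SMALLINT   NOT NULL,
    IsWeekend     BIT        NOT NULL
);

DECLARE @d   DATE = '2010-01-01',
        @end DATE = '2020-12-31';
WHILE @d <= @end
BEGIN
    INSERT INTO dbo.DIM_Date
        (DateKey, FullDate, DayOfWeekName, MonthName, CalendarYear, IsWeekend)
    VALUES
        (YEAR(@d) * 10000 + MONTH(@d) * 100 + DAY(@d),
         @d,
         DATENAME(WEEKDAY, @d),
         DATENAME(MONTH, @d),
         YEAR(@d),
         -- Testing the weekday name avoids a dependency on the session's
         -- DATEFIRST setting (DATEPART(WEEKDAY, ...) numbering varies).
         CASE WHEN DATENAME(WEEKDAY, @d) IN ('Saturday', 'Sunday')
              THEN 1 ELSE 0 END);
    SET @d = DATEADD(DAY, 1, @d);
END;
```

A set-based load (e.g. from a numbers table) would be faster, but for a one-off load of a few thousand rows the simple loop is perfectly adequate and easier to read.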
Thanks again to the team at @lushdigital for allowing me to publish this work. Jack and his team have also been very busy and have recently launched the new Lush website “focused on content and ‘deli-style cosmetics’ that aims to communicate the story behind its products and ethical values”. Click here for the full review in Marketing Week.
The dacpac, imported into SSDT, is the best method for reviewing, modifying and deploying these SQL Server database entities. Simply download it, unzip it from the archive (if that doesn’t happen automatically) and import it into a new SSDT project. The project was set for a target of SQL Server 2012, although I don’t believe any 2012-specific features are used.
Alternatively, simply download the T-SQL script and apply it to the SQL Server database of your choice (2008 and above, due to the use of the DATETIME2 data type).
A more complete set of instructions / explanations is available via the “documentation download” link.
I hope it helps and frees up a few hours for you to add value somewhere else in your BI / ETL solution.
As I said earlier, please let me know if there are any bugs or suggestions to improve the code.