- air
- ajax
- algorithm
- apple
- bitbucket
- braintapper_exchange
- charts
- chumby
- codeigniter
- cognos
- complexity
- crosstab
- dash
- dashboard
- date
- dbvisualizer
- decisions
- dimension
- dogfood
- dona_wong
- edward_tufte
- feature_checklists
- feature_excellence
- filemaker
- firefox
- firewall
- flot
- flowing_data
- fogbugz
- football
- free
- freenas
- freshbooks
- gm
- google_charts
- iPad
- javascript
- jdbc
- jedox
- mac
- macbook
- maps
- marsedit
- mercurial
- metaweblog
- metrics
- microstrategy
- monowall
- moo
- nathan_yau
- open_source
- palo
- pentaho
- pfsense
- printing
- programmers_interfaces
- rapidweaver
- regex
- regexr
- rest
- smoothwall
- sony
- sqlpower
- stackoverflow
- statistics
- stephen_few
- svg
- tablet
- ticket_agent
- tip
- tm1
- transformer
- trick
- typographic grid
- usability
- visualization
- w3c
- web
- wiki
- wikkawiki
- work_management
- wsj
Overview
A good date dimension is essential for any data warehouse or reporting database. If you don't have the right tools, you'll often find yourself generating this date dimension in Excel, which can be tedious.
I've created a reusable Pentaho Data Integration transformation that lets you create a date dimension with a user-defined start date and number of days.
Feel free to tweak to your heart's content. The download link is at the bottom of the blog entry.
Caveats
- The first date of this date dimension starts in October 2009, which coincidentally (grin) matches up with the day Braintapper came into existence. You can change this date as indicated in the transformation instructions.
- This transformation is for educational purposes only. It's up to you to make sure it's debugged if you're thinking of putting it into a production system.
- This transformation is based on a standard calendar year. It doesn't factor anything like lunar dates, 4-5-4 calendars or fiscal years that don't start on January 1.
How it Works
A quick view of the transformation is here:
There are 5 basic steps used in this transformation.
Using a "Generate Rows" step, I set the first day of the dimension to iterate an arbitrary number of times. In the sample, I create 10,000 lines, which means there are 10,000 days in the dimension.
Next, I use an "Add Sequence" step to create a counter for each row.
- I use a "Formula" step to create each date in the dimension, using the counter from the previous step.
- I create the rest of the date dimension elements using another "Formula" step.
- Finally, I output the dimension table to an Excel file. You can change this to any other output type that you would like.
Date Dimension
In the date dimension, I like to create "In Dimension Counters". These are basically running counts of the Year, Quarter, Month and Day in the dimension. I also like to include Month and Weekday names and abbreviations.
I like to include these added elements because some database/reporting engines do not have fully fleshed date functions, so these added elements allow you to use plain old SQL to do conversions and calculations.
Elements
- DimensionDate - The date, as a date type.
- DateSid - The date, as an integer representation of YYYYMMDD.
- Year - The year, as an integer.
- Quarter - The quarter, as an integer.
- Month - The month in the year, as an integer.
- Day - The day in the month, as an integer.
- DayInYear - The day in the year, as an integer between 1 and 366.
- Weekday - The day of the week, starting on Sunday indexed at 1.
- MonthName - The name of the month, in English.
- MonthAbbreviation - Three letter abbreviation of the month, in English.
- DayName - The weekday name, in English.
- DayAbbreviation - Three letter abbreviation of the weekday, in English.
- YearInDimension - The year counter for the entire dimension, starting at 1.
- QuartersInDimension - The quarter counter for the entire dimension, starting at 1.
- MonthInDimension - The month counter for the entire dimension, starting at 1.
- DayInDimension - The day counter for the entire dimension, starting at 1.
Notable Formulas and Gotchas
Calculating Month/Weekday Names and Abbreviations
Since the Pentaho Formula step doesn't have month or weekday name date functions, I used a variant of the tip that I blogged about last November to generate these.
DateDifs and In Dimension Counters
Pentaho's DateDif function in the "Formula" step appeared to be a little buggy, so many of the "In Dimension" element calculations are a little more complicated than they needed to be.
Because my first year start date is not January 1 (which appears to be the cause of the bug), I had to create an exception for the first year counters. For example, the month in dimension calculation is as follows.
if ( // calculations for the first year year([FirstDate])=year([DimensionDate]); datedif([FirstDate];[DimensionDate];"m")+1; datedif([FirstDate];date(year([FirstDate]);12;31);"m")+1) + // calculated from second year forward if(year([FirstDate])=year([DimensionDate]);0; datedif(date(year([FirstDate])+1;1;1) ; [DimensionDate]; "m")+1 )
If the DateDif worked properly (I was getting something like 21 months between 2009/10 and 2010/1 - what???) then I'd be able to drop the first part of the formula above.
In any case, the formula now only suffers from a little redundancy, but it can accommodate any date dimension that doesn't start on New Year's day.


