November 18, 2009
So let's say you've been given a csv or spreadsheet that is in a crosstab format.

Take the example below. From columns E forward, we have dates as column headers. Each of these dates represents a date dimension data element.



If you're trying to turn this spreadsheet into a fact table, you've got to load the date columns in as data.

Well, the easiest way to do this is to use Pentaho Data Integration, my favorite data conversion tool.

Pentaho has a transformation step called the Row Normaliser that can handle it for you.

So if you look at the flow below, you can see that I am loading the CSV, sending it through the Row Normaliser, converting some date abbreviations to a number and then outputting it to a table.



I won't go through the entire solution, but the key parts. My solution is loading the data shown at the top of this page, and this is what the Row Normaliser step looks like:



Click on Get Fields and get rid of all the unnecessary fields. "Type field" is a bit of a misnomer, but that's the header field you want to convert to data. In our case, it's a date, so call type field Date.

The Fieldname shows the header as it comes in. You can actually change the value to something else by changing the corresponding value in the Type column. For example, if you want "01-Jan-08" to become "January 2008", just change the appropriate cell.

The "new field" is the name of the field that you want it to output to. Since all the data I'm dealing with in my example are sales dollars, I've assigned all the values to the Sales field.

That's basically it.

Now you just need to map it to your table output fields as shown below:



And you're done!
keyword pentaho  keyword crosstab