Tag: pentaho
February 10, 2010

Introduction

So here's an edge case for you. How do you use JDBC to connect to an instance of Filemaker Pro or Filemaker Server?

Note: These instructions are for Mac users, not Windows users.

First, you're probably wondering who on earth would want to do this? While Filemaker does exist in the Windows world, Filemaker is the de facto equivalent of Microsoft Access for Mac users. It's pervasive, and it's quite good. By enabling JDBC connectivity to Filemaker, you can now use some more powerful tools, such as Pentaho Data Integration to pull in data from all sorts of data sources. Now I wouldn't recommend using Filemaker as a data warehouse, but you can definitely use Pentaho to pull in data from any REST API enabled service you might subscribe to, such as Freshbooks, Basecamp, etc.

Sidebar: For me, the concept of a JDBC driver for Filemaker is quite bizarre, since I haven't really used Filemaker since the early nineties, before it became a quasi-relational database. Filemaker is a nice little database. For some things, like creating a data-driven front end, it's significantly better than Access. For those who yearn for the simpler days of Filemaker, however, Filemaker does offer a nice little database app called Bento.

Before trying to connect to your Filemaker database(s), make sure you turn on sharing from within Filemaker.

Installing the Driver

When you get Filemaker, you should find a folder called xDBC on the installation media (whether it's a CD or DMG disk image). Within that folder is an installation folder for the JDBC drivers. In that folder is a file called

sljcInstaller.jar.

screenshot

Just run it by double clicking it. Jump through the wizard until you get to this screen.

screenshot

Click on the "..." button and navigate to a folder where you want to install the drivers. The installer will work its magic.

Within the destination folder, you'll find a folder called "drivers". Within that folder, you'll find another subfolder called "lib". In this folder, you'll find the driver you need: sljc.jar.

screenshot

OK, so now you have the driver, what can you do with it? I've provided samples for DBVisualizer and Pentaho Data Integration.

DBVisualizer

  1. Launch DBVisualizer.
  2. Go to Tools > Driver Manager.
  3. Create a new driver and call it "Filemaker Pro".
  4. In the URL format, enter: "jdbc:sequelink://localhost:2399" (without the quotes). That's the default URL for Filemaker.
  5. In the Driver File Paths section, click "User Specified".
  6. Click on the open icon (the folder).
  7. Browse to the sljc.jar file you just installed. This should populate the drop down list for the Driver Class field in the Driver Setting section above.
  8. Select "com.ddtek.jdbc.sequelink.SequeLinkDriver".
  9. Close out the dialog. You're done. You should be able to create a connection without the wizard. Make sure your Database Type is Generic. The default User Id is "Admin" with a blank password.

Pentaho Data Integration (Kettle)

  1. Open the Kettle application package by right clicking the icon and selecting "Show Package Contents". Navigate to Contents.
  2. In a text editor, edit the file info.plist. Add an entry as shown below:

    $JAVAROOT/libext/JDBC/sljc.jar 

    screenshot

  3. Next, copy the sljc.jar file from your installation folder into Contents/Java/libext/JDBC/

  4. Launch Kettle. You can create a connection using the parameters shown below.

    screenshot

keyword pentaho  keyword jdbc  keyword filemaker  keyword dbvisualizer  
January 19, 2010

After checking out WABIT, I decided to give a quick look at SQLPower.ca's other free offerings: Power*Architect and DQGuru. Note that SQLPower's Dashboard and Power Loader (ETL) tools do not appear to be free, open source software. I won't be offering any opinions on the non-free products. In any case, I don't really have any preference when it comes to Dashboard apps (right now at least), and when it comes to ETL tools, I'm still quite partial to Pentaho Data Integration.

Power*Architect

Power*Architect is a data modeling tool in the vein of ERStudio and ERWin. Many independent consultants (like me) will probably find it hard to justify spending scads of cash for a tool like ERStudio for their data modeling activities. Myself, I use MySQL Workbench, which is quite good. The only limitation is that... it only works with MySQL.

Here's where SQLPower's Power*Architect comes in. It's free (unless you need support), and it's multiplatform. If you haven't figured it out yet, I'm a Mac-centric user. While I do use virtual machines for Windows-based work, it's often nice to be able to run apps natively in the Mac desktop. Don't get me wrong, Power*Architect doesn't have the look and feel of a true Mac app. It feels more like a Windows app in a SWING container, but that works fine with me. This isn't an app designed for Joe Sixpack, so it doesn't need to have the eye candy, no matter how nice that would be.

My perspective on this app is coming from a ERStudio and MySQL Workbench user, and to be honest, I didn't read the manual. After about 10 minutes of use, I pretty much figured out how to use the interface. It's not that hard to figure out - a very good thing. I haven't delved into those detail level features, but on the surface, it appears that Power*Architect can easily satisfy that 80/20 proportion of must-have versus nice-to-have functionality.

Bottom line - am I going to use it? I'm definitely going to give it a whirl for a few months. I'll post a follow up summary at a later date, but in the meantime, I suggest that anyone on a tight budget who needs a data modeling tool check it out.

DQGuru

DQGuru is an interesting beast to me. SQLPower advertises as a data cleansing tool. Specifically, DQGuru de-duplicates, cleanses and performs address correction.

Most of my experience with data cleansing came at the ETL level, and I haven't really had the chance to play with a dedicated tool like DQGuru. Messing around with it (yeah, I should really read manuals), it is best described as a data cleanser for business users. You construct your projects in a wizard like fashion, and SQLPower keeps the user interface fairly simple to insulate them from what's going on in the back end. I would recommend this for database savvy analysts, but the hard core control freaks (like me) would probably prefer the control you get from the likes of a Pentaho Data Integration or IBM Information Server.

Like Power*Architect, I figured out how DQGuru works in a matter of minutes.

I don't see myself using DQGuru very often, but I can definitely see value when you have a quick and dirty cleansing job to process. DQGuru is a good tool for advanced end users who don't want or need to be messing around with complicated SQL (or regular expressions, for that matter) to perform common data cleansing tasks.

Good Times for BI

I really have to say, these are good times for the business intelligence market. The "free" offerings out there are getting better, especially since many open source projects are backed by real companies. Companies like SQLPower, Pentaho and Jedox are really going to give the big vendors a run for the money, especially in the small to medium enterprise space.

keyword sqlpower  keyword pentaho  keyword open_source  
December 9, 2009

For some reason, Regular Expressions have been very difficult for me... much like rocket science.

They are, however, crazy powerful.

Something I've been trying to implement is pulling in time entries associated to a Fogbugz case and pushing them into the case's elapsed hours. Fogbugz time entry system is not idea for a services company, and most companies have their own time reporting system.

The methodology that I'm using is that in my Freshbooks time sheet, I'm using a text convention like this:

[123] Note goes here

Where the case number is in the square brackets. If the square brackets are left out, then there is no case associated with the time entry. Example below:

Note goes here

What I want to do is to use Pentaho to parse the note, and create a case number and note for me to insert into my data warehouse. I can then create a transformation that will, using Fogbugz's API, push the total hours for each case back up to Fogbugz, so that I can use them for Evidence Based Scheduling.

So here's the regular expression I used to extract the data:

(\[(.*)\])(.*)

So what this gives me in Pentaho are three data fields (using my first example above):

Field 1: [123]

Field 2: 123

Field 3: Note goes here

Not bad. But for the second example, without a case reference, I get:

Field 1: Null

Field 2: Null

Field 3: Null

I can easily fix this with a formula step in my transformation, but I know it can be handled in Regex. The only problem is that the knowledge is above my skill level.

Enter Stack Overflow. I basically summarized my scenario above, and in less than 5 minutes, I had a working answer.

(\[(.*)\])?\s*(.*)

Not very different than my original string, but it gives me:

Field 1: Null

Field 2: Null

Field 3: Note goes here

Now I can map Fields 2 and 3 to my time entry table using only one Pentaho step. Nice!

One more thing - if you're not using StackOverflow to solve your technical problems, isn't it about time you started?

keyword stackoverflow  keyword regex  keyword pentaho  keyword freshbooks  keyword fogbugz  
December 7, 2009
Beyond the obvious size and resources, I don't consider my one-man shop much different from any other company.

My business collects its own performance data, and if I don't keep track of it, I'm a little lost.

So while a full data warehouse is probably overkill for me, a mini-warehouse is quite useful.

The last time I was an independent contractor, I ran my business on Excel. Accounting, invoicing, you name it. In the past seven years, the world has changed a little. There are plenty of free, "in the cloud" applications that a small business like mine can take advantage of. And, of course, if I grow large enough, I can buy more capacity.

So let me give you an overview of how I run my business so that you can get an idea of what my data needs will be.

Accounting


I use Freshbooks to do my billing as well as my time and expense tracking. It's a nice little app, grown locally in Toronto. When I was with my last job, I used Harvest, which is also an excellent app. Because we had a billing solution in place, Harvest was a more appropriate solution. Freshbooks will even send out snail-mail invoices for a nominal fee.

KPIs

  • Billings (Dollars and Hours)
  • Utilization
  • Expenses Incurred
  • Age of Accounts Receivables


Reporting

  • Income Statement
  • Balance Sheet
  • Cash Flow Statement

Work Management


Fogbugz is the cornerstone of my operations. I use it for tracking knowledge, tasks and all sorts of useful things. At the atomic level, every activity at Braintapper is rooted in a Fogbugz case.

KPIs

  • Number of open cases
  • Average age of open cases
  • Number of recently closed cases
  • Time spent on cases
  • Estimation accuracy


CRM


I use Highrise for CRM. I only need to track very simple things like new leads and activities. Most actual deals and tasks still get managed in Fogbugz. I don't really think I'll ever need to upgrade this account, since I don't really need the additional functionality.

KPIs

  • Number of new leads
  • Close percentage
  • Pipeline size


Reporting

  • Pipeline
  • "Tickler" list


Putting it all together


So, as you can see, even for a one-man shop, I have a list of analysis requirements that is pretty typical for a consulting company of any size. And, these are just my initial requirements. I haven't even talked about joining any of my data sources together yet.

ETL


So the solution to do my data extraction is relatively easy. It's going to be a hybrid of shell scripts and Pentaho Data Integration. All of my solutions above provide a REST API, which makes for easy data extraction using curl statements.

Some of these statements are easily handled in Pentaho, with the exception of Freshbooks. Freshbooks requires a POST with an XML payload, and the "Execute a process" step seems to have trouble with spaces in the payload. No worries, a 3 minute shell script can extract all my Freshbooks data into XML to be handled by Pentaho.

In terms of databases, I basically stage the data into a staging database, and subsequently clean and transform the data into my mini-warehouse. Easy stuff.

Reporting


So, what is my Reporting solution going to be? Being a Cognos developer, you'd think Cognos would be my solution, right? Wrong. Cognos is not cheap, and there's no "free" version comparable to what Microstrategy is offering. Even if I did have cash to spend, the fact that Cognos only supports IE for the advanced studios is a dealbreaker.

Speaking of Microstrategy, that one's off the list too. Nothing against Microstrategy, but it requires a Windows server, and I'd rather have a solution that runs on any platform.

Pentaho does have a reporting solution as well, but to get any decent documentation, you have to buy a server maintenance license. No thanks. After I did my Dashboard exercise, I realized that it took me just as long to write a dashboard from scratch (and that wasn't much time at all) as it would for me to fine tune the exact same report in Report Studio. What?

Report Studio is one of those 80/20 tools. You can get your data and raw layout done very quickly (80% of the task in 20% of the total time), but the remaining 20% (and 80% of the total time) to get "pixel perfect" (I'm using this term very liberally) is time consuming. The easy response is to deliver a substandard report that has all the data there - I know plenty of people who will call it a day at that point. Myself, I would rather deliver the maximum amount of quality in a reasonable amount of time.

If I have to deal with a clunky interface, I'd rather write my own code, thank you. So what I'll be doing is basically a simple web app using the open source CodeIgniter framework, jQuery, and the Google Charts API. The coolest thing is that I'll probably be able to get my reports done quite quickly, with very granular control over the appearance in a short period of time.

Development


So as I develop my solution, I'll be posting some of the tips, tricks and hurdles that I encounter. I've already built out my ETL, which took less than a day. While I might complain about the one Transformation step that didn't allow me to do my entire ETL in Pentaho, it offers more transformations out of the box than some very expensive, name-brand ETL solutions... for Free (as in beer) no less.

In the coming weeks, I'll try to post information with respect to my dashboard design process as well. Keep in mind that design is not the same as esthetics. In business intelligence, the design of the dashboard is absolutely critical.
A great dashboard always has the right information, concisely summarized, and intuitively placed.

This is much easier stated than done.
keyword rest  keyword pentaho  keyword microstrategy  keyword dogfood  keyword cognos  
November 18, 2009
So let's say you've been given a csv or spreadsheet that is in a crosstab format.

Take the example below. From columns E forward, we have dates as column headers. Each of these dates represents a date dimension data element.



If you're trying to turn this spreadsheet into a fact table, you've got to load the date columns in as data.

Well, the easiest way to do this is to use Pentaho Data Integration, my favorite data conversion tool.

Pentaho has a transformation step called the Row Normaliser that can handle it for you.

So if you look at the flow below, you can see that I am loading the CSV, sending it through the Row Normaliser, converting some date abbreviations to a number and then outputting it to a table.



I won't go through the entire solution, but the key parts. My solution is loading the data shown at the top of this page, and this is what the Row Normaliser step looks like:



Click on Get Fields and get rid of all the unnecessary fields. "Type field" is a bit of a misnomer, but that's the header field you want to convert to data. In our case, it's a date, so call type field Date.

The Fieldname shows the header as it comes in. You can actually change the value to something else by changing the corresponding value in the Type column. For example, if you want "01-Jan-08" to become "January 2008", just change the appropriate cell.

The "new field" is the name of the field that you want it to output to. Since all the data I'm dealing with in my example are sales dollars, I've assigned all the values to the Sales field.

That's basically it.

Now you just need to map it to your table output fields as shown below:



And you're done!
keyword pentaho  keyword crosstab  
November 13, 2009

Nathan Yau at Flowing Data (Add it to your RSS feeds now. Yes, it's that good.) had a great entry this past week on how to roll your own map infographic using a Python script on an SVG.

That got me thinking, how would that be done in Pentaho Data Integration?

Not wanting to spend too much time on it, I threw together a quick proof-of-concept, and I thought I'd share it with you.

Before we start:

  • This is a tutorial for creating the map, not on using Pentaho. I will assume that you know how to create jobs, transformations, steps, etc.
  • Read the Flowing Data article carefully before you read this. While most of this tutorial is self-explanatory, I'm not going into as much detail as Nathan did in the original. I will try to follow Nathan's structure to make things a little easier to digest.
  • This solution was done as a "quick and dirty" solution. There's more than one way to skin a cat, so feel free to tweak and optimize as you would like.

Step 0. System requirements

  • A recent version of Java installed on your computer. Chances are, you've already got it.
  • The latest version of Pentaho Data Integration.
  • A copy of Inkscape, a freeware SVG editor. Snow Leopard users be warned, Inkscape won't work without a lot of tweaking, you might want to open it up on a Leopard machine or a Windows VM.
  • A text editor would be handy, but Notepad or Textedit will do.

Step 1. Clean the Map

As Nathan mentioned, the map he used is an SVG, which is really an XML format. The Python library he used, Beautiful Soup, is very forgiving on XML files. Pentaho, not so much.

Inkscape native SVGs do not make for Pentaho friendly files. Fortunately, Inkscape can save some files that will almost work in Pentaho.

As in the Flowing Data article, we're only concerned about the County IDs (the FIPS number) and the background colour of each county. So first download the counties map.

Open it up in Inkscape. Go to Save as, and save it as a "Plain SVG" with the filename "USA_Counties_with_FIPS_and_names plain.svg" (you can name it whatever you want, but my Pentaho solution used this name). This is key to cleaning out the extraneous Inkscape crud in the SVG file that would otherwise give us problems in Pentaho.

Feeling lazy? Here's [the plain SVG][8] I made.

Step 2. Get the Data

Nathan kindly created a CSV file here, so let's reuse it. Remember that Pentaho can handle just about any data source, which is why this whole mapping exercise is so worthwhile. If you've got similar data by county, you can just substitute the input in the solution for your data source, and make dynamic maps accordingly.

Step 3. Take a quick Look at the Plain SVG

In a text editor (Notepad++ on Windows, or TextEdit on Mac), open USA_Counties_with_FIPS_and_names plain.svg. It'll look something like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?!-- Created with Inkscape (http://www.inkscape.org/) -->
&lt;?svg
xmlns:svg="http://www.w3.org/2000/svg";
xmlns="http://www.w3.org/2000/svg";
<version="1.0"
<width="555.22198"
<height="351.66797"
<id="svg9559">
<defs
<id="defs9561" />
<path .../>

There are still a couple of troublesome things with the file above that will cause some hiccups with Pentaho, but we'll resolve them right in Pentaho with a preprocessing transformation.

The namespace attributes (xmlns, in red above) are going to cause Pentaho to choke, but we can fix this quite easily. We'll fix this with a preprocessing step, outlined in Step 5.

Step 4: Launch Pentaho

A gentle reminder guys (and gals): I'm not going to get into any detail on how to use Pentaho. I'm assuming that you'll be able to figure out how to create the steps, hops and linkages to make this work.

For the lazy among you, I've included a ZIP of the XML files you can import into Pentaho so you don't have to recreate all of this from scratch.

Important: Make sure you change the paths so that they work locally on your machine. I did this on a Mac, so my paths will definitely not work if you're on Windoze.

Step 5: Preprocessing Transformation

Even with the "plain svg" map, we still need to fix the file a little. Instead of doing this manually, I've created a preprocessing transformation. The thinking behind this is to allow you to tweak the solution for your own uses so that it's reusable.

To do this, I created a new Transformation called "00 Map Preprocessing":

In this Transformation, we need to perform three tasks:

  1. Load the Plain SVG file
  2. Parse out a couple of offending attributes
  3. Rewrite the SVG file as a temp file

Load the Plain SVG file

  1. Create a Text file input step and point it to the Plain SVG file you created earlier. Make sure you click on Add.

  2. In the content tab, there are a couple of things you've got to do. Leave it as a CSV, but make sure you change the delimiter to TAB (not visible in the screen grab) and set the format to Unix. If you do it as DOS, you're more likely to end up with unwanted errors.

  3. Finally, in the Fields tab, just create a single field called Line and make sure it's a String type with an undefined length.

Parse out the offending attributes

  1. The next step is to create a Formula step (from the Scripting folder). Make sure you put a hop in from the first step.Create an entry called NewLine that is a string with the following formula:
SUBSTITUTE(
  SUBSTITUTE(
   [Line]; 
   "xmlns:svg=""http://www.w3.org/2000/svg"""; 
   ""); 
  "xmlns=""http://www.w3.org/2000/svg""";
  "")

If you look at the code, it will wipe out the xmlns namespace attributes with a blank.

Rewrite the SVG file as a temp file

Finally, we're ready to create the output step.

  1. Create a Text file output step.

    • Set the file as: USA_Counties_with_FIPS_and_names plain.svg
    • Set the extension as tmp
    • The file will be saved as "USA_Counties_with_FIPS_and_names plain.svg.tmp"

  2. In the Content Tab, Set TAB as the delimiter and UNIX as file type.

  3. Finally in the Fields tab:

  4. Add NewLine as the output and make it a String with an undefined length.

  5. Run the entire transformation to make sure it works.

Step 6. Create the main transformation

Now that you've got a transformation that creates a temp file, you can parse it in Pentaho, join it to the unemployment data, and output a colorful SVG file.

To do this, I created a new Transformation called "01 Map Processing":

There are two streams that will merge in the end.

The first stream is the load of the unemployment data, and the second stream is the load of the map. They'll be merged into a new map.

Load the Unemployment Data

  1. Create a CSV file input step.

  2. Point it to the path of the data file. Most of the defaults are fine.

  3. Add the following fields:

    • Ignore1, String
    • Fips1, String
    • Fips2, String
    • Ignore4, String
    • Ignore5, String
    • Ignore6, String
    • Ignore7, String
    • Ignore8, String
    • Rate, Number

    Only Fips1, Fips2 and Rate will be read. As mentioned in the Flowing Data article, Fips1 and Fips2 are concatenated to generate the full Fips code.

  4. Next, add a formula step:

  5. There are two transformations that need to be done.

    Fips1 and Fips2 need to be combined to make the county id. The formula is simple: [Fips1]&[Fips2]

  6. We're going to replace the style of the county items based on the Rate value (using the same rules and colours as the Flowing Data article). Create another field called style_new. This formula is not hard, but it's long:

    "font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:" & IF( [Rate] > 10 ; "#980043" ; IF([Rate] > 8; "#DD1C77";IF([Rate] > 6; "#DF65B0";IF([Rate] > 4; "#C994C7";IF([Rate] > 2; "#D4B9DA";"F1EEF6" ) ) ) ))

    Basically the formula above is the string for the style, with a conditional statement at the end setting the colour based on the value for the Rate.

  7. Next, we'll sort the County Ids (Fips code) so that we can join them. Add a Sort Rows step to the Transformation. and sort by the field named id.

Load the Temporary Map SVG

  1. Create a Get data from XML step.

  2. Point it to the .tmp file you created and make sure you click Add.

  3. Go to the Content tab. We're only going to load the Path elements from the xml file, so in the Loop XPath, click on Get XPath Nodes, and select "/svg/path" Set the encoding to UTF-8 to be safe.

  4. Finally, in the fields tab:

  5. Click on Get Fields. Make sure the order is the same as the screenshot above. Rename the style to style_original.

  6. Create another sort step to sort by Id

Join the Data to the Map

Now we're ready to join the data to the map.

  1. Create a Merge Join step.

  2. Set the Sort Paths (xml file) as the first step, Sort Data step as the second step and make the join an LEFT outer. Use id as the key field for both steps.

  3. Now we're going to assign the styles using a Formula step.

  4. In case there are counties without data, we want to retain the original style from the map.

    In the join, if the style_new step is null (there are a few), we'll take the original style.

    Use the formula below:

    IF(isblank( [style_new]) ; [style_original] ; [style_new])

  5. Finally we want to output the new merged map data to an xml file. The problem is that the XML Output step can't output attributes into the XML, but we can fix that in post processing.

    Set the filename to paths_restyled and leave the file type as XML for now. The SVG data being written will still need to be cleaned up.

  6. In the Content tab, set the encoding to UTF-8, the parent XML element to svg and the row XML element to path:

  7. In the fields tab, make sure the fields are output as strings in this order: style, d, id. The order is important because of the way the output file will be parsed in the Post Processing stage.

Step 7. Post Processing

If you look at the file that was output in the previous transformation, it should look something like this.

<?xml version="1.0" encoding="UTF-8"?>
<svg>
<path>
&lt;style&gt;
font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;stroke-width:0.1; stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:#DD1C77&lt;/style&gt; <d>M 396.96898,238.579 L 397.37998,242.405 L 396.53598,243.339 L 395.20198,242.757 L 393.90798,242.91 L 393.43098,242.995 L 393.39698,243.055 L 393.10198,243.726 L 391.60198,239.67 L 392.68698,239.615 L 393.79598,239.489 L 393.73498,238.922 L 396.96898,238.579</d> <id>01001</id></path>

Notice the difference in the path elements. Instead of having attributes, it has child elements. This is easily fixed with some simple search and replace.
Finally, we need to create the post processing to fix the SVG file. Create a new transformation called 02 Map Postprocessing

  1. Create a Text file input step and point it to the paths_restyled.xml file.

  2. Set the content tab to be the same as the one in the PreProcessing stage.

  3. Set the fields tab to be the same as the one in the PreProcessing stage.

  4. Create a formula step with a new field called LineOut1.

    Set the formula to the one below:

    SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE([Line];"";"";""" d=""");" ";""" id=""");" ";"""/>")

    This formula parses each path line and converts the elements to attributes.

  5. Create another formula step to re-embed those namespace attributes we took out earlier. Add a field called LineOut2.

    SUBSTITUTE([LineOut1];"";"
    ")

    Note that the def tag was stripped out earlier in the main transformation since we only iterated through the paths.

  6. Create a step for the Text file output and set the filename to be final_map, with the extension set to be svg.

  7. In the fields tab, set LineOut2 as the final output field. Make sure it's a string with an undefined length.

Step 8. Put It All Together

To integrate all the Transformations, feel free to create a Job that links all three of the transformation steps.

Now, when you run it, you should be able to create the following SVG:

Next Steps

So now you know how to build this map in Pentaho. While the instructions are long, a seasoned Pentaho developer could put this together at least as quickly as using the Python method in the Flowing Data article.

So how can this be changed to make it more useful?

Well if you have any county level statistics in a database, Excel file or whatnot, you can easily port this solution to that. Better yet, if this data is in a database and changing regularly, you can schedule Pentaho to build the map at regular intervals.
The great thing is that you can do this with free tools. No expensive maps or software packages to buy.

Cool!

[8]: http://blog.braintapper.com/assets/images/pentaho-map/USA_Counties_with_FIPS_and_names plain.svg

keyword svg  keyword pentaho  keyword maps  keyword flowing_data