For the sake of simplicity I’ll assume that:
- you have a very simple test database on Greenplum, with two tables and a logical 1:N relationship between them: states and users (a user can live in just one state)
- you have two CSV files with the same data structure, one for the states and one for the users.
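The exact layout of the tables and files is up to you, but as a concrete reference, here is a small sketch that generates the two CSV files with a plausible structure. The column names, sample rows, and the matching Greenplum DDL in the comments are assumptions for illustration, not part of the original setup:

```python
import csv

# Assumed Greenplum side of the setup (adjust to your own schema):
#   CREATE TABLE states (state_id int, state_name varchar(50));
#   CREATE TABLE users  (user_id int, user_name varchar(50), state_id int);
# The 1:N relationship: each user row carries the state_id it belongs to.

states = [(1, "California"), (2, "Texas"), (3, "New York")]
users = [(1, "Alice", 1), (2, "Bob", 3), (3, "Carol", 1)]

# One CSV per table, with a header row matching the column names.
with open("states.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["state_id", "state_name"])
    writer.writerows(states)

with open("users.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["user_id", "user_name", "state_id"])  # state_id references states
    writer.writerows(users)
```

With two files shaped like this, each Kettle transformation can map one CSV straight onto one table.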
The first step is to download Kettle from Pentaho’s website and then install it.
Then we need to create a brand new Job.
So right-click on the Jobs folder and select “New”. You will
notice that on the left of the Kettle interface there are two tabs,
View and Design. The View tab contains all the information about
the job, while the Design tab holds all the components we’ll need to build it.
Then go to the View tab, right-click
on the “Database connections” folder, and select “New”.
A window will pop up where we can enter
all the necessary details about our Greenplum connection. Once you have filled
in all the fields of the form, you can test the database connection with the “Test” button.
When this step is completed, we need to
share our brand new connection with all the components
we are going to create in this Job. So right-click on the new
connection and select the “Share” option.
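As a side note, shared connections are saved by Kettle in the shared.xml file inside your .kettle directory, which is what makes them visible to every job and transformation. A shared Greenplum connection entry looks roughly like the fragment below; the host, database name, and credentials are placeholders, and the exact set of tags may vary with your Kettle version:

```xml
<connection>
  <name>greenplum_test</name>
  <server>localhost</server>
  <type>GREENPLUM</type>
  <access>Native</access>
  <database>testdb</database>
  <port>5432</port>
  <username>gpadmin</username>
  <password>secret</password>
</connection>
```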
Go to the Design tab and, under the
“General” folder, select the “Start” component. Then drag and
drop it onto the job canvas.
Do the same with two “Transformation”
components and a “Success” component.
It’s now time to connect them all. If you
hold the Shift key, left-click on the “Start” component, and
move your mouse toward a transformation component, you
will see a green line with a green symbol. Use this line to connect one element to another.
Now all the elements are connected along
the success path. We can also add a failure step (e.g. if we want
to receive an email in case of failure).
Drag a “Mail” component from the
“Mail” folder in the Design tab, and drop it onto the job canvas.
Now, connect the two
transformations to the Mail element. You will notice that this time the
lines are red, with a failure symbol on them.
The job flow is now complete. In the next
part of this article, we’ll continue with the data import.