This article gives a step by step guide to utilizing Machine Learning capabilities with 2UDA. In the article, we’ll use an example of Animals to predict whether they are mammals, Birds, Fish or Insects.
We’re going to use 2UDA version 11.6-1 to implement the Machine Learning model. 2UDA version 11.6-1 combines:
- PostgreSQL 11.6
- Orange 3.23.0
You can find the latest version of 2UDA here.
Step 1: Load training dataset into PostgreSQL
The sample dataset that is used to train our model is available at the official Orange GitHub repository here.
Follow these steps to load the training data into PostgreSQL tables:
- Connect to PostgreSQL via psql, OmniDB or any other tool that you are familiar with.
- Create a table to store our training data. Here it is named as training_data.
CREATE TABLE training_data( name VARCHAR (100), hair integer, feathers integer, eggs integer, milk integer, airborne integer, aquatic integer, predator integer, toothed integer, backbone integer, breathes integer, venomous integer, fins integer, legs integer, tail integer, domestic integer, catsize integer, type VARCHAR (100) );
- Insert training data into the table via COPY query. Before executing COPY query make sure that PostgreSQL has required read permissions on the data file otherwise COPY operation will fail.
NOTE: Please make sure you type a tab space in between single quotes after the delimiter keyword.
COPY training_data FROM 'Path_to_training_data_file’ with delimiter ' ' csv header;
Please find the screenshot of training dataset bellow
NOTE: Rows two and three of the training dataset in the .tab file contain some meta information. Since it is not needed at this point, it has been removed from the file.
Step 2: Create workflow with Orange
- Go to the desktop and double click on the Orange icon.
- This is what the start-up page looks like. Select New option and it will create a blank project.
Now you are ready to apply the Machine Learning model on the dataset.
Step 3: Select Machine Learning model to train the data
For this article, k-nearest neighbours (KNN) Machine Learning model is used to train the data. Once the data training process is complete, In the next step test data is passed to the Prediction widget to check the accuracy of predictions.
Step 4: Import training data from PostgreSQL into Orange
This training dataset will be used to train the Machine Learning model.
- Drag and Drop SQL Table widget from the Data menu.
- Rename widget (optional)
- Right-click on the SQL Table widget.
- Select Rename.
- Connect with PostgreSQL to load the training dataset:
- Double click on the Training data widget.
- Enter credentials to connect to the PostgreSQL database.
- Press the reload button to load all the available tables from the given database.
- Select training_data table from Drop down menu and close the pop-up.
Step 5: Add Target column
This step is important because the Machine Learning model will try to predict the data for this target variable/column:
- Drag and drop Select Columns widget from the data menu.
- Double click on the Select Columns widget.
- Search your target column under Features label. Here, type is used as a target variable because we need to see what type a given animal is.
- Drag and drop it under Target Variable box and close the pop-up.
Step 6: Columns ranking
You can Rank or Score the training variable/columns according to their correlation with the target column.
- Drag and drop Rank widget from the data menu.
- Draw a link line from Select columns widget to Rank widget .
- Double click on the Rank widget to see the most related columns in the training data table. It will be selecting the top 5 columns by default.
Step 7: Data training
In this step, the Machine Learning Model (KNN) will be trained with the training dataset. Please follow the following steps:
- Drag and drop KNN widget from the Model menu.
- Draw a link line from Rank widget to KNN widget.
Step 8: Load test dataset into PostgreSQL
A separate test dataset is created to perform predictions. Please follow the steps to load test dataset into PostgreSQL table.
- Create a table to store our test data. Here it is named as test_data.
CREATE TABLE test_data( name VARCHAR (100), hair integer, feathers integer, eggs integer, milk integer, airborne integer, aquatic integer, predator integer, toothed integer, backbone integer, breathes integer, venomous integer, fins integer, legs integer, tail integer, domestic integer, catsize integer, type VARCHAR (100) );
- Insert test data into the test table via COPY query. Before executing COPY query please make sure that PostgreSQL has required read permissions on the data file otherwise COPY operation will fail.
NOTE: Please make sure you type a tab space between single quotes after the delimiter keyword. A question mark is intentionally placed in the type column of the test dataset because we need to figure out the type of a given animal with our Machine Learning model.
COPY test_data FROM 'Path_to_test_data_file’ with delimiter ' ' csv header;
Please find the screenshot of test dataset bellow
Step 9: Import the test data from PostgreSQL into Orange
Please follow the following steps to apply the predictions.
- Drag and drop SQL Table widget from the data menu.
- Rename widget (Optional)
- Right-click on the SQL Table widget.
- Select Rename.
- Connect with PostgreSQL to load test data.
- Double click on Test data widget.
- Connect it with Test data table from PostgreSQL.
Now we are ready to perform predictions.
Step 10: Predictions
Prediction widget will try to predict the test data based on training data from KNN.
- Drag and drop Prediction widget from the Evaluate menu.
- Draw a link line form Test data widget to Prediction widget.
- Draw a link line from KNN widget to Prediction widget.
Step 11: Results
Double click on Prediction widget to view the results.
Understanding the Results
You will see 2 main tables in the prediction window. The table on the left side shows the predicted results, while the table on the right shows the original test data, which was provided for predictions.
Since the KNN model was used to train data so you will see one column named KNN that lists the results.
As we know:
- Horse is a Mammal
- Trout is a Fish
- Turkey is a Bird
- Lizard is a Reptile
So you can see KNN determine Lizard as a Fish, which is incorrect. But all other predictions are correct.
If you see the table on the left side in the prediction widget’s output, it has some numbers before the predicted type i.e, 0.20, 0.80, 1.00. These numbers show the accuracy of the predicted type.
We have used 7 types of animals in the training dataset, so it shows a total number of 7 columns with accuracy values each column will represent 1 type of animal. You can check which column is representing what type of animal by looking at the list available on the left side of your screen under Predicted probabilities for label. If you look at the first row which says Horse is a Mammal. We can see Its accuracy is 0.80 (80% from 3rd last column). Same goes with other examples Turkey is a bird and its accuracy is 1.00 (100% from 2nd column).
In this article, we have used the k-nearest neighbours’ algorithm (KNN) to implement Machine Learning model. In the next blog, we will be using the Support Vector Machine (SVM) model.