2UDA RC1 – New features in Orange (Part 1)
The 2UDA installation package was updated recently to include the newly released PostgreSQL 9.5 RC1. Also found in the new package is an updated version of Orange bringing some new features, improvements, and bug fixes. Summary of the more noticeable changes can be found in 2UDA release notes.
In this first of a series of posts, I will explore changes related to working on data stored in PostgreSQL databases: logging, approximate preprocessing, materializing queries, and schema selection. There are lots of other features to talk about, stay tuned to read the subsequent blogs.
Logging Let’s start with the new logging functionality. For anyone interested in exploring larger databases or performing more complex analyses with 2UDA, it is now much easier to see where most of the time is being spent. From now on, Orange will log all SQL queries and their execution times into a log file. This can be analysed later to help us optimize the database structure or make the most of what is available and avoid bottlenecks. It should prove useful for a range of people from the developers, administrators, and to the end users.
Approximate preprocessing The benefits of sampling made possible by the new feature of PostgreSQL 9.5, the TABLESAMPLE clause, are finding their way into more and more components. This can greatly improve the experience on large data where a single pass over all rows takes too long. Discretize and Impute preprocessors now execute on 1 second samples of data so they can be used automatically in interactive widgets, without sending the user for another cup of coffee (or on holidays), waiting for the results. To split a numerical variable into a number of discrete intervals, it is not necessary to compute the thresholds from billions of values – a representative sample of thousands of values should be just as good.
Materializing queries Another addition in the new version is related to the existing option of using custom SQL queries as input data. Normally Orange works on single tables containing data instances (rows) described with features (columns). However, the user has the option to input a (possibly complex) custom SQL query instead, which compiles the feature descriptions from many tables using e.g. joins and aggregate functions. A new option has been added to save the results to a table directly from Orange and use the stored results instead of repeating the (slower) query in many computations. A user that doesn’t mind some direct database work could create a materialized view there instead, but for others it is a convenient feature available directly from Orange.
Schema selection When using databases with many schemas and tables, it is now easier to select the desired table to be used in Orange by specifying the schema in addition to other database connection parameters. Before, Orange always showed a drop-down menu with all tables from schemas on the user’s
search_path. Now, optionally specifying a specific schema can list tables that were previously not visible or filter the list to relevant tables only.
You can download the 2UDA package here.
Part 2 describes the new Color widget and reporting functionality.
Will this software work with postgres extensions such as postgis which adds a lot of extra sql functions and a new datatype (a geometry)
Having additional extensions should not cause any compatibility problems. But working with new data types directly will probably not work out of the box – Orange expects tables with columns containing standard data types (numerical types, strings, …).
However, if you can construct a SELECT statement that uses e.g. a table of geometry objects and extracts 2 numerical values (x and y coordinates) then Orange can load that data and use it (e.g. show distributions of those values, visualize scatter plots, etc.)
If you want to try this, you can select the “Custom SQL” option in the SQL Table widget and input the SELECT statement directly.