Predictive Analytics with 2UDA

2UDA is more than the sum of its parts. It bundles three open source software packages, each of which is already very useful on its own. But the true power lies in using them together, so that they complement each other.

Orange makes it very easy to use a range of data mining approaches, whether you are a skilled data scientist or coming from another field and looking to enrich the understanding of data you have collected. It contains a toolkit of methods ranging from exploratory data analysis and visualizations (e.g. scatter plot, box plot, distributions, …) to unsupervised (clustering, PCA, MDS, …) and supervised (linear and logistic regression, SVM, classification trees, ensemble methods, …) machine learning algorithms that can be used for predictive analytics.

Orange usually loads data from a file into working memory, which is effective for small, fixed data sets. It can also be connected to a PostgreSQL backend that holds the data. This simplifies working with data that is frequently updated. More importantly, it lets Orange make use of the database's capacity to store huge amounts of data that do not fit into memory, bringing big data analysis capabilities to Orange. When working on big remote data sets, some computations for statistical analyses are offloaded directly onto the database server where the data resides. Methods implemented to work only on in-memory data can still be used, thanks to fast sampling provided by the database. This combination of PostgreSQL and Orange can therefore also serve as part of data warehousing and business intelligence solutions, and more functionality for this purpose is actively being developed.
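The two strategies described above, offloading a computation to the database and sampling rows for an in-memory method, can be illustrated with a minimal sketch. It is not Orange's own implementation: it assumes Python's built-in sqlite3 module as a stand-in for a PostgreSQL server so that it runs without any setup, and the table name measurements and both function names are hypothetical.

```python
import sqlite3
import statistics

# Stand-in database: an in-memory SQLite database instead of a remote
# PostgreSQL server (assumption made so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
# "measurements" is a hypothetical table used only for illustration.
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(float(i),) for i in range(1, 10001)],
)


def summary_in_database(conn):
    """Offloaded computation: the database scans the table and only
    a single row of aggregates is sent back to the client."""
    row = conn.execute(
        "SELECT COUNT(*), AVG(value), MIN(value), MAX(value) FROM measurements"
    ).fetchone()
    return {"count": row[0], "mean": row[1], "min": row[2], "max": row[3]}


def sample_into_memory(conn, n):
    """Sampling: fetch n random rows so that a method written for
    in-memory data can run on a manageable subset."""
    # SQLite has no TABLESAMPLE clause, so this uses ORDER BY RANDOM();
    # PostgreSQL additionally offers TABLESAMPLE for cheaper sampling.
    rows = conn.execute(
        "SELECT value FROM measurements ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()
    return [r[0] for r in rows]


stats = summary_in_database(conn)       # exact, computed by the database
sample = sample_into_memory(conn, 500)  # approximate, computed in memory
sample_mean = statistics.mean(sample)
```

The first function moves no raw data out of the database, while the second trades exactness for the ability to reuse any in-memory algorithm on the sampled rows.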

Running analytics inside the database means we can exploit the power of the underlying database engine, so its features, such as parallel processing, become available to the analysis.

Click here to download the tutorial in PDF format.