2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Greenplum3 / Greenplum CE 4.2 XML support
Carlo Ascani

Greenplum CE 4.2 XML support

July 23, 2012/0 Comments/in Greenplum /by Carlo Ascani

In this article we will see one of the most interesting features added in Greenplum 4.2:
XML datatype support.

Introduction

This article has an educational goal. Most likely you will not want to use the code presented here in a production system. However, it will give you hints and interesting ideas on how to add XML capabilities (both for loading and – most importantly – for searching) to your Greenplum database.

My intention is to show the potential of XML data type and XML data processing in a concurrent environment.

Goals

We will load XML data from a file into a Greenplum table in a XML field.

We will then perform some XPath queries on that data.

The xpath function operates on XML data, we will write a demo wrapper function which works on a XML file.

For the sake of this example, the XML file must reside on the server: we will use indeed the COPY command to load data from it.

Can I see some code?

Download and extract a dataset in XML format directly from the blog:

$ wget https://www.2ndquadrant.com/wp-content/uploads/2012/07/100songs.tar.gz
$ tar xvf 100songs.tar.gz

Copy the following code in a file named xml.sql:

BEGIN;

CREATE TABLE t (id INTEGER, data XML)
        DISTRIBUTED BY (id);

CREATE OR REPLACE FUNCTION _xpath(
        _xml_file VARCHAR,
        _xpath_query VARCHAR
)
RETURNS TEXT AS
$BODY$
DECLARE
        _is_well_formed BOOLEAN;
        _result TEXT;
BEGIN

        CREATE TEMP TABLE tmp (d TEXT) distributed by (d);
        EXECUTE $$ COPY tmp FROM $$ ||  quote_literal(_xml_file) ;

        INSERT INTO t (data) SELECT xml(d) FROM tmp;
        DROP TABLE tmp;

        SELECT INTO _is_well_formed xml_is_well_formed(text(data)) FROM t;
        IF _is_well_formed = False THEN
                RETURN 'XML document is not well formed';
        END IF;

        SELECT INTO _result xpath(_xpath_query, data ) FROM t;
        RETURN _result;
END;
$BODY$ LANGUAGE plpgsql;

SELECT _xpath('/greenplum/tests/data/100songs.xml',
   '/response/row/row[artist="Tullio De Piscopo"]/title/text()') AS song;

ROLLBACK;

Execute the SQL script with:

$ psql -qtxf xml.sql

You should see:

song | {"Stop Bajon (Primavera)"}

Code in detail

First of all, we have to know the XML document structure:

<response>
    <row>
        <row _id=... _uuid=... _position=... _address=...>
            <theme>...</theme>
            <title>...</title>
            <artist>...</artist>
            <year>...</year>
            <spotify_url url=.../>
        </row>
        ...
    </row>
</response>

The first part of the code creates a table to store XML data:

CREATE TABLE t (id INTEGER, data XML)
        DISTRIBUTED BY (id);

In the function body, we create a temporary table to store the file content as a single TEXT:

CREATE TEMP TABLE tmp (d TEXT) distributed by (d);
EXECUTE $$ COPY tmp FROM $$ ||  quote_literal(_xml_file) ;

Then, we convert the TEXT data into XML data using the xml function:

INSERT INTO t (data) SELECT xml(d) FROM tmp;
DROP TABLE tmp;

And we check if the XML is well formed:

SELECT INTO _is_well_formed xml_is_well_formed(text(data)) FROM t;
IF _is_well_formed = False THEN
    RETURN 'XML document is not well formed';
END IF;

Lastly, we perform the XPath query and return the result:

SELECT INTO _result xpath(_xpath_query, data ) FROM t;
RETURN _result;

The last part of the script shows how to call the function to retrieve data:

-- Get the Tullio De Piscopo's song from the list
SELECT _xpath('/greenplum/tests/data/100songs.xml', '/response/row/row[artist="Tullio De Piscopo"]/title/text()') AS song;

Conclusions

XPath is a technology of mixed feeling. Many hate it, some love it. However, when working with XML structured data type, it is a must.

The good thing with Greenplum 4.2 is that it is now possible to load millions of XML documents in a Greenplum database, have the data distributed in multiples segments and take advantage of parallelism when retrieving data using XPath expressions.

 

Tags: greenplum, greenplum ce, xml
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data December 3, 2020
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up] November 13, 2020
  • Full-text search since PostgreSQL 8.3 November 5, 2020
  • Random numbers November 3, 2020
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up] November 2, 2020

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
Announcing Barman 1.0, Backup and Recovery Manager for PostgreSQL Barman is finally out
Scroll to top
×