2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Support & Services
    • Support
      • 24/7 PostgreSQL Support
      • Developer Support
    • DBA Services
      • Remote DBA
      • Database Monitoring
    • Consulting Services
      • Health Check
      • Performance Tuning
      • Database Security Audit
      • PostgreSQL Upgrade
    • Migration Services
      • Migrate to PostgreSQL
      • Migration Assessment
  • Products
    • Postgres-BDR ®
    • PostgreSQL High Availability
    • Kubernetes Operators for BDR & PostgreSQL
    • Managed PostgreSQL in the Cloud
    • Installers
      • Postgres Installer
      • 2UDA
    • 2ndQPostgres
    • pglogical
    • Barman
    • repmgr
    • OmniDB
    • SQL Firewall
    • Postgres-XL
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • AlwaysOn Postgres
      • PostgreSQL with High Availability
      • Security Best Practices
      • BDR
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • AlwaysOn Postgres
      • PostgreSQL with High Availability
      • Security Best Practices
      • BDR
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • What Does “2ndQuadrant” Mean?
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Alvaro's PlanetPostgreSQL3 / XMLTABLE in PostgreSQL
Álvaro Herrera

XMLTABLE in PostgreSQL

March 8, 2017/14 Comments/in Alvaro's PlanetPostgreSQL, PostgreSQL /by Álvaro Herrera

I just committed a patch by Pavel Stěhule that adds the XMLTABLE functionality to PostgreSQL 10.  XMLTABLE is a very useful feature dictated by the SQL/XML standard, that lets you turn your XML data into relational form, so that you can mix it with the rest of your relational data. This feature has many uses; keep reading for some details on it.

Probably the most interesting use case of XMLTABLE is to extract data from some XML document to insert into a relational table, during in-database ETL processing. However, XMLTABLE can be used on-the-fly on data stored in XML columns, so that once the data is in relational form, you can apply any standard operations you want, such as adding WHERE clauses, doing aggregations, joining to other tables, and so on.

A Simple Example

As an example, let’s suppose you administer a hotel chain, and that the data is stored thusly:

CREATE TABLE hoteldata AS SELECT xml
$$<hotels>
 <hotel id="mancha">
  <name>La Mancha</name>
  <rooms>
   <room id="201"><capacity>3</capacity><comment>Great view of the Channel</comment></room>
   <room id="202"><capacity>5</capacity></room>
  </rooms>
  <personnel>
   <person id="1025">
    <name>Ferdinando Quijana</name><salary currency="PTA">45000</salary>
   </person>
  </personnel>
 </hotel>
  <hotel id="valpo">
  <name>Valparaíso</name>
  <rooms>
   <room id="201"><capacity>2</capacity><comment>Very noisy</comment></room>
   <room id="202"><capacity>2</capacity></room>
  </rooms>
  <personnel>
   <person id="1026"><name>Katharina Wuntz</name><salary currency="EUR">50000</salary></person>
   <person id="1027"><name>Diego Velázquez</name><salary currency="CLP">1200000</salary></person>
  </personnel>
 </hotel>
</hotels>$$ AS hotels;

With XMLTABLE, you can turn this into a relationally-formatted table consisting of room numbers and capacity, annotating for each hotel in your chain:

SELECT xmltable.*
  FROM hoteldata,
       XMLTABLE ('/hotels/hotel/rooms/room' PASSING hotels
                 COLUMNS
                    id FOR ORDINALITY,
                    hotel_name text PATH '../../name' NOT NULL,
                    room_id int PATH '@id' NOT NULL,
                    capacity int,
                    comment text PATH 'comment' DEFAULT 'A regular room'
                );
id hotel_name room_id capacity comment
1 La Mancha 201 3 Great view of the Channel
2 La Mancha 202 5 A regular room
3 Valparaíso 201 2 Very noisy
4 Valparaíso 202 2 A regular room

Explaining the syntax

Let’s study the query above. The XMLTABLE clause must go in the FROM part of the query. We also have hoteldata in the FROM, which is what feeds the data into XMLTABLE.

First, the PASSING clause is where we specify the XML data that we want to process. In this case, the data comes from the hotels column in the hoteldata table. We call this the document expression.

Just before the PASSING clause you see an XPath expression '/hotels/hotel/rooms/room'. We call this the row-generating expression or just the row expression.

We have the COLUMNS clause next, declaring a few columns. For each column we indicate a data type as well as an optional PATH clause, which we call the column expression.

XMLTABLE‘s theory of operation is that the row expression is applied to the document expression, slicing the document into pieces to generate rows; for each row so generated, the various column expressions are applied to obtain the values for each column.

The column expression is an XPath expression that obtains a value starting from the XML for the current row. If no PATH is specified, then the column name itself is used as the XPath expression. Note that in the column hotel_name we used a path having “../“, which means to “go up” in the XML document to grab values from the the “container” objects in the document. We can also use xml PATH '.' in a row, which gives us the full source XML for that row.

One column can be marked FOR ORDINALITY. The column is then of type INTEGER, and is numbered sequentially for each row obtained from the document. (If there are multiple input documents, such as when you have multiple rows in a table, the counter starts from 1 for each new document).

There is also a DEFAULT clause. If the XPath for a column does not match a value for a certain row, then the DEFAULT value is used.

Some of these columns have been marked as NOT NULL. If there is no match, and no DEFAULT clause is specified (or the DEFAULT also evaluates to NULL), an error is thrown.

I won’t go into more detail of XPath, which is a powerful language, but I can offer the XPath article in Wikipedia, and the official recommendation document from W3C as useful resources.

The full XMLTABLE syntax

The documented syntax synopsis is:

xmltable( [XMLNAMESPACES(namespace uri AS namespace name[, ...])] row_expression PASSING [BY REF] document_expression [BY REF] COLUMNS name { type [PATH column_expression] [DEFAULT expr] [NOT NULL | NULL] | FOR ORDINALITY } [, ...] )

Note that the document expression can be a reference to some a table you have in the FROM clause, or it can be a complete XML document as a string literal. The BY REF clauses don’t have any effect; they are there for compatibility with the standard and with other database systems.

I have not covered the XMLNAMESPACES clause in this post; I’m leaving that for a future installment.

Applying SQL on top

As mentioned, once XMLTABLE has processed the data into relational form, you can do whatever you want using well-known tools. For instance, if you had another XML document with more personnel in each hotel,

INSERT INTO hoteldata VALUES (xml
$$<hotels> 
 <hotel id="mancha">
  <name>La Mancha</name>
  <personnel>
   <person id="1028">
    <name>Sancho Panza</name><salary currency="PTA">35000</salary>
   </person>
  </personnel>
 </hotel>
 <hotel id="valpo">
  <name>Valparaíso</name>
  <personnel>
   <person id="1029"><name>Kurt Werner</name><salary currency="EUR">30000</salary></person>
  </personnel>
 </hotel>
</hotels>$$);

It’s easy to obtain the total salaries for each currency you need to pay at each hotel,

  SELECT hotel, currency, sum(salary)
    FROM hoteldata,
XMLTABLE ('/hotels/hotel/personnel/person' PASSING hotels
       COLUMNS hotel text PATH '../../name' NOT NULL,
               salary integer PATH 'salary' NOT NULL,
               currency text PATH 'salary/@currency' NOT NULL
) GROUP BY hotel, currency;
hotel currency sum
Valparaíso CLP 1200000
Valparaíso EUR 80000
La Mancha PTA 80000

Conclusion

In this article, I have covered the new feature XMLTABLE to appear in PostgreSQL version 10. I think XMLTABLE is a great feature for integrating external data, and I hope you find it valuable too. Please test it and report any problems, so that we can sort them out before the final release. If you like XMLTABLE, be sure to let us know leaving a comment!

Tags: 2QLovesPG, PostgreSQL, PostgreSQL10
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
14 replies
  1. Oleg Bartunov
    Oleg Bartunov says:
    March 8, 2017 at 10:03 pm

    That’s great, now we should hack XMLTABLE infrastructure to support JSON_TABLE.

    Reply
    • alvherre
      alvherre says:
      March 13, 2017 at 3:22 pm

      Yes, thanks!

      One thing I noticed in your patch is that multiple FOR ORDINALITY columns are allowed, but this framework only supports one (because that’s what the standard requires). Does JSON_TABLE require more than one?

      Reply
      • Oleg Bartunov
        Oleg Bartunov says:
        March 20, 2017 at 2:15 pm

        Yes, we didn’t test the number of FOR ORDINALITY clauses, but as we’ll use XMLTABLE infrastructure, we’ll add the check.

        Reply
  2. Goo
    Goo says:
    January 19, 2018 at 6:22 am

    COLUMNS
    dn text PATH ‘MO/DN’,

    ERROR: more than one value returned by column XPath expression

    Contains more than one, Can you choose the first one?

    Reply
    • alvherre
      alvherre says:
      September 3, 2018 at 7:27 pm

      Sure, there’s XPath syntax for array indexes. As I recall, it would be `PATH ‘MO/DN[1]’`, or maybe it could be `PATH ‘MO[1]/DN’`, depending on where the array is. I don’t know what a good tutorial for XPath is, but the wikipedia page seems like a decent starting point: https://en.wikipedia.org/wiki/XPath

      Reply
  3. puqun
    puqun says:
    February 14, 2018 at 3:10 am

    when i use xmltable cross join xmltable,
    As a result of the following examples, I am not sure whether this result is correct or not.
    ex:
    CREATE TABLE xmltest AS SELECT
    xml $$

    $$ AS data;

    SELECT xmltable.*
    FROM xmltest,
    XMLTABLE (XMLNAMESPACES(‘http://example.com/myns’ AS x,
    ‘http://example.com/b’ AS “B”),
    ‘/x:example/x:item’
    PASSING by ref(SELECT data FROM xmltest)
    COLUMNS
    id FOR ORDINALITY,
    foo int PATH ‘@foo’,
    bar int PATH ‘@B:bar’,
    name text DEFAULT ‘not specified’) CROSS JOIN
    XMLTABLE (XMLNAMESPACES(‘http://example.com/myns’ AS x,
    ‘http://example.com/b’ AS “B”),
    ‘/x:example/x:item’
    PASSING by ref(SELECT data FROM xmltest)
    COLUMNS
    id FOR ORDINALITY,
    foo int PATH ‘@foo’,
    bar int PATH ‘@B:bar’,
    name text DEFAULT ‘not specified’)t2;

    The result1:
    id | foo | bar | name
    —-+—–+—–+—————
    1 | 100 | 2 | not specified
    1 | 100 | 2 | not specified
    1 | 100 | 2 | not specified
    1 | 100 | 2 | not specified
    2 | 300 | 4 | not specified
    2 | 300 | 4 | not specified
    2 | 300 | 4 | not specified
    2 | 300 | 4 | not specified
    3 | 400 | 5 | not specified
    3 | 400 | 5 | not specified
    3 | 400 | 5 | not specified
    3 | 400 | 5 | not specified
    4 | 300 | 5 | not specified
    4 | 300 | 5 | not specified
    4 | 300 | 5 | not specified
    4 | 300 | 5 | not specified
    (16 rows)

    But, when I cross join two common tables of the same data as xmltable ,the result2:
    id | foo | bar | name | id | foo | bar | name
    —-+—–+—–+—————+—-+—–+—–+—————
    1 | 100 | 2 | not specified | 1 | 100 | 2 | not specified
    1 | 100 | 2 | not specified | 2 | 300 | 4 | not specified
    1 | 100 | 2 | not specified | 3 | 400 | 5 | not specified
    1 | 100 | 2 | not specified | 4 | 300 | 5 | not specified
    2 | 300 | 4 | not specified | 1 | 100 | 2 | not specified
    2 | 300 | 4 | not specified | 2 | 300 | 4 | not specified
    2 | 300 | 4 | not specified | 3 | 400 | 5 | not specified
    2 | 300 | 4 | not specified | 4 | 300 | 5 | not specified
    3 | 400 | 5 | not specified | 1 | 100 | 2 | not specified
    3 | 400 | 5 | not specified | 2 | 300 | 4 | not specified
    3 | 400 | 5 | not specified | 3 | 400 | 5 | not specified
    3 | 400 | 5 | not specified | 4 | 300 | 5 | not specified

    So, I want to know,
    The result1 is right or not?

    Reply
    • alvherre
      alvherre says:
      September 3, 2018 at 7:28 pm

      Sorry, I didn’t understand your XML very well — apparently all the XML markup has been removed by wordpress. Maybe you can post the question elsewhere, say [email protected] or StackOverflow? If you then drop a link here I can try to reply. (Or maybe in WordPress it is possible to markup XML properly for a comment? I don’t know.) Thanks.

      Reply
  4. David Day
    David Day says:
    September 3, 2018 at 11:50 am

    I was hoping to get some advise and potentially a solution to my problem.

    This example is just showing the concept of what we’re trying to do as the XML structure we hold can be over 2,000 lines.

    In Oracle we’re migrating our code into postgres so this is how we’re currently doing it in Oracle and what we’re trying to migrate into postgres. One of the problems we’re hitting is using MULTIPLE XMLTABLE functions to link nodes with a single XML tag elements to one’s with multiple tag elements as show in this example. The problem with the XMLTABLE in postgres we don’t seem to be able to pass structures from one XMLTABLE to another to structure the output as we require.

    Here is a test case example:-

    CREATE TABLE XML_TABLE (
    id NUMERIC(19,0) NOT NULL,
    resultxml XML
    );

    INSERT into XML_TABLE(ID,RESULTXML)
    VALUES
    (1,’

    AA
    XX

    test item 1
    29.9

    test item 2
    30.9

    BB
    ZZ

    test item 1
    30.9

    test item 2
    40.9

    CC
    YY

    test item 1
    50.9

    test item 2
    6-.9

    ‘);

    SELECT xt.id ,
    xt1.RESULT_POS,
    xt1.product,
    xt1.name,
    xt2.item_pos,
    xt2.item_text,
    xt2.item_value
    FROM TEST_LOAD.XML_TABLE xt,
    XMLTABLE(‘//storedresults/result’
    PASSING xt.resultxml
    COLUMNS RESULT_POS FOR ORDINALITY,
    PRODUCT CHARACTER VARYING(20) path ‘product’,
    NAME CHARACTER VARYING(20) path ‘name’,
    ITEMS_XML XML PATH ‘//items/item’) xt1,
    XMLTABLE(‘item’
    PASSING xt1.ITEMS_XML
    COLUMNS
    ITEM_POS FOR ORDINALITY,
    ITEM_TEXT CHARACTER VARYING(300) PATH ‘text’,
    ITEM_VALUE CHARACTER VARYING(300) PATH ‘value’) xt2
    WHERE xt.id = 1;

    I get the following error.

    ERROR: could not parse XML document
    DETAIL: line 4: Extra content at the end of the document

    In Oracle it works fine doing this so just wondering if there is another way of doing this in Postgres that is not going to be massive on performance consumption as our the LIVE solution is dealing with high volumes of data which extract data from a large XML structure. We have over 20 extracts from the same XML structure for different components. The postion element is key to make our record unique for a combination of fields we use for the primary key. This is to support CDC from the extract data tables we’ve created and keep uploading new data into.

    Thanks
    David

    Reply
    • alvherre
      alvherre says:
      September 3, 2018 at 7:14 pm

      Sorry, I didn’t understand your XML very well — apparently all the XML markup has been removed by wordpress. Maybe you can post the question elsewhere, say [email protected] or StackOverflow? If you then drop a link here I can try to reply. (Or maybe in WordPress it is possible to markup XML properly for a comment? I don’t know.) Thanks.

      Reply
      • David Day
        David Day says:
        September 4, 2018 at 11:06 am

        Thanks for your quick response back, i have send an email to the [email protected]

        Reply
  5. Ivan Ustûžanin
    Ivan Ustûžanin says:
    September 15, 2018 at 11:31 pm

    For those who like me stumbled on this post googling about XMLTABLE throwing “unexpected XPath object type 3” (or “type 2” for the matter): the gist is that XMLTABLE *does NOT* handle any XPath types except nodesets and strings (I’ve looked at the source code). Use XPath string() function to work around this issue. By the way type 2 is a boolean value and type 3 is a number.

    Reply
    • Ivan Ustûžanin
      Ivan Ustûžanin says:
      September 17, 2018 at 8:11 pm

      I also noticed that despite a column having an array type a nodeset returned for column containing more that one node also throws the error “Cardinality violation: 7 ERROR: the XPath expression for a column returned more than one value” (sorry, it’s a back-translation from Russian).

      Reply
  6. Anand
    Anand says:
    September 19, 2018 at 3:26 pm

    Hi,
    Thanks for explanation. We have an application which is hosted on Postgres release 9.6.4 and we need XMLTable & XMLNamespaces functionality to process big xml files to flush data into tables.

    Is it possible back-port these features into old release? Can I take underlying source code and deploy the code into 9.6.3 release?

    Thanks for your response.

    Reply
  7. jo
    jo says:
    June 25, 2019 at 12:26 pm

    Hi thanks a lot for the very helpful tutorial !
    Would you have something on the reverse function, i.e. exporting PGSQL data to XML ?

    Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data December 3, 2020
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up] November 13, 2020
  • Full-text search since PostgreSQL 8.3 November 5, 2020
  • Random numbers November 3, 2020
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up] November 2, 2020

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
Autovacuum Tuning Basics JSON version of XMLTABLE example
Scroll to top
×