[Video] Data Integration with PostgreSQL
Just in case you missed the live broadcast, the video of my presentation below covers various topics around integration of PostgreSQL with other data sources and database technologies.
This presentation covers the following topics:
- What is a Foreign Data Wrapper?
- How to query MySQL, a flat file, a Python script, a REST interface, and a different Postgres node
- Perform all of the above simultaneously
- Take snapshots of this data for fast access
- Tweak remote systems for better performance
- Package as an API for distribution
- Stream to Kafka
- Write data to… MongoDB!?
- What does all of this have in common?
It’s an exciting topic, and I hope more developers and admins begin to see Postgres as the global integration system it really is.
I am part of a healthcare system that recently started using PostgreSQL as its database to run reports. We have a few external databases for certain groups of hospitals that spit out tables every night, which we need to load into Postgres in order to run reports.
Is there a way to have PostgreSQL integrate with these external systems to load those tables into Postgres automatically? We are trying to limit the amount of manual work it takes for this to happen.
Often in situations like these, you can use Foreign Data Wrappers. If the hospital systems have some kind of access API you can depend on, it’s possible to write your own foreign data wrapper so that Postgres can treat the remote systems as if they were tables. Part of the reason for the demonstration was to show that the parts are all there to allow Postgres to act as an intermediate layer to glue everything together. The tricky part is finding the interface that allows Postgres to latch on.
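For the common case where a remote system speaks PostgreSQL itself, the wiring might look like the sketch below. The server address, credentials, and table layout are all hypothetical; other wrappers such as mysql_fdw or file_fdw follow the same basic pattern.

```sql
-- A minimal sketch assuming the remote hospital database is PostgreSQL;
-- host, credentials, and columns are placeholders.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER hospital_group_a
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'hospital-a.example.com', dbname 'reports', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER hospital_group_a
  OPTIONS (user 'report_reader', password 'secret');

-- Expose the remote table locally; columns must match the remote definition.
CREATE FOREIGN TABLE hospital_a_admissions (
  admission_id bigint,
  patient_ref  text,
  admitted_at  timestamptz
)
SERVER hospital_group_a
OPTIONS (schema_name 'public', table_name 'admissions');

-- Snapshot the remote data locally so reports don't hammer the source system.
CREATE MATERIALIZED VIEW admissions_snapshot AS
  SELECT * FROM hospital_a_admissions;

-- Scheduled nightly (cron, pg_cron, etc.) to pick up the new tables:
REFRESH MATERIALIZED VIEW admissions_snapshot;
```

With the refresh on a schedule, the nightly load happens without any manual steps.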
Thanks for sharing this, Shaun. I have a question about the Kafka integration you showed. It looks like I need a separate database to subscribe to the Kafka stream, but I don’t understand why. It’s almost like a dummy database, is it not? Supposing I only want to send events to Kafka, then I’m not sending them to a database at all, just to Kafka. So shouldn’t I be able to skip the “extra” database?
Currently the subscription is just there so the logical decoding has an end target to operate against. Since pglogical uses logical replication streaming to decode WAL contents, there needs to be a subscriber to consume the stream. In theory it would be possible to write a separate tool to consume the logical slot instead, but that’s not something we’ve developed at this time.
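For anyone curious what consuming a slot directly would look like, here is a rough sketch using the built-in test_decoding output plugin; the slot and table names are made up, and a separate polling process would still have to forward the decoded changes on to Kafka.

```sql
-- Requires wal_level = logical on the publishing server.
SELECT pg_create_logical_replication_slot('kafka_feed', 'test_decoding');

-- Hypothetical table on the publisher; any committed change is retained
-- in the slot until something consumes it.
CREATE TABLE events (id serial PRIMARY KEY, payload text);
INSERT INTO events (payload) VALUES ('example');

-- A polling process could read the decoded changes and push each row to Kafka.
SELECT lsn, xid, data
  FROM pg_logical_slot_get_changes('kafka_feed', NULL, NULL);
```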