Webinar: Data Integration With PostgreSQL [Follow Up]

Having a unified view when combining data from different sources is essential for every business. Whether you want to connect to a cloud application, import/export data to/from a text file, or sync data between your datasource and a 3rd party platform – having a database platform that provides a seamless and effective data integration process is critical.

This week on Tuesday, 13 February, 2ndQuadrant hosted the second webinar in their PostgreSQL webinar series. Principal Consultant, Shaun Thomas, conducted the webinar, covering various topics around integration of PostgreSQL with other data sources and database technologies.

The recording of the Webinar is now available. Those who couldn’t make it to the live session can now view Data Integration with PostgreSQL here.

Below, Shaun provides answers to the questions that he was unable to respond to during the live webinar.

Q1: Can all FDWs read/write to their respective backends?

A: Unfortunately not. The Mongo FDW wasn’t able to write until relatively recently. Every FDW has a different answer for this, but good maintainers are upfront about it. The best answer is to visit the list of Postgres FDWs and find one you want to use. I haven’t seen one that doesn’t say whether or not it’s writable in the site documentation or README.

Q2: Why would we want a database with none of its own data?

A: To give it to anyone, use it for reports, drop and recreate it to capture external DDL changes. It doesn’t really exist, and the data is all disposable, so it’s essentially a portable set-theory translator API with external connectors. Combined with a good Postgres GUI or even just a Postgres driver, it essentially becomes a data Swiss Army knife.

Q3: Is it possible to read from Kafka?

A: It is not without a separate wrapper to decode the Kafka topic contents. Essentially since Kafka can contain literally anything as a queue-based stream, each “topic” should either have its own personalized decoder, or each topic can correspond to a table in Postgres with a universal decoder specific to your application. This is one area where writing your own FDW is the best approach.

Q4: What is the database IDE (tool) that you are using to display the colored output of commands? (like SQL Developer in Oracle)

A: The IDE is Geany, my primary text editor. It’s not Postgres-specific, and does syntax highlighting for a multitude of different languages. It’s a tool I’ve been using on Linux for probably ten years after I got frustrated with Gnome’s Gedit.

Q5: How does postgres FDW handle lost connections? Is there any caching available other than materialized views?

A: Lost connections will be reconnected when a new query is submitted. Any in-transit activity is lost until then. So far as caching, materialized views are still the best bet. But as I explained during the webinar itself, there are different layers of cache. Depending on the complexity of the FDW itself, caching can be part of the implementation. For example, I could have made a two-line modification to my REST driver to have it cache to a global Python variable, such that each subsequent use would use that copy instead of interrogating the REST interface. A more advanced FDW could have a cache maintenance/invalidation algorithm to ensure the local copy always takes precedence, and that’s tied to the session itself. So until the connection is closed, caching is valid.

Q6: Is there an established FDW for Cassandra?

A: The Cassandra FDW available on Github is the latest known native wrapper, but it hasn’t been updated in two years. The others are based on Multicorn, and are consequently using Python to interface with Cassandra, and will probably be relatively slow. The answer then, is probably not.

Q7: What is the best method to create an Index for a 60 million record table where we use 6 columns in the join condition – because the same type of index works fine on Oracle and not on Postgres.

A: This question is a bit vague. Are the six columns all in the same index? Just because they’re in the join condition, doesn’t mean they need to be in any index. The point is to create indexes on columns with good selectivity, and let the database handle the rest. If 3 columns get you a 1=1 match, further columns just bloat the index size. Regardless, there’s nothing wrong with putting six columns into a single index, and Postgres will work just fine that way. Besides that, it could be the other table in the join that needs to be indexed. There just isn’t enough information here to provide a meaningful answer. Perhaps this question was simply phrased incorrectly.

Q8: Can foreign data wrappers accept parameters like making a foreign wrapper to another Postgres function?

A: Not as such but you can use a view to mask the function call where the function lives.

 CREATE VIEW my_view AS SELECT * FROM my_function(id=15);

Then import the view definition like we did when hiding the remote JOIN.

Q9: What are the options for Postgres database encryption?

A:  The postgres_fdw wrapper uses the libpq connection library. As such, it’s fully compatible with Postgres SSL connection encryption. Other FDWs would need to negotiate their own encryption needs as part of their definitions. Each FDW runs as a library within Postgres, so the wrapper needs to follow necessary encryption protocols for the target, if such are provided. I.e., the MySQL FDW handles its own encryption when communicating with MySQL.

For any questions, comments, or feedback, please visit our website or send an email to [email protected].

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *