2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Gianni's PlanetPostgreSQL3 / CTE and the Birthday Paradox
Gianni Ciolli

CTE and the Birthday Paradox

September 20, 2012/0 Comments/in Gianni's PlanetPostgreSQL, PostgreSQL /by Gianni Ciolli

An interesting query has been twitted by Will Leinweber from Postgres Open:

-- this returns a different result each time it is ran
with recursive s as (
  select random()
union
  select random() from s
) select count(*) from s;

I like this example: a surprising result, which can be explained by (and indeed helps to explain) CTE behaviour.

Unexpected truths are denoted with the word “paradox”, and in fact this is a manifestation (an “instance”, in programmers’ jargon) of what is known as the Birthday Paradox.

Its simplest formulation is probably this: if you randomly choose 23 persons, the probability that two of them share the same birthday is greater than 50%.

The result is unexpected, because there are 366 different birthdays, and the number 23 seems very small compared to 366.

However it is correct, as it can be shown with a direct computation. In PostgreSQL we can run another recursive CTE:

WITH RECURSIVE r(i,acc) AS (
  SELECT 1, 1 :: double precision
UNION
  SELECT i + 1,
    acc * (((366 - i) :: double precision) / 366)
  FROM r WHERE acc > 0.5
) SELECT count(1) FROM r;

producing 23 as the result.

A recursive CTE stops when the recursive step does not add any new rows. In the last query, acc represents the probability that the first i birthdays are distinct, so recursion stops when that number is not above 50%.

In the query mentioned at the beginning, which we’ll call the “random query” for short, the recursive CTE terminates when random() does not add a new row. That is, when the randomly-computed value has already been computed in a previous iteration; that’s because the recursive CTE is using UNION instead of UNION ALL.

This is indeed the Birthday paradox, with 366 replaced by the maximum possible number of distinct values that random() can produce. What exactly is that number?

The random() function returns a double precision value, whose exact definition depends on the system. Not all the double precision values can be produced, though; the underlying C function can produce 2^31 different results, regardless of the bit size of a double precision value. This is good enough in practice, and at the same time compatibility with all the various architectures and library implementations is ensured.

So we can replace 366 with 2^31 in our query, and instead of 23 we get 54563 as the answer.

Does it come near to the actual output of the random query? Let us run it a few times, collect the result, and compute the average:

gianni=# create table t(n int);
CREATE TABLE

gianni=# with recursive s as (
  select random()
union
  select random() from s
) insert into t select count(1) from s;
INSERT 0 1

/* repeat the last query 49 times */

gianni=# select count(1), avg(n) from t;

 count | avg
-------+--------------------
    50 | 54712.060000000000
 (1 row)

The average of the actual results is quite close to the expected threshold of 54563; the difference is less than 0.3%, quite orthodoxically, we might say!

Tags: Birthday Paradox, CTE, PostgreSQL, random, recursive
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data December 3, 2020
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up] November 13, 2020
  • Full-text search since PostgreSQL 8.3 November 5, 2020
  • Random numbers November 3, 2020
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up] November 2, 2020

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
PostgreSQL 9.3 development: Array ELEMENT Foreign Keys The “French Revolution”
Scroll to top
×