October 18, 2022

PostgreSQL v10 vs v13

Covering Indexes

All indexes in PostgreSQL are secondary meaning indexes are stored separately from main data (rows). So, row retrieval for a query like below with table having an index on user_id gathers data like this:

sql

SELECT user_id, external_lead_id FROM leads WHERE user_id=X

Stage 1: Fetching tuple ids (or row ids) from index storage

Stage 2: Fetching external_lead_id from table storage main area (heap). This scan is also called Index Scan.

Now from Postgres 11, we can create a covering index like

sql

CREATE INDEX lead_user_idx ON leads(user_id) INCLUDE (external_lead_id);

This is Non-key or Covering index. Index will now store external_lead_id value as well. So, making above query will retrieve the data from index storage and will be a lot faster. This scan is also called Index Only Scan.

Incremental Sorting

If there is a btree index on two columns (columnA, columnB) and we issued a sort query on 3 columns (columnA, columnB, columnC), then the query planner will not use this index and will do a sequential scan. This is fixed in Postgres13 and the index will be used in later case.

Btree Index Deduplication

Btree index is used to store separate index entries for each referenced table row. This is done for faster maintenance and data modifications but the downside is it can lead to duplicate entries for same index key.

With Postgres13, Btree indexes will be deduplicated, this will reduce the overall index size. Reduced index size will save disk space. If indexes are cached, it will save RAM and hence faster retrieval. This will also affect unique indexes as with every row update, it maintains index for each table row version. So, will benefit if table row is updated frequently.

Parallel Vacuum of Indexing

PostgreSQL does vaccuming to reclaim unused space on disk. Vaccuming basically means removing older row versions of data and indexes.

With PostgreSQL13, multiple indexes on a table can be vaccumed parallelly with a setting max_parallel_maintenance_workers.

This will greatly improve performance when we do data migrations (bulk updates/inserts).

Performance Optimization for Aggregations

For aggregation queries like MAX, MIN, SUM, PostgreSQL uses hash aggregates (combine statistics stored in a hash table) only if the hash table fits in memory. Else, it will favour sequential scan.

With v13, PostgreSQL will still use hash aggregate even if it doesn’t fit in memory, will use the disk. This will give a performance boost for DISTINCT, MAX, MIN SQL queries.

Other updates (relevant to us)

  • In Postgres12, You can now run queries over JSON documents using JSON path expressions.
  • In Postgres 12, you can rebuild indexes without blocking writes to an index via the REINDEX CONCURRENTLY command, allowing users to avoid downtime scenarios for lengthy index rebuilds.

You can refer to https://www.postgresql.org/about/featurematrix/ for more details.


This article was originally published here

Be first to comment
Leave a reply