Database Indexing

Pramod Shehan

4 min readMar 13, 2024

Heap

Where we store all the rows in the table.
Data are Unordered.
When inserting new records, new tuple goes wherever. Usually at the end.

Pages(Blocks)

Heap is divided into 8 KB blocks.
8 KB is the default size of the block.

Item Pointers

In each page, there are slots.
Some of the slots can be empty.
If you want to know the physical location of the tuple you can alway address with the block number & slot number.
You can check the physical location of each tuples.

SELECT CTID, * FROM t_users;

This CTID column is hidden system column.

Full Table Scan

Start at the beginning and scan all the way to at the end.

What is an index?

Indexes are a common way to enhance database performance.
An index allows the database server to find and retrieve specific rows much faster than it could do without an index.
Every primary key has an index by default.

Key value stores.
Keys are the list of values in the field that we have asked to index.
Values are the page number and item pointer(slot number) that we mentioned before.(Physical location of that tuple in the heap)
It is also sorted list. It’s sorts the values so it knows how to find them efficiently in this case.
Default index is B-Tree index in Postgres.
There is no visibility information in indexes whether that tuple is dead or updated. Indexes don’t care about those information
Visibility information is stored in heap and after you have looked up a row from the index. Then we check the visibility in the heap.
UPDATE — inserts new index tuple.

update t_posts set title = 'test' where user_id = 1;

Dead tuples are removed by Vaccum. There is no operation to delete one tuple from the index. Vaccums scan through every index completely and delete all the dead tuples.

Example

Create two tables and insert data.

create table t_test(
 id BIGSERIAL NOT NULL PRIMARY KEY,
 name text,
 address text,
 country_code text
);

create table t_posts(
id SERIAL PRIMARY KEY,
title TEXT,
discription TEXT,
user_id INTEGER NOT NULL REFERENCES t_test(id) 
);

CREATE OR REPLACE FUNCTION insert_data_with_loop()
RETURNS void AS $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..1000000 LOOP
INSERT INTO t_test (name, address, country_code) values( md5(random()::text), CONCAT(array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,12)), ''), ' ', array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,12)), '')),substr(md5(random()::text), 0, 3));
END LOOP;
END;
$$ LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION insert_data_into_posts_with_loop()
RETURNS void AS $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..1000000 LOOP
INSERT INTO t_posts (title, discription, user_id) values( md5(random()::text), md5(random()::text), i);
END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT insert_data_with_loop();

SELECT insert_data_into_posts_with_loop();

Here there is an index. it is automatically generated by Postgres for primary key.

For this table also, there is an index for primary key.

Query 01

Check the execution plan for primary key filtering.

explain analyze select * from t_test where id = 9000000;

According to the execution plan, it is just scanning the index.

Query 2

explain analyze select * from t_test where address = '673138731337 072912168669';

Here according to the execution plan, it is scanning the heap sequentially.

Query 3

Create index on address column.


create index on t_test(address);

Check the execution plan.

explain analyze select * from t_test where address = '673138731337 072912168669';

Here, it is just scanning only index. there is a huge difference between Query 2 execution time and Query 3 execution time. Query 3 execution time is really fast after creating the index.

Query 4

Check the indexes for the specific table.

SELECT indexname AS index_name, indexdef AS index_definition
FROM pg_indexes
WHERE tablename = 'your_table_name';