Database Indexing

Pramod Shehan
4 min readMar 13, 2024

Heap

  • Where we store all the rows in the table.
  • Data are Unordered.
  • When inserting new records, new tuple goes wherever. Usually at the end.

Pages(Blocks)

  • Heap is divided into 8 KB blocks.
  • 8 KB is the default size of the block.

Item Pointers

  • In each page, there are slots.
  • Some of the slots can be empty.
  • If you want to know the physical location of the tuple you can alway address with the block number & slot number.
  • You can check the physical location of each tuples.
SELECT CTID, * FROM t_users;
  • This CTID column is hidden system column.

Full Table Scan

  • Start at the beginning and scan all the way to at the end.

What is an index?

  • Indexes are a common way to enhance database performance.
  • An index allows the database server to find and retrieve specific rows much faster than it could do without an index.
  • Every primary key has an index by default.
  • Key value stores.
  • Keys are the list of values in the field that we have asked to index.
  • Values are the page number and item pointer(slot number) that we mentioned before.(Physical location of that tuple in the heap)
  • It is also sorted list. It’s sorts the values so it knows how to find them efficiently in this case.
  • Default index is B-Tree index in Postgres.
  • There is no visibility information in indexes whether that tuple is dead or updated. Indexes don’t care about those information
  • Visibility information is stored in heap and after you have looked up a row from the index. Then we check the visibility in the heap.
  • UPDATE — inserts new index tuple.
before the update
update t_posts set title = 'test' where user_id = 1;
after the update
  • Dead tuples are removed by Vaccum. There is no operation to delete one tuple from the index. Vaccums scan through every index completely and delete all the dead tuples.

Example

  • Create two tables and insert data.
create table t_test(
id BIGSERIAL NOT NULL PRIMARY KEY,
name text,
address text,
country_code text
);

create table t_posts(
id SERIAL PRIMARY KEY,
title TEXT,
discription TEXT,
user_id INTEGER NOT NULL REFERENCES t_test(id)
);

CREATE OR REPLACE FUNCTION insert_data_with_loop()
RETURNS void AS $$
DECLARE
i integer;
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO t_test (name, address, country_code) values( md5(random()::text), CONCAT(array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,12)), ''), ' ', array_to_string(ARRAY(SELECT chr((48 + round(random() * 9)) :: integer) FROM generate_series(1,12)), '')),substr(md5(random()::text), 0, 3));
END LOOP;
END;
$$ LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION insert_data_into_posts_with_loop()
RETURNS void AS $$
DECLARE
i integer;
BEGIN
FOR i IN 1..1000000 LOOP
INSERT INTO t_posts (title, discription, user_id) values( md5(random()::text), md5(random()::text), i);
END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT insert_data_with_loop();

SELECT insert_data_into_posts_with_loop();
  • Here there is an index. it is automatically generated by Postgres for primary key.
  • For this table also, there is an index for primary key.

Query 01

  • Check the execution plan for primary key filtering.
explain analyze select * from t_test where id = 9000000;
  • According to the execution plan, it is just scanning the index.

Query 2

explain analyze select * from t_test where address = '673138731337 072912168669';

Here according to the execution plan, it is scanning the heap sequentially.

Query 3

  • Create index on address column.

create index on t_test(address);
  • Check the execution plan.
explain analyze select * from t_test where address = '673138731337 072912168669';

Here, it is just scanning only index. there is a huge difference between Query 2 execution time and Query 3 execution time. Query 3 execution time is really fast after creating the index.

Query 4

  • Check the indexes for the specific table.
SELECT indexname AS index_name, indexdef AS index_definition
FROM pg_indexes
WHERE tablename = 'your_table_name';

--

--