Imagine you have an e-commerce website like Amazon.com and Alibaba.com, where there will be list of 1000’s of products, when you search for the product, it should responds immediately in spite of having millions of lists. Do you think system goes to individual rows and columns to check and responds to you, well no, If it does so, your system response time took a long time to fetch the data. So Indexing is introduce to enhance the response time of your query. In this post I will share the what is Indexing, its importance and how it can be implement in spring Boot project.
Introduction
Indexes improve query performance by allowing the database to locate data quickly without scanning entire tables. For our platform, indexing is essential for:
- When to Use:
- High-read scenarios: Dashboards, search features, reports.
- Columns in filters: Like user emails, order dates, or product categories.
- Avoid on tiny tables (under 1k rows) or columns with low uniqueness (e.g., a “gender” column with only “M/F”—most rows match, so index doesn’t help much).
Drawbacks of Database Indexing:
Although database helps to enhance the speed of your website, it may degrade the performance if not handled properly.
- Insert/Update: Every add/update means you have to update your index, like adding a extra entry to your table.
- Storage Overhead: Indexing takes (10-20)% extra disk space, depending on your database.
- Over indexing pitfall: Too much indexing slow down your system performance without helping in read operations. Use indexing where you make frequent queries, not everything.
Types of Indexes:
Single Column Index: This is the simplest indexing as it enables indexing on only one column. Use this when you want to filter based on one fields (i.e. email)
Composite Index: Use this when you want the composition of two or more fields to be indexed.
- Create queries filtering on multiple columns (e.g., WHERE location = ‘Kathmandu’ AND posted_date > ‘2025-01-01’).
- Order columns by selectivity (most selective first, e.g., location before posted_date). At first filter based on location then check for date.
- @Table(name = “products”, indexes = {
@Index(name = “idx_category”, columnList = “category”),
@Index(name = “idx_category_price”, columnList = “category, price”)
})
Unique Index: This is great for when you want do duplicate values in the indexed columns.
- Use primary keys for unique identifiers (e.g., user_id, job_id).
- Apply unique indexes for columns requiring uniqueness (e.g., email, username
- @Entity
@Table(name = “products”, indexes = {
@Index(name = “idx_category”, columnList = “category”)
})
Clustered Index: This actually determines how the data is physically stored in the table; usually it’s the primary key.
Non-clustered index: Unlike the clustered index, this doesn’t affect the physical data order.
- Index columns used in WHERE, JOIN, or ORDER BY clauses (e.g., location, posted_date).
- Prioritize high-cardinality columns (many unique values, like email) over low-cardinality ones (e.g., gender).
Test Cases:
- EXPLAIN ANALYZE SELECT * FROM users WHERE email = ‘dipesh@example.com‘;
execute the above query you can see how much does it cost to fetch the email “dipesh@example.com” you can see that it check s only one rows and the execution time is slightly low then normal query.
| database_indexing=# EXPLAIN ANALYZE SELECT * FROM users WHERE email = ‘dipesh@example.com’; QUERY PLAN ———————————————————————————————————————— Index Scan using idx_users_email on users (cost=0.29..8.30 rows=1 width=63) (actual time=0.089..0.090 rows=1 loops=1) Index Cond: ((email)::text = ‘dipesh@example.com’::text) Planning Time: 0.613 ms Execution Time: 0.120 ms (4 rows) |
How to view the query stats:
| SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5; |
This shows the top 5 slowest queries.
You’ll see columns like:
- query – the SQL
- calls – how many times it ran
- total_time – total time spent
- mean_time – average time per call
I.e. Imagine your app is slow. You run:
| SELECT * FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 1; |
And see:
| SELECT * FROM orders WHERE customer_name LIKE ‘%dipesh%’; |
That tells you:
- This query is slow
- Maybe you need an index on customer_name
- Monitor Indexes: In production, use PostgreSQL’s pg_stat_user_indexes to check index usage:
| SELECT indexrelname, idx_scan, idx_tup_read FROM pg_stat_user_indexes WHERE schemaname = ‘public’; |
Implementation:
- Only index columns used in frequent read queries (e.g., sender_id, receiver_id for Messages; user_id, job_id for Applications).
- Avoid indexing rarely queried columns (e.g., message_content, application_notes).