Full-Text Search in PostgreSQL: Concepts & Implementation Guide
Large text fields can initially be searched using LIKE or ILIKE, but this method is not very scalable. The full-text search feature of PostgreSQL is designed to solve this same issue. It is highly suited for practical features like blog search, product catalogs, and document management systems since it processes text in a language-aware manner, recognizes word variants, and can rank results according to relevancy.
Why Not Use LIKE?
Many systems start with:
SELECT id, title, metadata, slug
FROM posts
WHERE content ILIKE '%open-source%' or title ILIKE '%open-source%';
This approach has several problems:
- ❌ Slow on large datasets
- ❌ No ranking (all results are equal)
- ❌ No linguistic understanding (plural, tense, stop words)
- ❌ Cannot use indexes efficiently
PostgreSQL Full-Text Search solves all of these.
Core Concepts in PostgreSQL Full-Text Search
1. tsvector
A tsvector represents a document in a searchable form. PostgreSQL:
- Splits text into tokens
- Normalizes words (stemming)
- Removes stop words
Example:
SELECT to_tsvector('english', 'PostgreSQL is an advanced open source database');
Output (simplified):
'advanc':3 'databas':7 'open':5 'postgresql':1 'sourc':6
2. tsquery
A tsquery represents the search query.
SELECT to_tsquery('english', 'postgresql & database');
Operators:
&AND|OR!NOT<->FOLLOWED BY
3. Matching: @@
SELECT id, title, metadata, slug
FROM posts
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'open-source');
Basic Implementation Example
Step 1: Sample Table
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT,
slug TEXT,
metadata text
);
Insert sample data:
INSERT INTO posts (title, content, slug, metadata)
VALUES (
'Full-Text Search in PostgreSQL',
'PostgreSQL is a powerful open-source database.',
'full-text-search-postgresql',
'{"author":"John","tags":["PostgreSQL","Database"]}'
);
Step 2: Simple Full-Text Search Query
SELECT *
FROM posts
WHERE to_tsvector('english', title || ' ' || content || ' ' || metadata)
@@ to_tsquery('english', 'powerful');
Improving Performance with Indexes
Appropriate indexing is crucial for achieving excellent performance with PostgreSQL Full-Text Search at scale. The GIN (Generalized Inverted Index) is the most popular and advised index for this use.
What Is a GIN Index?
For data formats with several values per row, like the following, a GIN index is intended:
tsvector(full-text search)arrayjsonb
A GIN index functions more like a dictionary or inverted index than a B-tree index, which indexes complete rows:
- Each word (lexeme) becomes a key
- The index stores a list of row IDs that contain that word
This structure makes GIN extremely efficient for full-text search queries such as:
search_vector @@ tsquery
How GIN Works with Full-Text Search
Given a document:
"PostgreSQL is a powerful open source database"
PostgreSQL converts it into a tsvector:
'postgresql' 'power' 'open' 'sourc' 'databas'
The GIN index internally stores data in a structure similar to the following:
| Lexeme | Row IDs |
|---|---|
| postgresql | 1, 5, 20 |
| databas | 1, 9 |
| open | 1, 3, 7 |
When a user searches for:
plainto_tsquery('postgresql database')
PostgreSQL performs the following steps:
- Looks up each lexeme in the GIN index
- Finds the matching row ID lists
- Intersects the lists to find rows containing all terms
As a result, the full table is not scanned; just pertinent rows are returned.
Creating a GIN Index
CREATE INDEX idx_posts_fts
ON posts
USING GIN (to_tsvector('english', title || ' ' || content || ' ' || metadata));
With this index in place, PostgreSQL can perform full-text searches efficiently even on millions of rows.
Ranking Search Results
PostgreSQL can rank results by relevance using ts_rank.
SELECT id, title, metadata, slug,
ts_rank(
to_tsvector('english', title || ' ' || content || ' ' || metadata),
to_tsquery('english', 'powerful')
) AS rank
FROM posts
WHERE to_tsvector('english', title || ' ' || content || ' ' || metadata)
@@ to_tsquery('english', 'powerful')
ORDER BY rank DESC;
Using GENERATED Columns (Best Practice)
Instead of recalculating tsvector every query, store it.
ALTER TABLE posts
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english', title || ' ' || content || ' ' || metadata)
) STORED;
Index it:
CREATE INDEX idx_posts_search_vector
ON posts USING GIN (search_vector);
Query:
SELECT id, title, metadata, slug
FROM posts
WHERE search_vector @@ to_tsquery('english', 'powerful');
Handling User Input Safely
Avoid raw to_tsquery for user input. Use plainto_tsquery.
SELECT *
FROM posts
WHERE search_vector @@ plainto_tsquery('english', 'powerful open-source');
Benefits:
- ✅ Prevents syntax errors
- ✅ Safer for user-provided text
Phrase Search
SELECT id, title, metadata, slug
FROM posts
WHERE search_vector @@ plainto_tsquery('english', 'powerful open-source');
Language Support
PostgreSQL supports multiple languages:
SELECT to_tsvector('simple', content);
SELECT to_tsvector('english', content);
SELECT to_tsvector('spanish', content);
Choose the correct language for better stemming and ranking.
Common Pitfalls
- ❌ Forgetting to use indexes
- ❌ Using
to_tsquerydirectly with user input - ❌ Recomputing
tsvectoron every query - ❌ Choosing the wrong language configuration
Full-Text Search vs External Search Engines
| Feature | PostgreSQL FTS | Elasticsearch |
|---|---|---|
| Setup | Simple | Complex |
| Performance | High (medium scale) | Very High |
| Transactions | Yes | No |
| Operational Cost | Low | High |
Rule of thumb:
- Use PostgreSQL FTS for most CRUD-based applications
- Use Elasticsearch for advanced analytics and massive scale
Best Use Cases for PostgreSQL Full-Text Search
- ✅ Blog search
- ✅ Product catalogs
- ✅ Internal admin search
- ✅ Medium-scale applications
- ❌ Heavy analytics
- ❌ Fuzzy matching at large scale
Conclusion
For many applications, PostgreSQL full-text search is more than “good enough” — it’s fast, stable, and easy to operate. Using tsvector, tsquery, GIN indexes allow you to build solid search functionality while keeping your architecture simple.
In practice, if you’re already running PostgreSQL, trying full-text search first is usually the smarter move before introducing the operational complexity of Elasticsearch.