NashTech Blog

Full-Text Search in PostgreSQL: Concepts & Implementation Guide

Table of Contents

Full-Text Search in PostgreSQL: Concepts & Implementation Guide

Large text fields can initially be searched using LIKE or ILIKE, but this method is not very scalable. The full-text search feature of PostgreSQL is designed to solve this same issue. It is highly suited for practical features like blog search, product catalogs, and document management systems since it processes text in a language-aware manner, recognizes word variants, and can rank results according to relevancy.

Why Not Use LIKE?

Many systems start with:

SELECT id, title, metadata, slug
FROM posts 
WHERE content ILIKE '%open-source%' or title ILIKE '%open-source%';

This approach has several problems:

  • ❌ Slow on large datasets
  • ❌ No ranking (all results are equal)
  • ❌ No linguistic understanding (plural, tense, stop words)
  • ❌ Cannot use indexes efficiently

PostgreSQL Full-Text Search solves all of these.

1. tsvector

A tsvector represents a document in a searchable form. PostgreSQL:

  • Splits text into tokens
  • Normalizes words (stemming)
  • Removes stop words

Example:

SELECT to_tsvector('english', 'PostgreSQL is an advanced open source database');

Output (simplified):

'advanc':3 'databas':7 'open':5 'postgresql':1 'sourc':6

2. tsquery

A tsquery represents the search query.

SELECT to_tsquery('english', 'postgresql & database');

Operators:

  • & AND
  • | OR
  • ! NOT
  • <-> FOLLOWED BY

3. Matching: @@

SELECT id, title, metadata, slug
FROM posts
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'open-source');

Basic Implementation Example

Step 1: Sample Table

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    slug TEXT,
    metadata text
);

Insert sample data:

INSERT INTO posts (title, content, slug, metadata)
VALUES (
	'Full-Text Search in PostgreSQL', 
	'PostgreSQL is a powerful open-source database.', 
	'full-text-search-postgresql', 
	'{"author":"John","tags":["PostgreSQL","Database"]}'
);

Step 2: Simple Full-Text Search Query

SELECT *
FROM posts
WHERE to_tsvector('english', title || ' ' || content || ' ' || metadata)
      @@ to_tsquery('english', 'powerful');

Improving Performance with Indexes

Appropriate indexing is crucial for achieving excellent performance with PostgreSQL Full-Text Search at scale. The GIN (Generalized Inverted Index) is the most popular and advised index for this use.


What Is a GIN Index?

For data formats with several values per row, like the following, a GIN index is intended:

  • tsvector (full-text search)
  • array
  • jsonb

A GIN index functions more like a dictionary or inverted index than a B-tree index, which indexes complete rows:

  • Each word (lexeme) becomes a key
  • The index stores a list of row IDs that contain that word

This structure makes GIN extremely efficient for full-text search queries such as:

search_vector @@ tsquery

How GIN Works with Full-Text Search

Given a document:

"PostgreSQL is a powerful open source database"

PostgreSQL converts it into a tsvector:

'postgresql' 'power' 'open' 'sourc' 'databas'

The GIN index internally stores data in a structure similar to the following:

LexemeRow IDs
postgresql1, 5, 20
databas1, 9
open1, 3, 7

When a user searches for:

plainto_tsquery('postgresql database')

PostgreSQL performs the following steps:

  1. Looks up each lexeme in the GIN index
  2. Finds the matching row ID lists
  3. Intersects the lists to find rows containing all terms

As a result, the full table is not scanned; just pertinent rows are returned.


Creating a GIN Index

CREATE INDEX idx_posts_fts
ON posts
USING GIN (to_tsvector('english', title || ' ' || content || ' ' || metadata));

With this index in place, PostgreSQL can perform full-text searches efficiently even on millions of rows.

Ranking Search Results

PostgreSQL can rank results by relevance using ts_rank.

SELECT id, title, metadata, slug,
       ts_rank(
         to_tsvector('english', title || ' ' || content || ' ' || metadata),
         to_tsquery('english', 'powerful')
       ) AS rank
FROM posts
WHERE to_tsvector('english', title || ' ' || content || ' ' || metadata)
      @@ to_tsquery('english', 'powerful')
ORDER BY rank DESC;

Using GENERATED Columns (Best Practice)

Instead of recalculating tsvector every query, store it.

ALTER TABLE posts
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
  to_tsvector('english', title || ' ' || content || ' ' || metadata)
) STORED;

Index it:

CREATE INDEX idx_posts_search_vector
ON posts USING GIN (search_vector);

Query:

SELECT id, title, metadata, slug
FROM posts
WHERE search_vector @@ to_tsquery('english', 'powerful');

Handling User Input Safely

Avoid raw to_tsquery for user input. Use plainto_tsquery.

SELECT *
FROM posts
WHERE search_vector @@ plainto_tsquery('english', 'powerful open-source');

Benefits:

  • ✅ Prevents syntax errors
  • ✅ Safer for user-provided text
SELECT id, title, metadata, slug
FROM posts
WHERE search_vector @@ plainto_tsquery('english', 'powerful open-source');

Language Support

PostgreSQL supports multiple languages:

SELECT to_tsvector('simple', content);
SELECT to_tsvector('english', content);
SELECT to_tsvector('spanish', content);

Choose the correct language for better stemming and ranking.

Common Pitfalls

  • ❌ Forgetting to use indexes
  • ❌ Using to_tsquery directly with user input
  • ❌ Recomputing tsvector on every query
  • ❌ Choosing the wrong language configuration

Full-Text Search vs External Search Engines

FeaturePostgreSQL FTSElasticsearch
SetupSimpleComplex
PerformanceHigh (medium scale)Very High
TransactionsYesNo
Operational CostLowHigh

Rule of thumb:

  • Use PostgreSQL FTS for most CRUD-based applications
  • Use Elasticsearch for advanced analytics and massive scale

Best Use Cases for PostgreSQL Full-Text Search

  • ✅ Blog search
  • ✅ Product catalogs
  • ✅ Internal admin search
  • ✅ Medium-scale applications
  • ❌ Heavy analytics
  • ❌ Fuzzy matching at large scale

Conclusion

For many applications, PostgreSQL full-text search is more than “good enough” — it’s fast, stable, and easy to operate. Using tsvector, tsquery, GIN indexes allow you to build solid search functionality while keeping your architecture simple.

In practice, if you’re already running PostgreSQL, trying full-text search first is usually the smarter move before introducing the operational complexity of Elasticsearch.

Picture of Dang Phan Hai

Dang Phan Hai

Leave a Comment

Your email address will not be published. Required fields are marked *

Suggested Article

Scroll to Top