Hybrid Search with SQLite: Vector + Full-Text
Posted on Sun 06 October 2024 in Programming
Simon Willison presents a fascinating approach to combining vector search and traditional full-text search in SQLite, using a technique called Reciprocal Rank Fusion (RRF).
The Core Problem
When we have vector searches (based on semantic similarity) and full-text searches (FTS), each returns scores on completely different scales:
- FTS returns relevance scores
- Vector search returns similarity distances
How to combine these results effectively?
The Solution: Reciprocal Rank Fusion
The RRF technique avoids comparing incompatible scores and instead relies on the ranking of each result within its respective search method.
SELECT
content,
1.0 / (:rrf_k + coalesce(fts_rank, 999)) * :fts_weight +
1.0 / (:rrf_k + coalesce(vec_rank, 999)) * :vec_weight as combined_rank
FROM (
-- FTS subquery with row_number()
SELECT *, row_number() OVER (ORDER BY rank) as fts_rank
FROM fts_search(:query)
) fts
FULL OUTER JOIN (
-- Vector subquery with row_number()
SELECT *, row_number() OVER (ORDER BY distance) as vec_rank
FROM vector_search(:query, :k)
) vec ON fts.id = vec.id
ORDER BY combined_rank DESC;
Advantages of the Hybrid Approach
- Flexibility: Allows adjusting weights between FTS and vector search
- Robustness: Results can appear in one or both methods
- Scalability: Does not require complex score normalization
- Simplicity: A single SQL query handles all the logic
SQLite as a Unified Platform
With extensions like sqlite-vec, SQLite becomes a powerful platform for:
- Full-text search (Native FTS5)
- Vector embedding search
- Hybrid combination of both techniques
This approach democratizes advanced search techniques, making them accessible without complex infrastructure.
Original article: Hybrid full-text search and vector search with SQLite