AI & Machine Learning
Implementing Semantic Search in Next.js with Supabase pgvector
TL;DR
Traditional keyword search fails when users do not use your exact terminology. Semantic search fixes this by comparing meaning, not words. I use pgvector inside Supabase to build semantic search for client projects because it keeps vectors in the same database as everything else -- no extra infrastructure, no extra bill, no syncing headaches. This article walks through the entire pipeline: enabling pgvector, generating embeddings with OpenAI, storing vectors, building a match function in SQL, combining vector search with full-text search for hybrid results, exposing it through a Next.js API route, and rendering results in a React component. Every code block is TypeScript. Every pattern comes from production systems I have shipped.
What you will build: A semantic search system that handles 100K+ documents, returns results in under 200ms, and costs less than $5/month in embedding API calls for most use cases.
What Semantic Search Is
Keyword search matches characters. If a user searches "how to fix a slow website" and your content uses the phrase "performance optimization," keyword search returns nothing. The words do not match. Semantic search understands that these two phrases mean the same thing.
The mechanism behind this is vector embeddings. An embedding model takes a piece of text and converts it into an array of floating-point numbers -- typically 1536 dimensions for OpenAI's text-embedding-3-small or 3072 for text-embedding-3-large. These numbers encode the meaning of the text in a way that allows mathematical comparison. Two pieces of text that mean similar things will have vectors that are close together in high-dimensional space.
The distance between two vectors is measured using cosine similarity. A cosine similarity of 1.0 means the vectors are identical in direction. A similarity of 0.0 means they are completely unrelated. In practice, relevant results typically score between 0.7 and 0.95.
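To make the comparison concrete, here is a minimal cosine similarity function in TypeScript. It is illustrative only -- in the system below, pgvector computes this inside the database:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|)
// Returns 1.0 for identical directions, 0.0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Real embedding vectors have 1536 dimensions instead of 2 or 3, but the math is identical.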
Here is what makes this powerful for search:
- "JavaScript framework for building UIs" and "React library for frontend development" have high cosine similarity even though they share zero meaningful keywords
- "Bank account" in a finance context and "river bank" in a geography context produce different vectors despite sharing the word "bank"
- Multilingual queries work out of the box -- "buscar productos" and "search for products" map to nearby vectors
I have used this pattern for product search, documentation search, support ticket matching, and RAG pipelines. The implementation is the same every time. Let me walk through it.
pgvector in Supabase -- Setup
Supabase ships with pgvector as a built-in extension. You do not need to install anything. You enable it with a single SQL statement.
Step 1: Enable the Extension
Run this in the Supabase SQL Editor or in a migration file:
-- Enable pgvector
create extension if not exists vector;

Step 2: Create the Documents Table
This table stores your content alongside its embedding vector:
create table documents (
id bigint primary key generated always as identity,
title text not null,
content text not null,
url text,
metadata jsonb default '{}',
embedding vector(1536),
created_at timestamptz default now(),
updated_at timestamptz default now()
);

The vector(1536) column type is provided by pgvector. The dimension must match your embedding model -- 1536 for text-embedding-3-small, 3072 for text-embedding-3-large. I use the small model for most projects. The quality difference is marginal for search, and it halves your storage and improves query speed.
Step 3: Create the Search Index
Without an index, pgvector performs exact nearest neighbor search, which scans every row. That works fine for a few thousand documents but falls apart at scale. You need an approximate nearest neighbor (ANN) index:
-- HNSW index for cosine similarity (recommended)
create index on documents
using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);

Two index types exist:
- HNSW -- Higher memory usage, faster queries, better recall. Use this for production.
- IVFFlat -- Lower memory, requires training data, slightly lower recall. Use this if memory is constrained.
The m parameter controls how many connections each node has in the graph (higher = better recall, more memory). The ef_construction parameter controls build-time accuracy. These defaults work well up to about 500K documents.
Step 4: Add Full-Text Search Index
We will combine vector search with PostgreSQL full-text search later, so add a GIN index now:
-- Add a tsvector column for full-text search
alter table documents add column fts tsvector
generated always as (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''))) stored;
create index on documents using gin (fts);

This creates a generated column that automatically updates whenever title or content changes. No triggers needed.
Generating Embeddings
You need an embedding model to convert text into vectors. I use OpenAI's text-embedding-3-small for most projects. It costs $0.02 per million tokens, which means embedding 100K documents of average length costs about $2.
Here is the embedding utility:
// lib/embeddings.ts
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
const EMBEDDING_MODEL = "text-embedding-3-small";
const MAX_TOKENS = 8191; // input token limit for text-embedding-3-small (not enforced here)
export async function generateEmbedding(text: string): Promise<number[]> {
const cleaned = text.replace(/\n/g, " ").trim();
if (!cleaned) {
throw new Error("Cannot generate embedding for empty text");
}
const response = await openai.embeddings.create({
model: EMBEDDING_MODEL,
input: cleaned,
});
return response.data[0].embedding;
}
export async function generateEmbeddings(
texts: string[]
): Promise<number[][]> {
const cleaned = texts.map((t) => t.replace(/\n/g, " ").trim());
const nonEmpty = cleaned.filter(Boolean);
if (nonEmpty.length === 0) {
return [];
}
// OpenAI supports batch embedding -- send up to 2048 inputs at once
const batchSize = 2048;
const allEmbeddings: number[][] = [];
for (let i = 0; i < nonEmpty.length; i += batchSize) {
const batch = nonEmpty.slice(i, i + batchSize);
const response = await openai.embeddings.create({
model: EMBEDDING_MODEL,
input: batch,
});
const embeddings = response.data
.sort((a, b) => a.index - b.index)
.map((d) => d.embedding);
allEmbeddings.push(...embeddings);
}
return allEmbeddings;
}

Two things to note:
- Batch embedding is critical for ingestion performance. Sending 2048 texts in one API call is dramatically faster than 2048 individual calls. The API supports this natively.
- Text cleaning matters. Newlines, excessive whitespace, and special characters degrade embedding quality. Strip them before sending.
For production ingestion of large datasets, I wrap this in a queue with retry logic. But for most projects, the batch function above handles everything you need.
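If you want retry logic without a full queue, a generic backoff wrapper is enough. This is a minimal sketch -- `withRetry` and its parameters are illustrative names, not part of the OpenAI SDK:

```typescript
// Generic retry with exponential backoff. Wraps any async function and
// retries on failure up to maxAttempts, doubling the delay each attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage sketch: const embeddings = await withRetry(() => generateEmbeddings(batch));
```

Rate-limit errors from the embeddings API are transient, which is exactly the failure mode exponential backoff handles well.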
Storing Vectors
With embeddings generated, you need to store them alongside your documents. Here is the ingestion pipeline:
// lib/ingest.ts
import { createClient } from "@supabase/supabase-js";
import { generateEmbeddings } from "./embeddings";
const supabase = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
interface DocumentInput {
title: string;
content: string;
url?: string;
metadata?: Record<string, unknown>;
}
export async function ingestDocuments(
documents: DocumentInput[]
): Promise<void> {
const chunkSize = 100;
for (let i = 0; i < documents.length; i += chunkSize) {
const chunk = documents.slice(i, i + chunkSize);
// Generate embeddings for the chunk
const textsToEmbed = chunk.map(
(doc) => `${doc.title}\n\n${doc.content}`
);
const embeddings = await generateEmbeddings(textsToEmbed);
// Prepare rows with embeddings
const rows = chunk.map((doc, index) => ({
title: doc.title,
content: doc.content,
url: doc.url ?? null,
metadata: doc.metadata ?? {},
embedding: JSON.stringify(embeddings[index]),
}));
// Insert into Supabase
const { error } = await supabase.from("documents").insert(rows);
if (error) {
throw new Error(
`Failed to insert documents at offset ${i}: ${error.message}`
);
}
console.log(
`Ingested ${Math.min(i + chunkSize, documents.length)}/${documents.length} documents`
);
}
}

I concatenate title and content before embedding because it gives the model more context. A title like "Authentication" alone is ambiguous. "Authentication\n\nHow to implement JWT refresh tokens in Next.js" produces a much more specific vector.
The chunk size of 100 balances between Supabase insert limits and embedding API batch efficiency. For initial data loads, I run this as a script. For ongoing ingestion, I trigger it from webhooks when content changes in the CMS.
Chunking Strategy
For documents longer than about 1000 tokens, you should split them into chunks before embedding. A single embedding cannot represent the full nuance of a 5000-word article. Here is a simple but effective chunker:
// lib/chunker.ts
interface Chunk {
text: string;
startIndex: number;
endIndex: number;
}
export function chunkText(
text: string,
maxChunkSize: number = 1000,
overlap: number = 200
): Chunk[] {
const chunks: Chunk[] = [];
const sentences = text.match(/[^.!?]+[.!?]+/g) ?? [text];
let currentChunk = "";
let startIndex = 0;
let currentIndex = 0;
for (const sentence of sentences) {
if (currentChunk.length + sentence.length > maxChunkSize && currentChunk) {
chunks.push({
text: currentChunk.trim(),
startIndex,
endIndex: currentIndex,
});
// Overlap: keep the last portion of the current chunk
const overlapText = currentChunk.slice(-overlap);
startIndex = currentIndex - overlapText.length;
currentChunk = overlapText;
}
currentChunk += sentence;
currentIndex += sentence.length;
}
if (currentChunk.trim()) {
chunks.push({
text: currentChunk.trim(),
startIndex,
endIndex: currentIndex,
});
}
return chunks;
}

The overlap parameter ensures that no idea gets split across chunk boundaries and lost. Two hundred characters of overlap is enough for most content. For technical documentation with long code blocks, I increase it to 400.
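To decide whether a document needs chunking at all, a rough token estimate is enough. The four-characters-per-token heuristic below is an approximation for English prose, not an exact tokenizer (use a real tokenizer like tiktoken if you need precision):

```typescript
// Rough token estimate: English text averages ~4 characters per token.
// Good enough for a "does this need chunking?" check; not exact.
const CHARS_PER_TOKEN = 4;

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

export function needsChunking(text: string, maxTokens: number = 1000): boolean {
  return estimateTokens(text) > maxTokens;
}
```

Run every document through this check during ingestion and route long ones through the chunker first.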
Building the Search Query
This is where pgvector earns its keep. The search function lives in PostgreSQL as an RPC, which means Supabase can call it directly and you get the full power of SQL.
-- Create the semantic search function
create or replace function match_documents(
query_embedding vector(1536),
match_threshold float default 0.7,
match_count int default 10
)
returns table (
id bigint,
title text,
content text,
url text,
metadata jsonb,
similarity float
)
language plpgsql
as $$
begin
return query
select
d.id,
d.title,
d.content,
d.url,
d.metadata,
1 - (d.embedding <=> query_embedding) as similarity
from documents d
where 1 - (d.embedding <=> query_embedding) > match_threshold
order by d.embedding <=> query_embedding
limit match_count;
end;
$$;

The <=> operator is pgvector's cosine distance operator. Cosine distance is 1 - cosine_similarity, so we subtract from 1 to get similarity. The match_threshold parameter filters out low-relevance noise -- anything below 0.7 is usually irrelevant.
Calling this from your application:
// lib/search.ts
import { createClient } from "@supabase/supabase-js";
import { generateEmbedding } from "./embeddings";
const supabase = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
interface SearchResult {
id: number;
title: string;
content: string;
url: string | null;
metadata: Record<string, unknown>;
similarity: number;
}
export async function semanticSearch(
query: string,
threshold: number = 0.7,
limit: number = 10
): Promise<SearchResult[]> {
const embedding = await generateEmbedding(query);
const { data, error } = await supabase.rpc("match_documents", {
query_embedding: JSON.stringify(embedding),
match_threshold: threshold,
match_count: limit,
});
if (error) {
throw new Error(`Search failed: ${error.message}`);
}
return data as SearchResult[];
}

This is the core of the entire system. Everything else is infrastructure around these two functions.
Hybrid Search -- Combining Vector + Full Text
Pure vector search has a weakness: it can miss exact keyword matches that a user expects. If someone searches for "Error ERR_CONNECTION_REFUSED," they want documents containing that exact error code, not just documents about connection problems.
Hybrid search solves this by running both vector and full-text search, then combining the results with a weighted score. Here is the SQL function:
create or replace function hybrid_search(
query_text text,
query_embedding vector(1536),
match_count int default 10,
full_text_weight float default 0.3,
semantic_weight float default 0.7,
rrf_k int default 50
)
returns table (
id bigint,
title text,
content text,
url text,
metadata jsonb,
similarity float,
rank_score float
)
language plpgsql
as $$
begin
return query
with semantic_results as (
select
d.id,
d.title,
d.content,
d.url,
d.metadata,
1 - (d.embedding <=> query_embedding) as similarity,
row_number() over (order by d.embedding <=> query_embedding) as rank_ix
from documents d
order by d.embedding <=> query_embedding
limit match_count * 3
),
fulltext_results as (
select
d.id,
d.title,
d.content,
d.url,
d.metadata,
ts_rank_cd(d.fts, websearch_to_tsquery('english', query_text)) as fts_rank,
row_number() over (
order by ts_rank_cd(d.fts, websearch_to_tsquery('english', query_text)) desc
) as rank_ix
from documents d
where d.fts @@ websearch_to_tsquery('english', query_text)
limit match_count * 3
),
combined as (
select
coalesce(s.id, f.id) as id,
coalesce(s.title, f.title) as title,
coalesce(s.content, f.content) as content,
coalesce(s.url, f.url) as url,
coalesce(s.metadata, f.metadata) as metadata,
coalesce(s.similarity, 0) as similarity,
(
coalesce(semantic_weight / (rrf_k + s.rank_ix), 0.0) +
coalesce(full_text_weight / (rrf_k + f.rank_ix), 0.0)
) as rank_score
from semantic_results s
full outer join fulltext_results f on s.id = f.id
)
select
c.id,
c.title,
c.content,
c.url,
c.metadata,
c.similarity,
c.rank_score
from combined c
order by c.rank_score desc
limit match_count;
end;
$$;This uses Reciprocal Rank Fusion (RRF), a technique from information retrieval that combines rankings from different sources. The rrf_k parameter (default 50) controls how much to penalize lower-ranked results. The weights let you tune the balance -- I default to 70% semantic and 30% full-text because semantic search captures intent better, but full-text search catches the exact matches that users depend on.
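The weighted RRF score is easier to see in isolation. This TypeScript sketch mirrors the SQL formula above -- each document contributes weight / (k + rank) from every result list it appears in:

```typescript
// Weighted Reciprocal Rank Fusion, mirroring the SQL formula.
// Each Map takes a document id to its 1-based rank in one result list.
function rrfScores(
  semanticRanks: Map<number, number>,
  fullTextRanks: Map<number, number>,
  semanticWeight: number = 0.7,
  fullTextWeight: number = 0.3,
  k: number = 50
): Map<number, number> {
  const scores = new Map<number, number>();
  const ids = new Set([...semanticRanks.keys(), ...fullTextRanks.keys()]);
  for (const id of ids) {
    const semRank = semanticRanks.get(id);
    const ftsRank = fullTextRanks.get(id);
    const score =
      (semRank !== undefined ? semanticWeight / (k + semRank) : 0) +
      (ftsRank !== undefined ? fullTextWeight / (k + ftsRank) : 0);
    scores.set(id, score);
  }
  return scores;
}
```

A document ranked first in both lists scores 0.7/51 + 0.3/51 ≈ 0.0196, beating one ranked first only semantically (0.7/51 ≈ 0.0137) -- agreement between the two searches is rewarded.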
The TypeScript wrapper:
// lib/hybrid-search.ts
import { createClient } from "@supabase/supabase-js";
import { generateEmbedding } from "./embeddings";
const supabase = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
interface HybridSearchResult {
id: number;
title: string;
content: string;
url: string | null;
metadata: Record<string, unknown>;
similarity: number;
rank_score: number;
}
export async function hybridSearch(
query: string,
limit: number = 10
): Promise<HybridSearchResult[]> {
const embedding = await generateEmbedding(query);
const { data, error } = await supabase.rpc("hybrid_search", {
query_text: query,
query_embedding: JSON.stringify(embedding),
match_count: limit,
full_text_weight: 0.3,
semantic_weight: 0.7,
rrf_k: 50,
});
if (error) {
throw new Error(`Hybrid search failed: ${error.message}`);
}
return data as HybridSearchResult[];
}

In my experience, hybrid search consistently outperforms pure vector search on real user queries. Users mix natural language with specific terms, and hybrid search handles both.
Building the API Route
The Next.js API route exposes hybrid search as an endpoint with input validation, rate limiting awareness, and proper error handling:
// app/api/search/route.ts
import { NextRequest, NextResponse } from "next/server";
import { hybridSearch } from "@/lib/hybrid-search";
interface SearchRequestBody {
query: string;
limit?: number;
}
function validateSearchRequest(
body: unknown
): body is SearchRequestBody {
if (typeof body !== "object" || body === null) return false;
const obj = body as Record<string, unknown>;
if (typeof obj.query !== "string") return false;
if (obj.query.trim().length === 0) return false;
if (obj.query.length > 500) return false;
if (obj.limit !== undefined && typeof obj.limit !== "number") return false;
if (typeof obj.limit === "number" && (obj.limit < 1 || obj.limit > 50))
return false;
return true;
}
export async function POST(request: NextRequest) {
try {
const body = await request.json();
if (!validateSearchRequest(body)) {
return NextResponse.json(
{
code: "INVALID_REQUEST",
message:
"Request must include a query string (1-500 chars) and optional limit (1-50)",
},
{ status: 400 }
);
}
const { query, limit = 10 } = body;
const startTime = performance.now();
const results = await hybridSearch(query, limit);
const duration = Math.round(performance.now() - startTime);
return NextResponse.json({
results: results.map((result) => ({
id: result.id,
title: result.title,
content:
result.content.length > 300
? result.content.slice(0, 300) + "..."
: result.content,
url: result.url,
similarity: Math.round(result.similarity * 100) / 100,
rank_score: Math.round(result.rank_score * 10000) / 10000,
})),
meta: {
query,
count: results.length,
duration_ms: duration,
},
});
} catch (err) {
console.error("Search error:", err);
return NextResponse.json(
{
code: "SEARCH_FAILED",
message: "An error occurred while processing your search",
},
{ status: 500 }
);
}
}A few design decisions worth calling out:
- POST instead of GET because search queries can be long and GET has URL length limits. Also, we send the query in the body, which avoids logging sensitive search terms in server access logs.
- Content truncation at 300 characters in the response. The frontend only needs a snippet for the results list. Full content loads when the user clicks through.
- Duration tracking for monitoring. If search starts taking over 500ms, you know something needs tuning.
- Structured error responses with error codes. The frontend can switch on `code` to show appropriate messages.
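The route mentions rate limiting awareness but does not implement it. A minimal fixed-window limiter looks like the sketch below -- the names are illustrative, and an in-memory Map only protects a single serverless instance, so use a shared store (e.g. Upstash Redis) for multi-instance deployments:

```typescript
// Fixed-window rate limiter keyed by a client identifier (e.g. IP address).
// In-memory only: each serverless instance keeps its own counts.
const WINDOW_MS = 60_000; // 1-minute window
const MAX_REQUESTS = 30; // allowed requests per window, per key

const windows = new Map<string, { count: number; windowStart: number }>();

export function isRateLimited(key: string, now: number = Date.now()): boolean {
  const entry = windows.get(key);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    // New key or expired window: start a fresh window with one request.
    windows.set(key, { count: 1, windowStart: now });
    return false;
  }
  entry.count++;
  return entry.count > MAX_REQUESTS;
}
```

In the route handler, check `isRateLimited(ip)` before calling the search and return a 429 when it trips.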
Frontend Search Component
The search component uses debouncing to avoid hammering the API on every keystroke and displays results with relevance indicators:
// components/search.tsx
"use client";
import { useState, useCallback, useRef, useEffect } from "react";
interface SearchResult {
id: number;
title: string;
content: string;
url: string | null;
similarity: number;
rank_score: number;
}
interface SearchResponse {
results: SearchResult[];
meta: {
query: string;
count: number;
duration_ms: number;
};
}
function useDebounce<T extends (...args: any[]) => void>(
callback: T,
delay: number
): T {
const timeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null);
useEffect(() => {
return () => {
if (timeoutRef.current) clearTimeout(timeoutRef.current);
};
}, []);
return useCallback(
(...args: Parameters<T>) => {
if (timeoutRef.current) clearTimeout(timeoutRef.current);
timeoutRef.current = setTimeout(() => callback(...args), delay);
},
[callback, delay]
) as T;
}
export function SearchBox() {
const [query, setQuery] = useState("");
const [results, setResults] = useState<SearchResult[]>([]);
const [isSearching, setIsSearching] = useState(false);
const [meta, setMeta] = useState<SearchResponse["meta"] | null>(null);
const abortRef = useRef<AbortController | null>(null);
const performSearch = useCallback(async (searchQuery: string) => {
if (searchQuery.trim().length < 3) {
setResults([]);
setMeta(null);
return;
}
// Cancel any in-flight request
if (abortRef.current) abortRef.current.abort();
abortRef.current = new AbortController();
setIsSearching(true);
try {
const response = await fetch("/api/search", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ query: searchQuery, limit: 10 }),
signal: abortRef.current.signal,
});
if (!response.ok) {
throw new Error(`Search failed with status ${response.status}`);
}
const data: SearchResponse = await response.json();
setResults(data.results);
setMeta(data.meta);
} catch (err) {
if (err instanceof Error && err.name === "AbortError") return;
console.error("Search error:", err);
setResults([]);
} finally {
setIsSearching(false);
}
}, []);
const debouncedSearch = useDebounce(performSearch, 300);
function handleInputChange(value: string) {
setQuery(value);
debouncedSearch(value);
}
function getRelevanceBadge(similarity: number): string {
if (similarity >= 0.9) return "Exact match";
if (similarity >= 0.8) return "Highly relevant";
if (similarity >= 0.7) return "Relevant";
return "Related";
}
return (
<div className="w-full max-w-2xl mx-auto">
<div className="relative">
<input
type="text"
value={query}
onChange={(e) => handleInputChange(e.target.value)}
placeholder="Search by meaning, not just keywords..."
className="w-full px-4 py-3 rounded-lg border border-gray-700 bg-gray-900 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-orange-500 focus:border-transparent"
aria-label="Search"
/>
{isSearching && (
<div className="absolute right-3 top-1/2 -translate-y-1/2">
<div className="h-5 w-5 animate-spin rounded-full border-2 border-orange-500 border-t-transparent" />
</div>
)}
</div>
{meta && (
<p className="mt-2 text-sm text-gray-400">
{meta.count} results in {meta.duration_ms}ms
</p>
)}
{results.length > 0 && (
<ul className="mt-4 space-y-3" role="list">
{results.map((result) => (
<li
key={result.id}
className="rounded-lg border border-gray-800 bg-gray-900/50 p-4 transition-colors hover:border-orange-500/50"
>
<div className="flex items-start justify-between gap-3">
<div className="min-w-0 flex-1">
{result.url ? (
<a
href={result.url}
className="text-lg font-semibold text-white hover:text-orange-400 transition-colors"
>
{result.title}
</a>
) : (
<h3 className="text-lg font-semibold text-white">
{result.title}
</h3>
)}
<p className="mt-1 text-sm text-gray-400 line-clamp-2">
{result.content}
</p>
</div>
<span className="shrink-0 rounded-full bg-orange-500/10 px-2.5 py-0.5 text-xs font-medium text-orange-400">
{getRelevanceBadge(result.similarity)}
</span>
</div>
</li>
))}
</ul>
)}
{query.length >= 3 && !isSearching && results.length === 0 && (
<p className="mt-4 text-center text-gray-400">
No results found. Try rephrasing your search.
</p>
)}
</div>
);
}

Key implementation details:
- AbortController cancels in-flight requests when the user types a new character. Without this, stale results from slow requests can overwrite fresh results from fast ones.
- 300ms debounce balances responsiveness with API efficiency. Users type in bursts -- this catches the pauses between words.
- Minimum 3 characters before triggering a search. Single-character and two-character queries produce noisy results and waste embedding API calls.
- Relevance badges give users confidence in the results. "Exact match" vs "Related" sets the right expectation.
- Accessible markup with proper ARIA labels and semantic HTML. The search input has a label. Results are in a list with the `role` attribute.
Performance Tuning
Once your semantic search is working, you will want to optimize it. Here are the levers I pull on every project.
Index Tuning
The HNSW index parameters directly affect speed and accuracy:
-- For datasets under 100K rows
create index on documents
using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
-- For datasets 100K-1M rows
create index on documents
using hnsw (embedding vector_cosine_ops)
with (m = 24, ef_construction = 128);
-- Set ef_search at query time (higher = better recall, slower)
set hnsw.ef_search = 100;

The ef_search parameter controls query-time accuracy. The default of 40 works for most cases. I bump it to 100 when recall matters more than speed (e.g., RAG pipelines where missing a relevant document degrades the LLM response).
Connection Pooling
Each search query opens a database connection. Under load, you hit connection limits fast. Use Supabase's built-in connection pooler (Supavisor):
// supabase-js requests go through Supabase's API layer; for any direct
// Postgres connections, point your client at the Supavisor pooler URL instead
const supabase = createClient(
process.env.NEXT_PUBLIC_SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!,
{
db: {
schema: "public",
},
}
);

Embedding Caching
The embedding API call adds 100-300ms to every search. For repeated queries, cache the embeddings:
// lib/embedding-cache.ts
const cache = new Map<string, { embedding: number[]; timestamp: number }>();
const CACHE_TTL = 1000 * 60 * 60; // 1 hour
export async function getCachedEmbedding(
text: string
): Promise<number[]> {
const key = text.toLowerCase().trim();
const cached = cache.get(key);
if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
return cached.embedding;
}
const embedding = await generateEmbedding(text);
cache.set(key, { embedding, timestamp: Date.now() });
return embedding;
}

For production at scale, replace the in-memory Map with Redis. But for most Next.js deployments on Vercel, the in-memory cache handles the 80% case because common queries repeat within the same serverless function instance.
Reducing Vector Dimensions
OpenAI's text-embedding-3-small supports Matryoshka embeddings, meaning you can truncate the output to fewer dimensions without retraining. Halving dimensions from 1536 to 768 halves storage and speeds up queries with minimal accuracy loss:
export async function generateCompactEmbedding(
text: string
): Promise<number[]> {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text.replace(/\n/g, " ").trim(),
dimensions: 768, // Half the default
});
return response.data[0].embedding;
}

Update your table and index to vector(768) if you go this route. I use 768 dimensions for projects with over 200K documents and have seen no measurable drop in search quality.
Cost Analysis
Here is what this system actually costs in production, based on numbers from projects I have deployed.
Embedding Costs (OpenAI)
| Model | Cost per 1M Tokens | 10K Documents | 100K Documents |
|---|---|---|---|
| text-embedding-3-small | $0.02 | $0.10 | $1.00 |
| text-embedding-3-large | $0.13 | $0.65 | $6.50 |
Query embeddings are negligible. Even at 10,000 searches per day, you are generating one embedding per search -- that is about $0.01/day with the small model.
Supabase Costs
| Plan | Price | Storage | Includes |
|---|---|---|---|
| Free | $0/month | 500 MB | Enough for ~50K documents with embeddings |
| Pro | $25/month | 8 GB | Enough for ~500K documents with embeddings |
| Team | $599/month | 16 GB+ | For serious scale |
A single 1536-dimension vector takes about 6 KB of storage. With a 500-character text content column, each document row is roughly 8 KB total. So 100K documents consume about 800 MB including the HNSW index overhead.
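The storage math generalizes to other corpus sizes. Here is a quick estimator -- the 4-byte floats, the small pgvector per-vector header, the 500-byte default for non-vector row data, and the 1.3x index-overhead multiplier are all rough assumptions, not measured values:

```typescript
// Rough storage estimate for a pgvector-backed documents table.
// Assumes 4-byte floats plus ~8 bytes of pgvector header per vector,
// and a 1.3x multiplier approximating HNSW index overhead.
function estimateStorageMB(
  docCount: number,
  dimensions: number = 1536,
  avgRowBytesExcludingVector: number = 500,
  indexOverhead: number = 1.3
): number {
  const vectorBytes = dimensions * 4 + 8;
  const rowBytes = vectorBytes + avgRowBytesExcludingVector;
  return Math.round((docCount * rowBytes * indexOverhead) / (1024 * 1024));
}
```

With the defaults, 100K documents at 1536 dimensions lands near the ~800 MB figure above, and you can plug in 768 dimensions to see the effect of Matryoshka truncation.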
Total Monthly Cost for a Typical Project
| Component | Cost |
|---|---|
| Supabase Pro | $25 |
| OpenAI embeddings (initial 100K docs) | $1 (one-time) |
| OpenAI embeddings (10K searches/day) | $0.30 |
| Total | ~$26/month |
Compare that to a dedicated vector database like Pinecone at $70/month minimum, plus the operational overhead of syncing data between two databases. The math is clear for most projects.
When pgvector Is Not Enough
Be honest about the limits. pgvector starts struggling when:
- You exceed 5M+ vectors and need sub-50ms queries
- You need multi-tenant isolation at the vector level with different indexes per tenant
- You require real-time index updates with zero performance degradation during writes
- You are doing multimodal search across text, images, and audio embeddings simultaneously
At that point, look at Pinecone or Weaviate. But I have yet to hit these limits on any client project. Most SaaS products, internal tools, and content sites live comfortably in the sub-1M document range.
Key Takeaways
- Start with pgvector in Supabase. You avoid an entire service in your architecture. Your vectors live next to your relational data. You get JOINs, transactions, and RLS policies for free.
- Use hybrid search from day one. Pure vector search misses exact keyword matches. Pure full-text search misses semantic connections. Combining them with Reciprocal Rank Fusion gives you the best of both approaches.
- Batch your embedding API calls. OpenAI supports up to 2048 inputs per request. Use this during ingestion. It is the difference between minutes and hours.
- Chunk long documents. A single embedding cannot represent a 5000-word article. Split into overlapping chunks of 800-1000 characters. Each chunk gets its own vector.
- The small embedding model is fine. `text-embedding-3-small` at 1536 dimensions (or even 768 with Matryoshka truncation) handles the vast majority of search use cases. Save the large model for when you have benchmarks proving you need it.
- Cache query embeddings. The same queries repeat. An in-memory cache eliminates the 100-300ms embedding API call for popular searches.
- Monitor search latency. Include `duration_ms` in your API response. Set an alert at 500ms. If you cross it, check your index parameters and connection pooling first.
If you are building a product that needs intelligent search and you are already on Supabase, the answer is pgvector. The setup takes an afternoon. The results are immediate. Your users will notice the difference between "no results found" and actually finding what they meant.
I have built this pattern into several client projects. If you need semantic search, RAG, or AI-powered features built into your product, check out my services.
*Written by Uvin Vindula -- Web3 and AI engineer building production systems from Sri Lanka and the UK. I write about the tools and patterns I actually use. Follow me @IAMUVIN for more.*