Index format

This section documents the txtai index format. Each component is designed to ensure open access to the underlying data in a programmatic and platform-independent way.

If an underlying library has an index format, that is used. Otherwise, txtai persists content with MessagePack serialization.

To learn more about how these components work together, read the Index Guide and Query Guide.
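
As a rough illustration of the MessagePack fallback described above, the sketch below round-trips a small Python object through the third-party `msgpack` package (`pip install msgpack`). The payload and the `payload.msgpack` file name are invented for the example and do not reflect the exact layout txtai writes.

```python
import os
import tempfile

import msgpack


def save(obj, path):
    """Serialize obj to path as MessagePack."""
    with open(path, "wb") as handle:
        msgpack.pack(obj, handle, use_bin_type=True)


def load(path):
    """Read a MessagePack file back into Python objects."""
    with open(path, "rb") as handle:
        return msgpack.unpack(handle, raw=False)


# Hypothetical payload, not an actual txtai structure
payload = {"ids": ["doc-1", "doc-2"], "count": 2}

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "payload.msgpack")
    save(payload, path)
    restored = load(path)
```

Because MessagePack has implementations in most languages, a file written this way can be read outside of Python as well.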

ANN

Approximate Nearest Neighbor (ANN) index configuration for storing vector embeddings.

| Component | Storage Format |
| --- | --- |
| Faiss | Local file format provided by library |
| Hnswlib | Local file format provided by library |
| Annoy | Local file format provided by library |
| NumPy | Local NumPy array files via np.save / np.load |
| Postgres via pgvector | Vector tables in a Postgres database |
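
For the NumPy backend, the table names `np.save` / `np.load` as the storage mechanism. A minimal round trip with those two calls might look like the following. The array shape, the random data and the `embeddings.npy` file name are chosen only for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: 4 vectors with 8 dimensions each
rng = np.random.default_rng(0)
vectors = rng.random((4, 8), dtype=np.float32)

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "embeddings.npy")

    # np.save writes the array in NumPy's .npy binary format
    np.save(path, vectors)

    # np.load reads it back with the same shape and dtype
    restored = np.load(path)
```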

Core

Core embeddings index files.

| Component | Storage Format |
| --- | --- |
| Configuration | Embeddings index configuration stored as JSON |
| Index Ids | Embeddings index ids serialized with MessagePack. Only enabled when content storage (database) is disabled. |
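
Because the configuration is plain JSON, it can be inspected with any JSON parser. The sketch below writes and then reads a stand-in configuration using Python's standard library. The `config.json` file name and the keys shown are assumptions made for the example, not a definitive list of what txtai stores.

```python
import json
import os
import tempfile


def read_config(directory, name="config.json"):
    """Load an index configuration file as a dict. The file name is an assumption."""
    with open(os.path.join(directory, name), encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as directory:
    # Stand-in configuration with hypothetical keys
    sample = {"backend": "numpy", "dimensions": 8, "content": False}
    with open(os.path.join(directory, "config.json"), "w", encoding="utf-8") as handle:
        json.dump(sample, handle)

    config = read_config(directory)
```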

Database

Databases store metadata, text and binary content.

| Component | Storage Format |
| --- | --- |
| SQLite | Local database files with SQLite |
| DuckDB | Local database files with DuckDB |
| Postgres | Postgres relational database via SQLAlchemy. Supports additional databases via this library. |
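
Since the SQLite backend produces an ordinary SQLite database file, it can be opened with any SQLite client. As a sketch, the following uses Python's built-in `sqlite3` module to list the tables in a database file. The `documents` table and its columns are created here purely as a stand-in, and the actual schema may differ.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(path):
    """Return the names of all tables in a SQLite database file."""
    with closing(sqlite3.connect(path)) as connection:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "documents")

    # Build a stand-in database with a hypothetical schema
    with closing(sqlite3.connect(path)) as connection:
        connection.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, data TEXT)")
        connection.execute(
            "INSERT INTO documents VALUES (?, ?)", ("doc-1", '{"text": "hello"}')
        )
        connection.commit()

    tables = list_tables(path)
```

Pointing `list_tables` at a real index's database file is a quick way to see which tables exist before writing queries against them.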

Graph

Graph nodes and edges for an embeddings index.

| Component | Storage Format |
| --- | --- |
| NetworkX | Nodes and edges exported to local file serialized with MessagePack |
| Postgres | Nodes and edges stored in a Postgres database. Supports additional databases. |

Scoring

Sparse/keyword indexing.

| Component | Storage Format |
| --- | --- |
| Local index | Metadata serialized with MessagePack. Terms stored in SQLite. |
| Postgres | Text indexed with Postgres Full Text Search (FTS) |