Index format
This section documents the txtai index format. Each component is designed to ensure open access to the underlying data in a programmatic and platform-independent way.
If an underlying library has an index format, that is used. Otherwise, txtai persists content with MessagePack serialization.
To learn more about how these components work together, read the Index Guide and Query Guide.
ANN
Approximate Nearest Neighbor (ANN) index configuration for storing vector embeddings.
Component | Storage Format |
---|---|
Faiss | Local file format provided by library |
Hnswlib | Local file format provided by library |
Annoy | Local file format provided by library |
NumPy | Local NumPy array files via np.save / np.load |
Postgres via pgvector | Vector tables in a Postgres database |
Core
Core embeddings index files.
Component | Storage Format |
---|---|
Configuration | Embeddings index configuration stored as JSON |
Index Ids | Embeddings index ids serialized with MessagePack. Only enabled when content storage (database) is disabled. |
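
Because the configuration is plain JSON, it can be read without txtai installed. The following is a minimal sketch using only the Python standard library. The file name `config.json`, the helper `load_config`, and the sample keys are assumptions made for illustration; check a saved index directory for the actual file name and contents.

```python
import json
import tempfile
from pathlib import Path


def load_config(index_dir, name="config.json"):
    """Reads a JSON configuration file from an index directory.

    The default file name is an assumption for this sketch.
    """
    with open(Path(index_dir) / name, encoding="utf-8") as handle:
        return json.load(handle)


# Build a stand-in index directory with an illustrative configuration
with tempfile.TemporaryDirectory() as index_dir:
    sample = {"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True}
    (Path(index_dir) / "config.json").write_text(json.dumps(sample), encoding="utf-8")

    # Read the configuration back as a regular dictionary
    config = load_config(index_dir)
    print(config["path"], config["content"])
```

The same approach works from any language with a JSON parser, which is the point of storing the configuration in an open format.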
Database
Databases store metadata, text and binary content.
Component | Storage Format |
---|---|
SQLite | Local database files with SQLite |
DuckDB | Local database files with DuckDB |
Postgres | Postgres relational database via SQLAlchemy. Supports additional databases via this library. |
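
A local SQLite database file can be opened directly with standard tooling. The sketch below lists the tables in a database file with Python's built-in `sqlite3` module. The file name `documents`, the `example` table, and the helper `list_tables` are hypothetical stand-ins created for the demonstration; they are not a description of the schema txtai writes.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(path):
    """Returns the names of all tables in a SQLite database file, sorted by name."""
    connection = sqlite3.connect(path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        connection.close()


# Create a stand-in database file (hypothetical name and schema)
with tempfile.TemporaryDirectory() as index_dir:
    path = Path(index_dir) / "documents"

    connection = sqlite3.connect(path)
    connection.execute("CREATE TABLE example (id TEXT PRIMARY KEY, text TEXT)")
    connection.commit()
    connection.close()

    # Inspect the file the same way an existing index database could be inspected
    tables = list_tables(path)
    print(tables)
```

Pointing `list_tables` at the database file inside a saved index shows which tables are available before writing any queries against them.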
Graph
Graph nodes and edges for an embeddings index.
Component | Storage Format |
---|---|
NetworkX | Nodes and edges exported to local file serialized with MessagePack |
Postgres | Nodes and edges stored in a Postgres database. Supports additional databases. |
Scoring
Sparse/keyword indexing.
Component | Storage Format |
---|---|
Local index | Metadata serialized with MessagePack. Terms stored in SQLite. |
Postgres | Text indexed with Postgres Full Text Search (FTS) |