ANN

Approximate Nearest Neighbor (ANN) index configuration for storing vector embeddings.

backend

backend: faiss|hnsw|annoy|ggml|numpy|torch|pgvector|sqlite|custom

Sets the ANN backend. Defaults to faiss. Additional backends are available via the ann extras package. Custom backends can be set via this parameter using a fully resolvable class string.

Backend-specific settings are set with a corresponding configuration object having the same name as the backend (e.g. annoy, faiss, hnsw). These settings are optional and default values are used if omitted.
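
For example, the sketch below shows where the backend parameter and a backend-specific configuration object go. This assumes a recent txtai install; the model path, setting values and the custom class string are illustrative.

```python
from txtai import Embeddings

# Minimal sketch: the backend parameter selects the ANN implementation and a
# configuration object with the same name holds its settings.
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "faiss",
    "faiss": {"nprobe": 6}
})

# A custom backend is referenced with a fully resolvable class string
# (hypothetical module and class shown for illustration).
# embeddings = Embeddings({"backend": "mymodule.ann.CustomANN"})

embeddings.index([(0, "first document", None), (1, "second document", None)])
print(embeddings.search("first document", 1))
```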

faiss

faiss:
    components: comma separated list of components - defaults to "IDMap,Flat"
                for small indexes and "IVFx,Flat" for larger indexes, where
                x = min(4 * sqrt(embeddings count), embeddings count / 39)
                when x is omitted (i.e. "IVF,Flat"), the number of IVF cells
                is calculated automatically
    nprobe: search probe setting (int) - defaults to x/16 (as defined above)
            for larger indexes
    nflip: same as nprobe - only used with binary hash indexes
    quantize: store vectors with x-bit precision vs 32-bit (boolean|int)
              true sets 8-bit precision, false disables, int sets specified
              precision
    mmap: load as on-disk index (boolean) - trade query response time for a
          smaller RAM footprint, defaults to false
    sample: percent of data to use for model training (0.0 - 1.0)
            reduces indexing time for larger (>1M rows) indexes, defaults to 1.0

Faiss supports both floating point and binary indexes. Floating point indexes are the default. Binary indexes are used when indexing scalar-quantized datasets.
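
As a sketch, a Faiss-backed index using a few of the settings above might be configured as follows. The values are illustrative, not recommended defaults.

```python
from txtai import Embeddings

# Sketch of Faiss-specific settings (illustrative values).
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "faiss",
    "faiss": {
        "components": "IVF,Flat",  # number of IVF cells calculated automatically
        "nprobe": 6,               # cells probed at query time
        "mmap": False,             # keep the index in memory
        "sample": 0.5              # train on 50% of the data
    }
})
```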

See the Faiss documentation for more information.

Note: For macOS users, an existing bug in an upstream package restricts the number of processing threads to 1. This limitation is managed internally to prevent system crashes.

hnsw

hnsw:
    efconstruction:  ef_construction param for init_index (int) - defaults to 200
    m: M param for init_index (int) - defaults to 16
    randomseed: random_seed param for init_index (int) - defaults to 100
    efsearch: ef search param (int) - defaults to None (not set)

See Hnswlib documentation for more information on these parameters.
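
A sketch mapping these parameters to a configuration follows. It assumes the hnsw backend is installed via the ann extras package; the values are illustrative.

```python
from txtai import Embeddings

# Sketch of HNSW settings mapped to the parameters above (illustrative values).
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "hnsw",
    "hnsw": {
        "efconstruction": 200,  # construction-time candidate list size
        "m": 16,                # links per node
        "randomseed": 100,
        "efsearch": 64          # query-time candidate list size
    }
})
```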

annoy

annoy:
    ntrees: number of trees (int) - defaults to 10
    searchk: search_k search setting (int) - defaults to -1

See the Annoy documentation for more information on these parameters. Note that Annoy indexes cannot be modified after creation; upserts, deletes and other modifications are not supported.
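
A sketch of an Annoy-backed index follows. Because Annoy indexes are immutable once built, all data is indexed in a single call. The values are illustrative and assume the annoy backend is installed via the ann extras package.

```python
from txtai import Embeddings

# Sketch of an Annoy-backed index (illustrative values).
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "annoy",
    "annoy": {"ntrees": 25, "searchk": 10000}
})

# Index everything in one call - the index can't be modified afterwards
data = ["first document", "second document"]
embeddings.index([(x, text, None) for x, text in enumerate(data)])
```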

ggml

ggml:
    gpu: enable GPU - defaults to True
    quantize: sets the tensor quantization - defaults to F32
    querysize: query buffer size - defaults to 64

The GGML backend is a k-nearest neighbors backend. It stores tensors using GGML and GGUF. It supports GPU-enabled operations and quantization. GGML is the framework used by llama.cpp.

See the GGML documentation for a list of quantization types.
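
A sketch of a GGML-backed configuration follows. The quantization type shown is an assumption based on standard GGML type names and should be checked against the supported list; the other values mirror the defaults above.

```python
from txtai import Embeddings

# Sketch of a GGML-backed index (quantization type is an assumed GGML type name).
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "ggml",
    "ggml": {"gpu": True, "quantize": "F16", "querysize": 64}
})
```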

numpy

The NumPy backend is a k-nearest neighbors backend. It's designed for simplicity and works well with smaller datasets that fit into memory.

numpy:
    safetensors: stores vectors using the safetensors format - defaults
                 to NumPy array storage
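
A sketch enabling safetensors storage with the NumPy backend (model path is illustrative):

```python
from txtai import Embeddings

# Sketch of the NumPy backend with safetensors vector storage enabled.
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "numpy",
    "numpy": {"safetensors": True}
})
```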

torch

The Torch backend is a k-nearest neighbors backend like NumPy. It supports GPU-enabled operations. It also has support for quantization which enables fitting larger arrays into GPU memory.

When quantization is enabled, vectors are always stored in safetensors. Note that macOS support for quantization is limited.

torch:
    safetensors: stores vectors using the safetensors format - defaults
                 to NumPy array storage if quantization is disabled
    quantize:
        type: quantization type (fp4, nf4, int8)
        blocksize: quantization block size parameter
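
A sketch of the Torch backend with quantization enabled follows. The values are illustrative and quantization may require additional GPU-specific dependencies.

```python
from txtai import Embeddings

# Sketch of the Torch backend with quantization (illustrative values).
# Per the settings above, vectors are stored with safetensors when
# quantization is enabled.
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "torch",
    "torch": {
        "quantize": {"type": "nf4", "blocksize": 64}
    }
})
```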

pgvector

pgvector:
    url: database url connection string, alternatively can be set via
         ANN_URL environment variable
    schema: database schema to store vectors - defaults to being
            determined by the database
    table: database table to store vectors - defaults to `vectors`
    precision: vector float precision (half or full) - defaults to `full`
    efconstruction: ef_construction param (int) - defaults to 200
    m: M param (int) - defaults to 16

The pgvector backend stores embeddings in a Postgres database. See the pgvector documentation for more information on these parameters. See the SQLAlchemy documentation for more information on how to construct url connection strings.
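
A sketch of a pgvector-backed configuration follows. The connection string is a placeholder; per the settings above, it can also be supplied via the ANN_URL environment variable.

```python
import os

from txtai import Embeddings

# Sketch of a pgvector-backed index (placeholder connection string).
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "pgvector",
    "pgvector": {
        "url": os.environ.get("ANN_URL", "postgresql+psycopg2://user:pass@localhost/txtai"),
        "table": "vectors",
        "precision": "half",
        "efconstruction": 200,
        "m": 16
    }
})
```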

sqlite

sqlite:
    quantize: store vectors with x-bit precision vs 32-bit (boolean|int)
              true sets 8-bit precision, false disables, int sets specified
              precision
    table: database table to store vectors - defaults to `vectors`

The SQLite backend stores embeddings in a SQLite database using sqlite-vec. This backend supports 1-bit and 8-bit quantization at the storage level.

See this note on how to run this ANN on macOS.
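
A sketch of a SQLite-backed configuration with 8-bit storage quantization follows (values are illustrative):

```python
from txtai import Embeddings

# Sketch of a SQLite (sqlite-vec) backed index with 8-bit storage quantization.
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "sqlite",
    "sqlite": {"quantize": 8, "table": "vectors"}
})
```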