See below for a comprehensive series of example notebooks and applications covering txtai.
Build semantic/similarity/vector/neural search applications.
|Introducing txtai ▶️||Overview of the functionality provided by txtai|
|Build an Embeddings index with Hugging Face Datasets||Index and search Hugging Face Datasets|
|Build an Embeddings index from a data source||Index and search a data source with word embeddings|
|Add semantic search to Elasticsearch||Add semantic search to existing search systems|
|Similarity search with images||Embed images and text into the same space for search|
|Custom Embeddings SQL functions||Add user-defined functions to Embeddings SQL|
|Model explainability||Explainability for semantic search|
|Query translation||Domain-specific natural language queries with query translation|
|Build a QA database||Question matching with semantic search|
|Semantic Graphs||Explore topics, data connectivity and run network analysis|
|Topic Modeling with BM25||Topic modeling backed by a BM25 index|
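
The notebooks above all build on the same idea: turn text into vectors, then rank documents by how close their vectors are to the query vector. The following is a minimal sketch of that idea only. It does not use txtai, and `vectorize`, `cosine` and `search` are hypothetical helper names. Plain term counts stand in for a real embedding model, so matching here is lexical rather than semantic.

```python
import math
from collections import Counter


def vectorize(text):
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query, limit=1):
    """Return the top (id, score) pairs for a query, best match first."""
    query_vector = vectorize(query)
    scored = [(uid, cosine(query_vector, vector)) for uid, vector in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


documents = [
    "Index and search a data source with word embeddings",
    "Embed images and text into the same space for search",
    "Question matching with semantic search",
]

# The "index" is just a list of (id, vector) pairs held in memory
index = [(uid, vectorize(text)) for uid, text in enumerate(documents)]

for uid, score in search(index, "search images", limit=2):
    print(f"{score:.3f}  {documents[uid]}")
```

Replacing `vectorize` with a trained embedding model is what lets queries match on meaning rather than on shared words.
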
Prompt-driven search, retrieval augmented generation (RAG), pipelines and workflows that interface with large language models (LLMs).
|Prompt-driven search with LLMs||Embeddings-guided and prompt-driven search with large language models (LLMs)|
|Prompt templates and task chains||Build model prompts and connect tasks together with workflows|
Transform data with language model backed pipelines.
|Extractive QA with txtai||Introduction to extractive question-answering with txtai|
|Extractive QA with Elasticsearch||Run extractive question-answering queries with Elasticsearch|
|Extractive QA to build structured data||Build structured datasets using extractive question-answering|
|Apply labels with zero shot classification||Use zero shot learning for labeling, classification and topic modeling|
|Building abstractive text summaries||Run abstractive text summarization|
|Extract text from documents||Extract text from PDF, Office, HTML and more|
|Text to speech generation||Generate speech from text|
|Transcribe audio to text||Convert audio files to text|
|Translate text between languages||Streamline machine translation and language detection|
|Generate image captions and detect objects||Captions and object detection for images|
|Near duplicate image detection||Identify duplicate and near-duplicate images|
Efficiently process data at scale.
|Run pipeline workflows ▶️||Simple yet powerful constructs to efficiently process data|
|Transform tabular data with composable workflows||Transform, index and search tabular data|
|Tensor workflows||Performant processing of large tensor arrays|
|Entity extraction workflows||Identify entity/label combinations|
|Workflow Scheduling||Schedule workflows with cron expressions|
|Push notifications with workflows||Generate and push notifications with workflows|
|Pictures are worth a thousand words||Generate webpage summary images with DALL-E mini|
|Run txtai with native code||Execute workflows in native code with the Python C API|
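
As a rough illustration of the workflow concept (a chain of tasks applied to a stream of data in batches), here is a small, self-contained sketch. The `Workflow` class and the `strip` and `upper` tasks are hypothetical and written for this example. They are not txtai's workflow API and only show the general pattern.

```python
from itertools import islice


class Workflow:
    """Hypothetical mini-workflow: runs each task, in order, over batches of elements."""

    def __init__(self, tasks, batch=2):
        self.tasks = tasks
        self.batch = batch

    def __call__(self, elements):
        iterator = iter(elements)
        while True:
            # Pull the next batch lazily so large inputs are never fully loaded
            chunk = list(islice(iterator, self.batch))
            if not chunk:
                break

            # Each task takes a list of elements and returns a list of outputs
            for task in self.tasks:
                chunk = task(chunk)

            yield from chunk


def strip(batch):
    """Example task: trim surrounding whitespace."""
    return [element.strip() for element in batch]


def upper(batch):
    """Example task: convert text to upper case."""
    return [element.upper() for element in batch]


workflow = Workflow([strip, upper], batch=2)
results = list(workflow(["  first ", "second", " third"]))
print(results)
```

Because the workflow is a generator, results are produced batch by batch, which keeps memory use flat as the input grows.
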
Train NLP models.
|Train a text labeler||Build text sequence classification models|
|Train without labels||Use zero-shot classifiers to train new models|
|Train a QA model||Build and fine-tune question-answering models|
|Train a language model from scratch||Build new language models|
|Export and run other machine learning models||Export and run models from scikit-learn, PyTorch and more|
Run distributed txtai, integrate with the API and cloud endpoints.
|Distributed embeddings cluster||Distribute an embeddings index across multiple data nodes|
|Embeddings in the Cloud||Load and use an embeddings index from the Hugging Face Hub|
Deep dives into project architecture, data formats and performance.
|Anatomy of a txtai index||Deep dive into the file formats behind a txtai embeddings index|
|Embeddings components||Composable search with vector, SQL and scoring components|
|Customize your own embeddings database||Ways to combine vector indexes with relational databases|
|Building an efficient sparse keyword index in Python||Fast and accurate sparse keyword indexing|
|Benefits of hybrid search||Improve accuracy with a combination of semantic and keyword search|
|External database integration||Store metadata in PostgreSQL, MariaDB, MySQL and more|
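
To make the hybrid search idea concrete, the sketch below merges results from a semantic (dense) query and a keyword (sparse) query into one ranking. It assumes min-max normalization followed by a weighted sum, which is one common way to combine such scores and not necessarily the method txtai uses. The function names and all score values are made up for illustration.

```python
def normalize(scores):
    """Min-max scale a dict of id -> score into the range [0, 1]."""
    if not scores:
        return {}

    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {uid: 1.0 for uid in scores}

    return {uid: (score - low) / (high - low) for uid, score in scores.items()}


def hybrid(dense, sparse, weight=0.5):
    """Combine dense and sparse scores; weight is the share given to dense scores."""
    dense, sparse = normalize(dense), normalize(sparse)

    combined = {}
    for uid in dense.keys() | sparse.keys():
        # A document missing from one result set contributes 0 for that side
        combined[uid] = weight * dense.get(uid, 0.0) + (1 - weight) * sparse.get(uid, 0.0)

    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: cosine similarities (dense) and BM25-style scores (sparse)
dense = {"a": 0.9, "b": 0.5, "c": 0.1}
sparse = {"b": 12.0, "c": 3.0, "d": 7.5}

ranked = hybrid(dense, sparse)
for uid, score in ranked:
    print(uid, round(score, 2))
```

Normalizing first matters because keyword scores and cosine similarities live on different scales. Without it, the larger-valued score would dominate the sum regardless of `weight`.
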
New functionality added in major releases.
|What's new in txtai 4.0||Content storage, SQL, object storage, reindex and compressed indexes|
|What's new in txtai 6.0||Sparse, hybrid and subindexes for embeddings, LLM improvements|
A series of example applications built with txtai. Links to hosted versions on Hugging Face Spaces are provided when available.
|Basic similarity search||Basic similarity search example. Data from the original txtai demo.||🤗|
|Baseball stats||Match historical baseball player stats using vector search.||🤗|
|Benchmarks||Calculate performance metrics for the BEIR datasets.||Local run only|
|Book search||Book similarity search application. Index book descriptions and query using natural language statements.||Local run only|
|Image search||Image similarity search application. Index a directory of images and run searches to identify images similar to the input query.||🤗|
|Summarize an article||Summarize an article. Workflow that extracts text from a webpage and builds a summary.||🤗|
|Wiki search||Wikipedia search application. Queries Wikipedia API and summarizes the top result.||🤗|
|Workflow builder||Build and execute txtai workflows. Connect summarization, text extraction, transcription, translation and similarity search pipelines together to run unified workflows.||🤗|