Below is a list of frequently asked questions and common issues encountered.
What models are recommended?
See the model guide.
What is the best way to track the progress of an index call?
Wrap the list or generator passed to the index call with tqdm. See issue #478 for more.
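The pattern can be sketched as follows. `stream()` is a hypothetical generator of `(id, text, tags)` tuples, not part of txtai; tqdm acts as a transparent wrapper, so the same items reach the index call unchanged while a progress bar is printed.

```python
from tqdm import tqdm

def stream():
    # Hypothetical generator of (id, text, tags) tuples
    for i in range(1000):
        yield (i, f"document {i}", None)

# tqdm passes items through unchanged while printing a progress bar;
# total is optional but gives an accurate percentage for generators
rows = list(tqdm(stream(), total=1000))
```

In practice the wrapped iterable is passed directly, e.g. `embeddings.index(tqdm(stream(), total=1000))`.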
What is the best way to analyze the content of a txtai index?
txtai has a console application that makes this easy. Read this article to learn more.
How can models be externally loaded and passed to embeddings and pipelines?
```python
from transformers import AutoModel, AutoTokenizer

from txtai.embeddings import Embeddings

# Load model externally
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Pass to embeddings instance
embeddings = Embeddings(path=model, tokenizer=tokenizer)
```
LLM pipeline example.

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer

from txtai.pipeline import LLM

# Load Mistral-7B-OpenOrca
path = "Open-Orca/Mistral-7B-OpenOrca"
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(path)

llm = LLM((model, tokenizer))
```
Embeddings query errors like this:
```
SQLError: no such function: json_extract
```
Upgrade the Python version. This error occurs when the SQLite library bundled with Python was built without support for json_extract.
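To confirm this is the cause, the following sketch checks whether the SQLite build bundled with the current Python exposes json_extract:

```python
import sqlite3

# Version of the SQLite library compiled into this Python build
print(sqlite3.sqlite_version)

# json_extract comes from SQLite's JSON1 extension; older builds omit it
try:
    row = sqlite3.connect(":memory:").execute(
        """SELECT json_extract('{"a": 1}', '$.a')"""
    ).fetchone()
    available = row[0] == 1
except sqlite3.OperationalError:
    available = False

print("json_extract available:", available)
```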
Segmentation faults and similar errors on macOS
Disable OpenMP threading via the environment variable `export OMP_NUM_THREADS=1` or downgrade PyTorch to <= 1.12. See issue #377 for more.
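If exporting the variable in the shell is inconvenient, it can also be set from Python. A minimal sketch, assuming it runs before the first `import torch`, since OpenMP reads the variable only once when the native library initializes:

```python
import os

# Must run before importing torch (or numpy): OpenMP reads this variable
# once at native library initialization, so later changes have no effect
os.environ["OMP_NUM_THREADS"] = "1"
```

After this, `import torch` proceeds as usual with OpenMP threading disabled.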
ContextualVersionConflict exception when importing certain libraries while running one of the examples notebooks on Google Colab
Restart the kernel. See issue #409 for more.