LLM
The LLM pipeline runs prompts through a large language model (LLM). This pipeline autodetects the LLM framework based on the model path.
Example
The following shows a simple example using this pipeline.
from txtai import LLM
# Create LLM pipeline
llm = LLM()
# Run prompt
llm(
"""
Answer the following question using the provided context.
Question:
What are the applications of txtai?
Context:
txtai is an open-source platform for semantic search and
workflows powered by language models.
"""
)
# Instruction tuned models typically require string prompts to
# follow a specific chat template set by the model
llm(
"""
<|im_start|>system
You are a friendly assistant.<|im_end|>
<|im_start|>user
Answer the following question...<|im_end|>
<|im_start|>assistant
"""
)
# Chat messages automatically handle templating
llm([
    {"role": "system", "content": "You are a friendly assistant."},
    {"role": "user", "content": "Answer the following question..."}
])
# When the default role is set to user, string inputs are converted to chat messages
llm("Answer the following question...", defaultrole="user")
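The raw string prompt in the example above follows a ChatML-style template. As a minimal sketch of what templating does, the helper below renders a list of chat messages into that format. The name to_chatml is hypothetical and not part of txtai, and real models define their own templates, so this illustrates the idea rather than the pipeline's actual implementation.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render chat messages as a ChatML-style prompt string (illustrative only)."""
    parts = []
    for message in messages:
        # Each message is wrapped in start/end markers with its role on the first line
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")

    # Leave the assistant turn open so the model generates the reply
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")

    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a friendly assistant."},
    {"role": "user", "content": "Answer the following question..."},
]

prompt = to_chatml(messages)
print(prompt)
```

Passing chat messages to the pipeline avoids writing these markers by hand, which matters because the markers differ from model to model.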
The LLM pipeline automatically detects the underlying LLM framework from the model path. The framework can also be set manually with the method parameter.
Hugging Face Transformers, llama.cpp and hosted API models via LiteLLM are all supported by this pipeline.
See the LiteLLM documentation for the options available with LiteLLM models. llama.cpp models support both local GGUF paths and remote GGUF paths on the Hugging Face Hub.
from txtai import LLM
# Transformers
llm = LLM("meta-llama/Meta-Llama-3.1-8B-Instruct")
llm = LLM("meta-llama/Meta-Llama-3.1-8B-Instruct", method="transformers")
# llama.cpp
llm = LLM("microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf")
llm = LLM("microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
          method="llama.cpp")
# LiteLLM
llm = LLM("ollama/llama3.1")
llm = LLM("ollama/llama3.1", method="litellm")
# Custom Ollama endpoint
llm = LLM("ollama/llama3.1", api_base="http://localhost:11434")
# Custom OpenAI-compatible endpoint
llm = LLM("openai/llama3.1", api_base="http://localhost:4000")
# LLM APIs - must also set API key via environment variable
llm = LLM("gpt-4o")
llm = LLM("claude-3-5-sonnet-20240620")
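To make the idea of inferring a framework from a model path concrete, here is a rough sketch. The guess_method function and its rules are assumptions made for illustration: it is not txtai's detection logic, and the hard-coded prefixes only cover the paths used in the examples above.

```python
def guess_method(path):
    """Guess an LLM framework name from a model path (illustrative heuristic)."""
    # Assumption: GGUF files are loaded with llama.cpp
    if path.lower().endswith(".gguf"):
        return "llama.cpp"

    # Assumption: a small hard-coded set of prefixes and model names stands in
    # for the provider lookup that a library such as LiteLLM performs
    provider = path.split("/", 1)[0].lower()
    if provider in {"ollama", "openai"} or path.startswith(("gpt-", "claude-")):
        return "litellm"

    # Everything else is treated as a Hugging Face Transformers model
    return "transformers"


for path in [
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
    "ollama/llama3.1",
    "gpt-4o",
]:
    print(path, "->", guess_method(path))
```

A heuristic like this can misclassify paths, for example a Hugging Face Hub repository whose organization name matches a hosted provider. Setting method explicitly removes that ambiguity.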
Models can be externally loaded and passed to pipelines. This is useful for models that are not yet supported by Transformers and/or need special initialization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from txtai import LLM
# Load Phi 3.5-mini
path = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(path)
llm = LLM((model, tokenizer))
See the links below for more detailed examples.
Notebook | Description |
---|---|
Prompt-driven search with LLMs | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs) |
Prompt templates and task chains | Build model prompts and connect tasks together with workflows |
Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations |
Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks |
Generate knowledge with Semantic Graphs and RAG | Knowledge exploration and discovery with Semantic Graphs and RAG |
Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction |
Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG |
Advanced RAG with guided generation | Retrieval Augmented and Guided Generation |
RAG with llama.cpp and external API services | RAG with additional vector and LLM frameworks |
How RAG with txtai works | Create RAG processes, API services and Docker instances |
Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG |
Generative Audio | Storytelling with generative audio workflows |
Analyzing Hugging Face Posts with Graphs and Agents | Explore a rich dataset with Graph Analysis and Agents |
Granting autonomy to agents | Agents that iteratively solve problems as they see fit |
Getting started with LLM APIs | Generate embeddings and run LLMs with OpenAI, Claude, Gemini, Bedrock and more |
Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.
config.yml
# Create pipeline using lower case class name
llm:

# Run pipeline with workflow
workflow:
  llm:
    tasks:
      - action: llm
Similar to the Python example above, the underlying Hugging Face pipeline parameters and model parameters can be set in pipeline configuration.
llm:
  path: microsoft/Phi-3.5-mini-instruct
  torch_dtype: torch.bfloat16
Run with Workflows
from txtai import Application
# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("llm", [
"""
Answer the following question using the provided context.
Question:
What are the applications of txtai?
Context:
txtai is an open-source platform for semantic search and
workflows powered by language models.
"""
]))
Run with API
CONFIG=config.yml uvicorn "txtai.api:app" &
curl \
-X POST "http://localhost:8000/workflow" \
-H "Content-Type: application/json" \
-d '{"name":"llm", "elements": ["Answer the following question..."]}'
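The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the curl call above; build_workflow_request is a hypothetical helper name, and the base URL assumes the API is running locally on port 8000 as in that command.

```python
import json
from urllib.request import Request


def build_workflow_request(name, elements, base="http://localhost:8000"):
    """Build a POST request for the workflow endpoint (illustrative helper)."""
    # JSON body with the workflow name and the list of input elements
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")

    return Request(
        f"{base}/workflow",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_workflow_request("llm", ["Answer the following question..."])
print(request.get_method(), request.full_url)

# To send the request against a running API instance:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(json.load(response))
```

Sending is left commented out so the block runs without a server. Uncomment those lines once the API process is up.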
Methods
Python documentation for the pipeline.
__init__(path=None, method=None, **kwargs)
Creates a new LLM.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | | model path | None |
method | | llm model framework, infers from path if not provided | None |
kwargs | | model keyword arguments | {} |
Source code in txtai/pipeline/llm/llm.py
__call__(text, maxlength=512, stream=False, stop=None, defaultrole='prompt', **kwargs)
Generates text. Supports the following input formats:
- String or list of strings (instruction-tuned models must follow chat templates)
- List of dictionaries with role and content key-values or lists of lists
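The sketch below illustrates how these input formats relate to the defaultrole parameter. It is a simplified model of the described behavior under stated assumptions, not the pipeline's source code, and to_messages is a hypothetical helper name.

```python
def to_messages(text, defaultrole="prompt"):
    """Show how string inputs could map to chat messages (hypothetical helper)."""
    # Inputs that are already chat messages pass through unchanged
    if isinstance(text, list) and text and all(isinstance(x, dict) for x in text):
        return text

    # Any other list is treated as a batch, so each element is converted in turn
    if isinstance(text, list):
        return [to_messages(x, defaultrole) for x in text]

    # With the default "prompt" role, strings stay as raw prompts
    if defaultrole == "prompt":
        return text

    # Otherwise the string becomes a single chat message with the given role
    return [{"role": defaultrole, "content": text}]


print(to_messages("Answer the following question..."))
print(to_messages("Answer the following question...", defaultrole="user"))
```

With defaultrole left at prompt, a string is passed to the model as written, so an instruction-tuned model still needs its chat template in the string.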
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | | text or list | required |
maxlength | | maximum sequence length | 512 |
stream | | stream response if True, defaults to False | False |
stop | | list of stop strings, defaults to None | None |
defaultrole | | default role to apply to text inputs (prompt for raw prompts (default) or user for user chat messages) | 'prompt' |
kwargs | | additional generation keyword arguments | {} |
Returns:
Type | Description |
---|---|
| | generated text |
Source code in txtai/pipeline/llm/llm.py