Similarity
The Similarity pipeline computes the similarity between queries and a list of text using a text classifier.
This pipeline supports both standard text classification models and zero-shot classification models. The pipeline uses the queries as labels for the input text. The results are transposed so that scores are grouped per query/label rather than per input text.
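The transposition step can be pictured with a small sketch. It assumes a classifier that returns one row of label scores per input text. The `transpose_and_rank` helper and the scores are made up for illustration and are not part of txtai.

```python
def transpose_and_rank(per_text):
    """
    Hypothetical helper (not part of txtai). Takes scores shaped
    [text][query] and returns, for each query, a list of (id, score)
    sorted by highest score, where id is the index of the text.
    """
    # zip(*rows) flips rows and columns: [text][query] -> [query][text]
    per_query = zip(*per_text)

    # Pair each score with its text index, then sort by score descending
    return [
        sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
        for scores in per_query
    ]

# Made-up scores: 2 texts (rows) x 2 queries/labels (columns)
per_text = [
    [0.9, 0.2],
    [0.1, 0.7],
]

ranked = transpose_and_rank(per_text)
print(ranked)
```

Each row of `ranked` belongs to one query and lists text indexes from best to worst match, which is the shape the pipeline is documented to return.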
Cross-encoder models are supported via the crossencode=True constructor parameter. These models are loaded with a CrossEncoder pipeline, which can also be instantiated directly. The CrossEncoder pipeline has the same methods and functionality as described below.
Example
The following shows a simple example using this pipeline.
from txtai.pipeline import Similarity
# Create and run pipeline
similarity = Similarity()
similarity("feel good story", [
"Maine man wins $1M from $25 lottery ticket",
"Don't sacrifice slower friends in a bear attack"
])
See the link below for a more detailed example.
Notebook | Description |
---|---|
Add semantic search to Elasticsearch | Add semantic search to existing search systems |
Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.
config.yml
# Create pipeline using lower case class name
similarity:
Run with Workflows
from txtai import Application
# Create and run pipeline with workflow
app = Application("config.yml")
app.similarity("feel good story", [
"Maine man wins $1M from $25 lottery ticket",
"Don't sacrifice slower friends in a bear attack"
])
Run with API
CONFIG=config.yml uvicorn "txtai.api:app" &
curl \
-X POST "http://localhost:8000/similarity" \
-H "Content-Type: application/json" \
-d '{"query": "feel good story", "texts": ["Maine man wins $1M from $25 lottery ticket", "Dont sacrifice slower friends in a bear attack"]}'
Methods
Python documentation for the pipeline.
__init__(path=None, quantize=False, gpu=True, model=None, dynamic=True, crossencode=False, **kwargs)
Source code in txtai/pipeline/text/similarity.py
__call__(query, texts, multilabel=True)
Computes the similarity between query and list of text. Returns a list of (id, score) sorted by highest score, where id is the index in texts.
This method supports query as a string or a list. If the query is a string, the return type is a 1D list of (id, score). If the query is a list, a 2D list of (id, score) is returned with a row per query.
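Since the returned ids are indexes into texts, a small helper can map them back to the original strings. The sketch below is hypothetical: `with_text` is not part of txtai, it assumes each result entry is an (id, score) tuple, and the scores are made-up values in the documented shapes rather than real pipeline output.

```python
def with_text(results, texts):
    """
    Hypothetical helper (not part of txtai). Replaces each id in a
    1D or 2D list of (id, score) with the matching entry from texts.
    """
    # A 1D result starts with an (id, score) tuple, a 2D result starts with a list
    if results and isinstance(results[0], tuple):
        return [(texts[uid], score) for uid, score in results]

    return [[(texts[uid], score) for uid, score in row] for row in results]

texts = [
    "Maine man wins $1M from $25 lottery ticket",
    "Don't sacrifice slower friends in a bear attack",
]

# Made-up scores, not real pipeline output
single = [(0, 0.8), (1, 0.1)]                              # shape for a string query
multiple = [[(0, 0.8), (1, 0.1)], [(1, 0.9), (0, 0.05)]]   # shape for a list of queries

single_text = with_text(single, texts)
multiple_text = with_text(multiple, texts)
```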
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | | query text or list | required |
texts | | list of text | required |
multilabel | | labels are independent if True, scores are normalized to sum to 1 per text item if False, raw scores returned if None | True |
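The three multilabel settings can be illustrated with a sketch. It assumes the classifier produces one raw logit per label, that independent scoring corresponds to a per-label sigmoid, and that normalized scoring corresponds to a softmax. This mirrors a common classifier design and is an assumption, not a statement of how the pipeline is implemented. The `score_labels` function and the logits are hypothetical.

```python
import math

def score_labels(logits, multilabel=True):
    """
    Hypothetical scoring function (not part of txtai) for one text item.
    Assumes sigmoid for independent labels and softmax for normalized labels.
    """
    # None: return the raw scores unchanged
    if multilabel is None:
        return list(logits)

    # True: each label is scored on its own, scores do not need to sum to 1
    if multilabel:
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # False: softmax, scores sum to 1 across the labels for this text item
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

# Made-up logits for three labels
logits = [2.0, 0.0, -1.0]

independent = score_labels(logits, multilabel=True)
normalized = score_labels(logits, multilabel=False)
raw = score_labels(logits, multilabel=None)
```

With multilabel=False the scores compete with each other, so a strong match for one label lowers the others. With multilabel=True a text can score high or low on every label at once.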
Returns:
Type | Description |
---|---|
list of (id, score) |
Source code in txtai/pipeline/text/similarity.py