HFOnnx

Exports a Hugging Face Transformer model to ONNX. Currently, this works best with classification, pooling and question-answering (QA) models. Work is ongoing for sequence-to-sequence models (summarization, transcription, translation).

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import HFOnnx, Labels

# Model path
path = "distilbert-base-uncased-finetuned-sst-2-english"

# Export model to ONNX
onnx = HFOnnx()
model = onnx(path, "text-classification", "model.onnx", True)

# Run inference and validate
labels = Labels((model, path), dynamic=False)
labels("I am happy")

See the link below for a more detailed example.

Notebook: Export and run models with ONNX (Open In Colab)
Description: Export models with ONNX, run natively in JavaScript, Java and Rust

Methods

Python documentation for the pipeline.

Exports a Hugging Face Transformer model to ONNX.

Parameters:

Name | Description | Default
path | path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple | required
task | optional model task or category, determines the model type and outputs, defaults to export hidden state | 'default'
output | optional output model path, defaults to return byte array if None | None
quantize | if model should be quantized (requires onnx to be installed), defaults to False | False
opset | onnx opset, defaults to 12 | 12

Returns:

path to model output or model as bytes, depending on the output parameter
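
As an illustration of the output and task defaults, here is a minimal sketch (the model name is only an example): with no output path, the export is returned as a byte array that can be written to disk or kept in memory.

from txtai.pipeline import HFOnnx

onnx = HFOnnx()

# No output path: the exported model is returned as bytes (default task exports the hidden state)
model = onnx("distilbert-base-uncased")

# Persist the bytes manually if a file is needed later
with open("model.onnx", "wb") as f:
    f.write(model)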

Source code in txtai/pipeline/train/hfonnx.py
def __call__(self, path, task="default", output=None, quantize=False, opset=12):
    """
    Exports a Hugging Face Transformer model to ONNX.

    Args:
        path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
        task: optional model task or category, determines the model type and outputs, defaults to export hidden state
        output: optional output model path, defaults to return byte array if None
        quantize: if model should be quantized (requires onnx to be installed), defaults to False
        opset: onnx opset, defaults to 12

    Returns:
        path to model output or model as bytes depending on output parameter
    """

    inputs, outputs, model = self.parameters(task)

    if isinstance(path, (list, tuple)):
        model, tokenizer = path
        model = model.cpu()
    else:
        model = model(path)
        tokenizer = AutoTokenizer.from_pretrained(path)

    # Generate dummy inputs
    dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

    # Default to BytesIO if no output file provided
    output = output if output else BytesIO()

    # Export model to ONNX
    export(
        model,
        (dummy,),
        output,
        opset_version=opset,
        do_constant_folding=True,
        input_names=list(inputs.keys()),
        output_names=list(outputs.keys()),
        dynamic_axes=dict(chain(inputs.items(), outputs.items())),
    )

    # Quantize model
    if quantize:
        if not ONNX_RUNTIME:
            raise ImportError('onnxruntime is not available - install "pipeline" extra to enable')

        output = self.quantization(output)

    if isinstance(output, BytesIO):
        # Reset stream and return bytes
        output.seek(0)
        output = output.read()

    return output
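
The path argument also accepts a preloaded (model, tokenizer) tuple, as the branch at the top of the method shows. A short sketch of that usage, assuming transformers is installed:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from txtai.pipeline import HFOnnx

path = "distilbert-base-uncased-finetuned-sst-2-english"

# Load the model and tokenizer up front, then pass them in as a tuple
model = AutoModelForSequenceClassification.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

onnx = HFOnnx()
output = onnx((model, tokenizer), "text-classification", "tuple.onnx")

This follows the same export path as a model hub id, except the model and tokenizer are used as provided rather than loaded from path.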