HFOnnx

Exports a Hugging Face Transformer model to ONNX. Currently, this works best with classification, pooling and question-answering (QA) models. Work is ongoing for sequence-to-sequence models (summarization, transcription, translation).

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import HFOnnx, Labels

# Model path
path = "distilbert-base-uncased-finetuned-sst-2-english"

# Export model to ONNX
onnx = HFOnnx()
model = onnx(path, "text-classification", "model.onnx", True)

# Run inference and validate
labels = Labels((model, path), dynamic=False)
labels("I am happy")

See the link below for a more detailed example.

Notebook: Export and run models with ONNX (Open In Colab)
Description: Export models with ONNX, run natively in JavaScript, Java and Rust

Methods

Python documentation for the pipeline.

Exports a Hugging Face Transformer model to ONNX.

Parameters:

Name | Description | Default
path | path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple | required
task | optional model task or category, determines the model type and outputs, defaults to export hidden state | 'default'
output | optional output model path, defaults to return byte array if None | None
quantize | if model should be quantized (requires onnx to be installed), defaults to False | False
opset | onnx opset, defaults to 12 | 12

Returns:

path to model output or model as bytes, depending on the output parameter
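
As an illustration of the output and task defaults, here is a minimal sketch (the model name is only an example): with no output path, the export is returned as a byte array that can be written to disk or kept in memory.

from txtai.pipeline import HFOnnx

onnx = HFOnnx()

# No output path: the exported model is returned as bytes (default task exports the hidden state)
model = onnx("distilbert-base-uncased")

# Persist the bytes manually if a file is needed later
with open("model.onnx", "wb") as f:
    f.write(model)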

Source code in txtai/pipeline/train/hfonnx.py
def __call__(self, path, task="default", output=None, quantize=False, opset=12):
    """
    Exports a Hugging Face Transformer model to ONNX.

    Args:
        path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
        task: optional model task or category, determines the model type and outputs, defaults to export hidden state
        output: optional output model path, defaults to return byte array if None
        quantize: if model should be quantized (requires onnx to be installed), defaults to False
        opset: onnx opset, defaults to 12

    Returns:
        path to model output or model as bytes depending on output parameter
    """

    inputs, outputs, model = self.parameters(task)

    if isinstance(path, (list, tuple)):
        model, tokenizer = path
        model = model.cpu()
    else:
        model = model(path)
        tokenizer = AutoTokenizer.from_pretrained(path)

    # Generate dummy inputs
    dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

    # Default to BytesIO if no output file provided
    output = output if output else BytesIO()

    # Export model to ONNX
    export(
        model,
        (dummy,),
        output,
        opset_version=opset,
        do_constant_folding=True,
        input_names=list(inputs.keys()),
        output_names=list(outputs.keys()),
        dynamic_axes=dict(chain(inputs.items(), outputs.items())),
    )

    # Quantize model
    if quantize:
        if not ONNX_RUNTIME:
            raise ImportError('onnxruntime is not available - install "pipeline" extra to enable')

        output = self.quantization(output)

    if isinstance(output, BytesIO):
        # Reset stream and return bytes
        output.seek(0)
        output = output.read()

    return output
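
The path argument also accepts a preloaded (model, tokenizer) tuple, as the branch at the top of the method shows. A short sketch of that usage, assuming transformers is installed:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from txtai.pipeline import HFOnnx

path = "distilbert-base-uncased-finetuned-sst-2-english"

# Load the model and tokenizer up front, then pass them in as a tuple
model = AutoModelForSequenceClassification.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

onnx = HFOnnx()
output = onnx((model, tokenizer), "text-classification", "tuple.onnx")

This follows the same export path as a model hub id, except the model and tokenizer are used as provided rather than loaded from path.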