ImageHash

The image hash pipeline generates perceptual image hashes, which can be used to detect near-duplicate images. This method is not backed by machine learning models and is not intended to find conceptually similar images.

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import ImageHash

# Create and run pipeline
ihash = ImageHash()
ihash("path to image file")

See the link below for a more detailed example.

Notebook | Description
Near duplicate image detection | Identify duplicate and near-duplicate images
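
As a rough illustration of how hashes can be compared for near-duplicate detection, the sketch below counts the differing bits (Hamming distance) between two hex hash strings. It assumes the default hex string output of the pipeline; `hamming_distance`, `is_near_duplicate` and the threshold of 5 are hypothetical names and values chosen for this example, and the hash strings are placeholders rather than real pipeline output.

```python
def hamming_distance(hash1: str, hash2: str) -> int:
    """Counts the differing bits between two equal-length hex hash strings."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")

    # XOR leaves a 1 bit wherever the two hashes disagree
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


def is_near_duplicate(hash1: str, hash2: str, threshold: int = 5) -> bool:
    """Treats two hashes as near-duplicates when few bits differ (threshold is an assumed value)."""
    return hamming_distance(hash1, hash2) <= threshold


# Placeholder hex strings standing in for pipeline output, not real image hashes
a = "ffff00000000ffff"
b = "ffff00000000fffe"
c = "0000ffffffff0000"

print(hamming_distance(a, b))
print(is_near_duplicate(a, b))
print(is_near_duplicate(a, c))
```

A suitable threshold depends on the algorithm and hash size, so it is worth tuning against a sample of known duplicates.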

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

# Create pipeline using lower case class name
imagehash:

# Run pipeline with workflow
workflow:
  imagehash:
    tasks:
      - action: imagehash

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("imagehash", ["path to image file"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"imagehash", "elements":["path to image file"]}'
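
The same request can be made from Python. The following is a minimal sketch using only the standard library; it mirrors the URL, headers and body of the curl call above. The helper names `build_workflow_request` and `run_workflow` are hypothetical, and parsing the response as JSON is an assumption.

```python
import json
import urllib.request


def build_workflow_request(name, elements, url="http://localhost:8000/workflow"):
    """Builds a POST request with the same JSON body as the curl example."""
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements, url="http://localhost:8000/workflow"):
    """Sends the request; requires the API server to be running (assumes a JSON response)."""
    with urllib.request.urlopen(build_workflow_request(name, elements, url)) as response:
        return json.loads(response.read())


# Build the request without sending it
request = build_workflow_request("imagehash", ["path to image file"])
print(request.get_method(), request.full_url)
```

Calling `run_workflow("imagehash", ["path to image file"])` sends the request once the server started above is listening.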

Methods

Python documentation for the pipeline.

__init__(algorithm='average', size=8, strings=True)

Creates an ImageHash pipeline.

Parameters:

Name | Type | Description | Default
algorithm | | image hashing algorithm (average, perceptual, difference, wavelet, color) | 'average'
size | | hash size | 8
strings | | outputs hex strings if True (default), otherwise the pipeline returns numpy arrays | True
Source code in txtai/pipeline/image/imagehash.py
def __init__(self, algorithm="average", size=8, strings=True):
    """
    Creates an ImageHash pipeline.

    Args:
        algorithm: image hashing algorithm (average, perceptual, difference, wavelet, color)
        size: hash size
        strings: outputs hex strings if True (default), otherwise the pipeline returns numpy arrays
    """

    if not PIL:
        raise ImportError('ImageHash pipeline is not available - install "pipeline" extra to enable')

    self.algorithm = algorithm
    self.size = size
    self.strings = strings
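
To give a sense of what the default `average` algorithm and the `size` parameter refer to, here is a minimal sketch of the general average-hash idea in plain Python. It is not the code txtai runs: it assumes the image has already been converted to grayscale and downscaled to a small grid (8x8 for `size=8`), and `average_hash` is a hypothetical helper written for this example.

```python
def average_hash(pixels):
    """
    Computes an average hash over a small grayscale grid.

    pixels: list of rows, each a list of brightness values (already downscaled, e.g. 8x8)
    Returns a hex string with one bit per pixel.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)

    # One bit per pixel: 1 if brighter than the mean, otherwise 0
    bits = "".join("1" if value > mean else "0" for value in flat)

    # Pack the bits into a zero-padded hex string (4 bits per hex digit)
    width = (len(flat) + 3) // 4
    return f"{int(bits, 2):0{width}x}"


# Synthetic 8x8 grid: bright top half, dark bottom half
grid = [[200] * 8 for _ in range(4)] + [[20] * 8 for _ in range(4)]

# Same pattern with slightly different brightness, as a re-encoded copy might have
brighter = [[210] * 8 for _ in range(4)] + [[30] * 8 for _ in range(4)]

print(average_hash(grid))
print(average_hash(grid) == average_hash(brighter))
```

Because each bit only records whether a pixel is above or below the mean, small uniform changes in brightness leave the hash unchanged, which is what makes this kind of hash useful for near-duplicate detection.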

__call__(images)

Generates perceptual image hashes.

Parameters:

Name | Type | Description | Default
images | | image or list of images; file path strings are opened as images | required

Returns:

Type | Description
 | list of hashes, or a single hash when a single image is passed in

Source code in txtai/pipeline/image/imagehash.py
def __call__(self, images):
    """
    Generates perceptual image hashes.

    Args:
        images: image|list

    Returns:
        list of hashes
    """

    # Convert single element to list
    values = [images] if not isinstance(images, list) else images

    # Open images if file strings
    values = [Image.open(image) if isinstance(image, str) else image for image in values]

    # Convert images to hashes
    hashes = [self.ihash(image) for image in values]

    # Return single element if single element passed in
    return hashes[0] if not isinstance(images, list) else hashes
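
Since a list input produces a list of hashes in the same order, the results can be paired back with their file paths. The sketch below groups paths that share an identical hash. `group_by_hash` is a hypothetical helper, and the paths and hash strings are placeholders rather than real pipeline output.

```python
from collections import defaultdict


def group_by_hash(paths, hashes):
    """Groups paths whose hashes are identical and returns groups with more than one path."""
    groups = defaultdict(list)
    for path, value in zip(paths, hashes):
        groups[value].append(path)

    return [group for group in groups.values() if len(group) > 1]


# Placeholder inputs; in practice hashes would come from calling the pipeline with the list of paths
paths = ["a.jpg", "a_copy.jpg", "b.jpg"]
hashes = ["ffff00000000ffff", "ffff00000000ffff", "0000ffffffff0000"]

print(group_by_hash(paths, hashes))
```

Exact matching only catches images with identical hashes; near-duplicates that differ by a few bits need a distance-based comparison instead.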