Microphone

The Microphone pipeline reads input speech from a microphone device. This pipeline is designed to run on local machines, as it requires access to an audio input device.

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import Microphone

# Create and run pipeline
microphone = Microphone()
microphone()

This pipeline may require additional system dependencies, such as the portaudio library. See the installation documentation for more.

See the link below for a more detailed example.

| Notebook | Description | |
|----------|-------------|---|
| Speech to Speech RAG | Full cycle speech to speech workflow with RAG | Open In Colab |

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

# Create pipeline using lower case class name
microphone:

# Run pipeline with workflow
workflow:
  microphone:
    tasks:
      - action: microphone

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("microphone", ["1"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"microphone", "elements":["1"]}'

Methods

Python documentation for the pipeline.

__init__(rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8)

Creates a new Microphone pipeline.

Parameters:

| Name | Description | Default |
|------|-------------|---------|
| rate | sample rate to record audio in, defaults to 16000 (16 kHz) | 16000 |
| vadmode | aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter | 3 |
| vadframe | voice activity detector frame size in ms, defaults to 20 | 20 |
| vadthreshold | percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6 | 0.6 |
| voicestart | starting frequency to use for voice filtering, defaults to 300 | 300 |
| voiceend | ending frequency to use for voice filtering, defaults to 3400 | 3400 |
| active | minimum number of active speech chunks to require before considering this speech, defaults to 5 | 5 |
| pause | number of non-speech chunks to keep before considering speech complete, defaults to 8 | 8 |
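The vadthreshold parameter can be illustrated with a short sketch. This is not the pipeline's internal implementation, just a minimal illustration of the idea, with a hypothetical function name: given per-frame voice activity decisions, a chunk counts as speech when the fraction of voiced frames meets the threshold.

```python
def is_speech(frames, vadthreshold=0.6):
    """
    Illustrative only: decide if a chunk of audio counts as speech given
    per-frame voice activity detector results (True = voiced frame).

    A chunk counts as speech when the fraction of voiced frames meets or
    exceeds vadthreshold (0.0 - 1.0).
    """
    if not frames:
        return False

    return sum(frames) / len(frames) >= vadthreshold

# 7 of 10 frames voiced -> 0.7 >= 0.6, counts as speech
print(is_speech([True] * 7 + [False] * 3))   # True

# 5 of 10 frames voiced -> 0.5 < 0.6, not speech
print(is_speech([True] * 5 + [False] * 5))   # False
```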
Source code in txtai/pipeline/audio/microphone.py
def __init__(self, rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8):
    """
    Creates a new Microphone pipeline.

    Args:
        rate: sample rate to record audio in, defaults to 16000 (16 kHz)
        vadmode: aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter
        vadframe: voice activity detector frame size in ms, defaults to 20
        vadthreshold: percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6
        voicestart: starting frequency to use for voice filtering, defaults to 300
        voiceend: ending frequency to use for voice filtering, defaults to 3400
        active: minimum number of active speech chunks to require before considering this speech, defaults to 5
        pause: number of non-speech chunks to keep before considering speech complete, defaults to 8
    """

    if not MICROPHONE:
        raise ImportError(
            (
                'Microphone pipeline is not available - install "pipeline" extra to enable. '
                "Also check that the portaudio system library is available."
            )
        )

    # Sample rate
    self.rate = rate

    # Voice activity detector
    self.vad = webrtcvad.Vad(vadmode)
    self.vadframe = vadframe
    self.vadthreshold = vadthreshold

    # Voice spectrum
    self.voicestart = voicestart
    self.voiceend = voiceend

    # Audio chunks counts
    self.active = active
    self.pause = pause
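The active and pause parameters gate when recording is considered complete. The following is a simplified, hypothetical sketch of that kind of logic, not the pipeline's actual capture loop: a stream of per-chunk speech decisions is complete once at least `active` speech chunks have been seen and the stream then stays quiet for `pause` consecutive chunks.

```python
def complete(chunks, active=5, pause=8):
    """
    Hypothetical sketch of how active/pause counters could gate a
    recording loop. Each element of chunks is True (speech) or False.

    Returns True once at least `active` speech chunks were seen and the
    stream then went quiet for `pause` consecutive non-speech chunks.
    """
    speech, quiet = 0, 0
    for isspeech in chunks:
        if isspeech:
            speech += 1
            quiet = 0
        else:
            quiet += 1

        # Enough speech followed by a long enough pause
        if speech >= active and quiet >= pause:
            return True

    return False

# 6 speech chunks then 8 quiet chunks -> complete
print(complete([True] * 6 + [False] * 8))    # True

# Only 3 speech chunks -> never considered speech
print(complete([True] * 3 + [False] * 20))   # False
```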

__call__(device=None)

Reads audio from an input device.

Parameters:

| Name | Description | Default |
|------|-------------|---------|
| device | optional input device id, otherwise uses system default | None |

Returns:

| Type | Description |
|------|-------------|
| | list of (audio, sample rate) |

Source code in txtai/pipeline/audio/microphone.py
def __call__(self, device=None):
    """
    Reads audio from an input device.

    Args:
        device: optional input device id, otherwise uses system default

    Returns:
        list of (audio, sample rate)
    """

    # Listen for audio
    audio = self.listen(device[0] if isinstance(device, list) else device)

    # Return a list of results if a list of devices was passed in, otherwise return a single (audio, sample rate) tuple
    return (audio, self.rate) if device is None or not isinstance(device, list) else [(audio, self.rate)]
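The return shape mirrors the input: passing a list of device ids yields a list of results, while a single id (or None) yields a single tuple. The toy function below, with hypothetical names and the actual microphone capture stubbed out, illustrates only that branching.

```python
def shape(device, audio="<audio>", rate=16000):
    """
    Toy illustration of the return-shape logic above, with the actual
    microphone capture replaced by a stub value.
    """
    # A single device id (or None) yields a single (audio, sample rate) tuple
    if device is None or not isinstance(device, list):
        return (audio, rate)

    # A list of device ids yields a list of tuples
    return [(audio, rate)]

print(shape(None))   # ('<audio>', 16000)
print(shape([3]))    # [('<audio>', 16000)]
```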