# Microphone
The Microphone pipeline reads input speech from a microphone device. This pipeline is designed to run on local machines given that it requires access to read from an input device.
## Example
The following shows a simple example using this pipeline.
```python
from txtai.pipeline import Microphone

# Create and run pipeline
microphone = Microphone()
microphone()
```
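The recorded audio can be passed to other audio pipelines. The following is a minimal sketch of chaining this pipeline into a Transcription pipeline, mirroring the Speech to Speech RAG notebook linked below; it assumes Transcription accepts the (audio, sample rate) output produced by Microphone directly.

```python
from txtai.pipeline import Microphone, Transcription

# Record speech from the default input device
microphone = Microphone()

# Convert recorded audio to text
# Assumes Transcription accepts Microphone output (audio, sample rate) directly
transcribe = Transcription()
print(transcribe(microphone()))
```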
This pipeline may require additional system dependencies. See this section for more.
See the link below for a more detailed example.
| Notebook | Description | |
|:---------|:------------|---:|
| Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG | |
## Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.
### config.yml
```yaml
# Create pipeline using lower case class name
microphone:

# Run pipeline with workflow
workflow:
  microphone:
    tasks:
      - action: microphone
```
### Run with Workflows
```python
from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("microphone", ["1"]))
```
### Run with API
```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"microphone", "elements":["1"]}'
```
## Methods
Python documentation for the pipeline.
### `__init__(rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8)`

Creates a new Microphone pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rate` | | sample rate to record audio in, defaults to 16000 (16 kHz) | `16000` |
| `vadmode` | | aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter | `3` |
| `vadframe` | | voice activity detector frame size in ms, defaults to 20 | `20` |
| `vadthreshold` | | percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6 | `0.6` |
| `voicestart` | | starting frequency to use for voice filtering, defaults to 300 | `300` |
| `voiceend` | | ending frequency to use for voice filtering, defaults to 3400 | `3400` |
| `active` | | minimum number of active speech chunks to require before considering this speech, defaults to 5 | `5` |
| `pause` | | number of non-speech chunks to keep before considering speech complete, defaults to 8 | `8` |
Source code in `txtai/pipeline/audio/microphone.py`
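The defaults work for most setups. The sketch below shows how the voice activity detection parameters documented above could be tuned; the values shown are illustrative, not recommendations.

```python
from txtai.pipeline import Microphone

# Less aggressive voice activity detection with a longer trailing pause
# before a recording is considered complete (illustrative values)
microphone = Microphone(vadmode=1, vadthreshold=0.5, pause=16)
microphone()
```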
### `__call__(device=None)`

Reads audio from an input device.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `device` | | optional input device id, otherwise uses system default | `None` |
Returns:

| Type | Description |
|---|---|
| | list of (audio, sample rate) |
Source code in `txtai/pipeline/audio/microphone.py`
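When multiple input devices are available, a device id can be passed to select one. The sketch below assumes the sounddevice library is installed and uses it only to list device ids; any tool that lists the system's audio devices works for this step, and the device id shown is illustrative.

```python
import sounddevice as sd

from txtai.pipeline import Microphone

# List available audio devices to find an input device id
# (assumes sounddevice is installed; used here only for discovery)
print(sd.query_devices())

# Record from a specific input device instead of the system default
# (device id 1 is illustrative)
microphone = Microphone()
audio = microphone(device=1)
```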