Storage Task
The Storage Task expands a local directory or cloud storage bucket into a list of URLs to process.
Example
The following shows a simple example using this task as part of a workflow.
from txtai.workflow import StorageTask, Workflow
workflow = Workflow([StorageTask()])
workflow(["s3://path/to/bucket", "local://local/directory"])
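To make the idea of "expanding" a location concrete, here is a minimal sketch of how a `local://` URL could be turned into a list of file paths. It is not txtai's implementation: `expand_local` is a hypothetical helper, and both the recursive walk and the sorted output are assumptions made for illustration.

```python
import os
import tempfile


def expand_local(url):
    """Hypothetical helper: expands a local:// URL into a sorted list of file paths."""
    prefix = "local://"
    if not url.startswith(prefix):
        raise ValueError(f"unsupported URL: {url}")

    # Everything after the prefix is treated as the directory path
    root = url[len(prefix):]

    # Assumption: walk the directory recursively and collect every file
    paths = []
    for directory, _, files in os.walk(root):
        for name in files:
            paths.append(os.path.join(directory, name))

    return sorted(paths)


# Build a throwaway directory with two files and expand it
with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(root, name), "w", encoding="utf-8") as handle:
            handle.write("test")

    expanded = expand_local("local://" + root)
    names = [os.path.basename(path) for path in expanded]
```

Each path in `expanded` is then an individual element that downstream tasks in a workflow could process one by one.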
Configuration-driven example
This task can also be created with workflow configuration.
workflow:
  tasks:
    - task: storage
Methods
Python documentation for the task.
__init__(action=None, select=None, unpack=True, column=None, merge='hstack', initialize=None, finalize=None, concurrency=None, onetomany=True, **kwargs)
Creates a new task. A task defines two things: the type of data it accepts and the action to execute for each data element. The action is a callable function or a list of callable functions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
action | | action(s) to execute on each data element | None |
select | | filter(s) used to select data to process | None |
unpack | | if data elements should be unpacked or unwrapped from (id, data, tag) tuples | True |
column | | column index to select if element is a tuple, defaults to all | None |
merge | | merge mode for joining multi-action outputs, defaults to hstack | 'hstack' |
initialize | | action to execute before processing | None |
finalize | | action to execute after processing | None |
concurrency | | sets concurrency method when execute instance available; valid values: "thread" for thread-based concurrency, "process" for process-based concurrency | None |
onetomany | | if one-to-many data transformations should be enabled, defaults to True | True |
kwargs | | additional keyword arguments | {} |
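The interplay between `select`, `action` and `onetomany` can be easier to follow with a small stand-in. The sketch below is a simplified model written in plain Python, not the library's code: `run_task` is a hypothetical function, and it assumes that elements rejected by the filter pass through unchanged and that a list output is flattened into separate elements when one-to-many is enabled.

```python
def run_task(elements, action, select=None, onetomany=True):
    """Hypothetical stand-in for a task: filters, applies a batch action, optionally flattens."""
    # Positions of elements the filter accepts (assumption: the rest pass through unchanged)
    selected = [i for i, element in enumerate(elements) if select is None or select(element)]

    # The action receives the batch of selected elements and returns one output per input
    outputs = action([elements[i] for i in selected])

    # Write each output back to the position of the element it came from
    results = list(elements)
    for i, output in zip(selected, outputs):
        results[i] = output

    if not onetomany:
        return results

    # One-to-many: a list output becomes multiple elements in the result
    flattened = []
    for result in results:
        flattened.extend(result if isinstance(result, list) else [result])
    return flattened


def split(batch):
    """Example action: splits each string in the batch into words."""
    return [text.split() for text in batch]


def has_space(text):
    """Example filter: only accept strings that contain a space."""
    return " " in text


data = ["a b", "skip", "c d e"]
flat = run_task(data, split, select=has_space)
nested = run_task(data, split, select=has_space, onetomany=False)
```

With one-to-many enabled, each word becomes its own element; with it disabled, every input keeps a single slot in the output and the split results stay nested.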
Source code in txtai/workflow/task/base.py