Pipeline Architecture

Neum AI pipelines are configured from:

  • Multiple source connectors, each with its own pre-processing instructions.
  • One embed connector through which all the data extracted from the sources passes.
  • One sink connector to which all the generated embeddings are stored.

See the full selection of sources, embeds, and sinks below.

The pipeline can be thought of as the representation of an index that contains all the data from the specified sources.

Pipeline initialization

Pipeline init
from neumai.Pipelines.Pipeline import Pipeline
pipeline = Pipeline(
    id = "Pipeline identifier",
    name = "Pipeline name",
    sources = [<SourceConnector>,...],
    embed = <EmbedConnector>,
    sink = <SinkConnector>
)
If you have more than one source, design the metadata output by each source carefully. If the sources output different metadata properties, this can lead to errors, or to vectors in the same index that don't share metadata properties, which makes filtering challenging at retrieval time.
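As a sketch of why shared metadata matters, the helper below (hypothetical, not part of the neumai API) computes the metadata keys common to all sources; restricting each source's output to that shared set keeps every vector in the index filterable on the same properties:

```python
def common_metadata_keys(samples):
    """Given one sample metadata dict per source, return the keys all
    sources share -- the only keys safe to filter on at retrieval time."""
    if not samples:
        return set()
    shared = set(samples[0])
    for sample in samples[1:]:
        shared &= set(sample)
    return shared

# Example: two sources emitting slightly different metadata.
website_meta = {"url": "https://example.com", "title": "Home", "fetched_at": "2024-01-01"}
bucket_meta = {"url": "s3://bucket/doc.pdf", "title": "Doc", "bucket": "bucket"}

print(sorted(common_metadata_keys([website_meta, bucket_meta])))  # → ['title', 'url']
```

Here only `title` and `url` could be used as filters across the whole index; `fetched_at` and `bucket` exist on just one source's vectors.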

Running a pipeline

This will trigger the extraction of data from the data sources, transformation using the defined pre-processing steps and the loading of data into the vector store defined.

  • Local

  • Cloud

pipeline.run()
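Conceptually, the run performs the extract → pre-process → embed → store flow described above: every source feeds the single embed connector, and every embedding lands in the single sink. A minimal toy sketch of that flow (toy classes for illustration, not the actual neumai internals):

```python
class ToySource:
    """Stand-in for a source connector: yields documents with metadata."""
    def __init__(self, docs):
        self.docs = docs

    def extract(self):
        yield from self.docs

class ToyEmbed:
    """Stand-in for an embed connector: a fake one-dimensional embedding."""
    def embed(self, text):
        return [float(len(text))]

class ToySink:
    """Stand-in for a sink connector: stores (vector, metadata) pairs."""
    def __init__(self):
        self.vectors = []

    def store(self, vector, metadata):
        self.vectors.append((vector, metadata))

def run_pipeline(sources, embed, sink):
    """Extract from every source, embed each document, store in one sink."""
    stored = 0
    for source in sources:
        for doc in source.extract():
            vector = embed.embed(doc["text"])
            sink.store(vector, doc["metadata"])
            stored += 1
    return stored

sink = ToySink()
count = run_pipeline(
    sources=[ToySource([{"text": "hello", "metadata": {"url": "a"}}]),
             ToySource([{"text": "world!", "metadata": {"url": "b"}}])],
    embed=ToyEmbed(),
    sink=sink,
)
print(count)  # → 2
```

Note how the two sources' documents end up side by side in the same sink, which is why the shared-metadata caution above applies.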

Searching a pipeline

This will query the pipeline’s sink for documents stored in vector representation.

  • Local

  • Cloud

pipeline.search(query="Hello", number_of_results=3)
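Under the hood, the search embeds the query and asks the sink's vector index for the nearest stored vectors. A minimal sketch of that retrieval step using cosine similarity (pure Python for illustration, not the actual sink implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search_index(index, query_vector, number_of_results=3):
    """Rank stored (vector, metadata) pairs by similarity to the query."""
    ranked = sorted(index, key=lambda item: cosine(item[0], query_vector),
                    reverse=True)
    return ranked[:number_of_results]

# Toy index of (vector, metadata) pairs, as a sink might hold them.
index = [
    ([1.0, 0.0], {"id": "a"}),
    ([0.0, 1.0], {"id": "b"}),
    ([0.7, 0.7], {"id": "c"}),
]
top = search_index(index, query_vector=[1.0, 0.0], number_of_results=2)
print([meta["id"] for _, meta in top])  # → ['a', 'c']
```

A production sink (e.g. a managed vector store) performs the same ranking with approximate nearest-neighbor indexes rather than a full scan.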