Learn how data is handled and transformed within the Neum AI pipeline
A Neum Document consists of an `id` to uniquely identify a piece of content, the `content` itself, and the `metadata` associated with that content. The Neum Document is updated as the data is extracted from the data source, processed through loaders, and chunked. To learn about this process in depth, see Data Pre-processing.
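As an illustration, the three fields described above can be modelled with a small dataclass. This is a minimal sketch and not the Neum AI implementation: the class name `DocumentSketch`, the field types, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for a Neum Document (not the library's class)."""
    id: str                      # uniquely identifies this piece of content
    content: str                 # the content itself
    metadata: Dict[str, Any] = field(default_factory=dict)


# Sample values are invented for illustration.
doc = DocumentSketch(
    id="file-1_chunk-0",
    content="Neum AI syncs data into vector stores.",
    metadata={"source": "file-1"},
)

# Later pipeline stages can enrich the metadata as the document is processed.
doc.metadata["chunk_index"] = 0
```

The point of the sketch is that one object travels through extraction, loading, and chunking, with each stage free to add metadata.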
If you have used the LangChain or LlamaIndex `Document` interfaces, this should be very familiar. The main difference is the addition of an `id`, which is a key element needed as data is ingested into the vector storage and later updated through real-time synchronization.
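To illustrate why a stable `id` matters for synchronization, the sketch below uses a plain dictionary as a stand-in for a vector store. The `store` dictionary and the `upsert` helper are hypothetical; real vector stores expose their own upsert APIs.

```python
from typing import Dict

# Hypothetical in-memory store keyed by document id (stand-in for a vector store).
store: Dict[str, str] = {}


def upsert(doc_id: str, content: str) -> None:
    """Insert new content, or overwrite the entry that shares the same id."""
    store[doc_id] = content


upsert("file-1_chunk-0", "first version")
upsert("file-1_chunk-0", "updated version")  # same id: replaces, no duplicate
upsert("file-1_chunk-1", "another chunk")
```

Because the second call reuses the same `id`, the store ends up with two entries rather than three, which is the behavior a re-sync of changed data relies on.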
A Neum Vector consists of an `id` to uniquely identify the vector, a `vector` property that holds the embeddings, and the `metadata` associated with it. The Neum Vector is generated from a Neum Document when the content in the document is turned into a vector embedding. When the Neum Vector is generated, the content is added to the metadata so that a single object is attached to the vector.
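The conversion described above can be sketched as follows. Everything here is an assumption made for illustration: the `DocumentSketch` and `VectorSketch` classes, the `to_vector` helper, the `"text"` metadata key, and the `toy_embed` function, which is a placeholder and not a real embedding model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for a Neum Document."""
    id: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class VectorSketch:
    """Hypothetical stand-in for a Neum Vector."""
    id: str
    vector: List[float]
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_vector(doc: DocumentSketch,
              embed: Callable[[str], List[float]]) -> VectorSketch:
    # Copy the metadata so the source document is not mutated.
    metadata = dict(doc.metadata)
    # Fold the content into the metadata; the key name "text" is an assumption.
    metadata["text"] = doc.content
    return VectorSketch(id=doc.id, vector=embed(doc.content), metadata=metadata)


def toy_embed(text: str) -> List[float]:
    """Placeholder embedding: deterministic numbers, NOT a real model."""
    return [float(len(text)), float(text.count(" "))]


doc = DocumentSketch(id="a", content="hello world", metadata={"source": "s"})
vec = to_vector(doc, toy_embed)
```

Keeping the same `id` on both objects is what lets a vector be traced back to, and later replaced alongside, the document it came from.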
A search result consists of an `id`, the `metadata` associated with the vector, and a `score` property that represents the similarity score against the given query. This interface is designed to be compatible with a wide range of vector storage systems.
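A search result of this shape can be sketched with a tiny in-memory search. The source does not say which similarity measure is used, so cosine similarity is an assumption here, as are the `SearchResultSketch` class, the `search` helper, and the sample index.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class SearchResultSketch:
    """Hypothetical stand-in for a search result."""
    id: str
    metadata: Dict[str, Any]
    score: float


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: List[float],
           index: Dict[str, Tuple[List[float], Dict[str, Any]]],
           top_k: int = 2) -> List[SearchResultSketch]:
    # Score every stored vector against the query, then keep the best top_k.
    results = [
        SearchResultSketch(id=vid, metadata=meta, score=cosine(query, vec))
        for vid, (vec, meta) in index.items()
    ]
    return sorted(results, key=lambda r: r.score, reverse=True)[:top_k]


# Invented sample data: id -> (vector, metadata).
index = {
    "a": ([1.0, 0.0], {"text": "alpha"}),
    "b": ([0.0, 1.0], {"text": "beta"}),
    "c": ([1.0, 1.0], {"text": "gamma"}),
}
results = search([1.0, 0.0], index)
```

Because the result only carries an `id`, `metadata`, and a `score`, the same shape can wrap responses from different vector stores regardless of how each one computes similarity.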