These capabilities are currently in beta. Please contact founders@tryneum.com with any questions or asks.
Overview
Configuring RAG pipelines requires iteration across different parameters, ranging from pre-processing loaders and chunkers to the actual embedding model being used. To assist in testing different configurations, Neum AI provides several tools to test, evaluate and compare pipelines. To get started with evaluation, first make sure you have a pipeline configured.
Datasets
Datasets provide the ability to create a list of tests to run against a pipeline. Datasets are made up of DatasetEntry objects, each of which represents a test. Each DatasetEntry object contains a query, an expected output and an id.
- Cosine Evaluation: Compares the vector embeddings of the retrieved chunk and the expected output.
- LLM Evaluation: Uses an LLM to check the quality and correctness of the retrieved information in answering the query at hand. (Requires an OpenAI key set as the environment variable OPENAI_API_KEY.)
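To make the idea behind cosine evaluation concrete, here is a minimal, self-contained sketch in plain Python. It does not use the Neum AI SDK: ExampleEntry is a hypothetical stand-in for a DatasetEntry (query, expected output, id), toy_embed is a deliberately simplistic letter-count embedding assumed only for illustration, and cosine_evaluate is an illustrative helper name. A real pipeline would use its configured embedding model instead.

```python
import math
from dataclasses import dataclass


@dataclass
class ExampleEntry:
    """Hypothetical stand-in for a DatasetEntry: one test case."""
    id: str
    query: str
    expected_output: str


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)


def toy_embed(text):
    """Toy embedding (assumption): counts of the letters a-z in the text."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine_evaluate(entry, retrieved_chunk, embed=toy_embed):
    """Score a retrieved chunk against the entry's expected output."""
    return cosine_similarity(embed(retrieved_chunk), embed(entry.expected_output))


# Hypothetical test case and retrieved chunk, for illustration only.
entry = ExampleEntry(id="1", query="What color is the sky?", expected_output="The sky is blue")
score = cosine_evaluate(entry, "On a clear day the sky looks blue")
print(f"cosine score: {score:.3f}")
```

A score near 1 means the retrieved chunk's embedding points in nearly the same direction as the expected output's. What counts as a passing score depends on the embedding model and is left to the reader.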