Create your first pipeline

We will use the Website source. This source scrapes the web contents of a site and returns the HTML in the body.

To configure the Data Connector, we will specify the url property. The connector also supports a Selector to define what information from the connector should be used as content to embed and what information should be attached to the vector as metadata.
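
To make the split between embedded content and metadata concrete, here is a minimal sketch in plain Python. It does not call the library: the function apply_selector, its parameters to_embed and to_metadata, and the sample record are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Tuple


def apply_selector(
    record: Dict[str, Any],
    to_embed: List[str],
    to_metadata: List[str],
) -> Tuple[str, Dict[str, Any]]:
    """Split one scraped record into text to embed and metadata to attach."""
    # Join the selected fields into the text that would be embedded.
    content = "\n".join(str(record[key]) for key in to_embed if key in record)
    # Copy the selected fields into the metadata stored next to the vector.
    metadata = {key: record[key] for key in to_metadata if key in record}
    return content, metadata


# Hypothetical record, shaped like what a website scrape might return.
record = {"url": "https://example.com/post", "body": "<h1>Hello</h1><p>World</p>"}
content, metadata = apply_selector(record, to_embed=["body"], to_metadata=["url"])
print(content)
print(metadata)
```

In this sketch the body field becomes the text to embed, while url travels alongside the vector so a search result can point back to the page it came from.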

For the Website source, we will use an HTML Loader, since we are extracting HTML code, and the Recursive Chunker to split up the text. We will combine the Data Connector, Loader and Chunker into a SourceConnector.
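
As a rough illustration of what loading HTML and recursively chunking text involve, the sketch below uses only the Python standard library. It is not the library's Loader or Chunker: the names load_html and recursive_chunk, the separator order and the max_chars limit are all assumptions chosen for the example.

```python
from html.parser import HTMLParser
from typing import List, Sequence


class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.parts.append(data.strip())


def load_html(html: str) -> str:
    """Naive loader: keeps every text node (it does not skip scripts or styles)."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n\n".join(extractor.parts)


def recursive_chunk(
    text: str, max_chars: int, separators: Sequence[str] = ("\n\n", "\n", " ")
) -> List[str]:
    """Split on the coarsest separator first, recursing into oversized pieces."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-size slices.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(head):
        candidate = piece if not current else current + head + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, rest))
    if current:
        chunks.append(current)
    return chunks


html = "<h1>Pipelines</h1><p>Sources load data.</p><p>Sinks store vectors.</p>"
text = load_html(html)
chunks = recursive_chunk(text, max_chars=30)
for chunk in chunks:
    print(repr(chunk))
```

Splitting on paragraph breaks before falling back to line breaks and spaces keeps related sentences together where the size limit allows it.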

We will use the OpenAIEmbed connector. This connector uses text-embedding-ada-002, one of the most popular embedding models on the market, to generate vector embeddings. Configure the connector with an OpenAI key.

We will use the WeaviateSink connector. Weaviate is a popular open-source vector database. Configure the Weaviate connector with its connection parameters, including url and api_key. Other parameters are available to further configure the connector. For example, we will use class_name to define a name for the index we are creating.
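
One way to supply these connection parameters without hard-coding them is to read them from the environment. The sketch below is an assumption, not part of the library: the SinkSettings class, the load_sink_settings helper, the environment variable names and the default class name are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SinkSettings:
    """The three parameters discussed above, gathered in one place."""
    url: str
    api_key: str
    class_name: str


def load_sink_settings(env: Mapping[str, str] = os.environ) -> SinkSettings:
    # Hypothetical variable names; url and api_key are treated as required.
    missing = [name for name in ("WEAVIATE_URL", "WEAVIATE_API_KEY") if not env.get(name)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    return SinkSettings(
        url=env["WEAVIATE_URL"],
        api_key=env["WEAVIATE_API_KEY"],
        # class_name is optional here and falls back to a placeholder name.
        class_name=env.get("WEAVIATE_CLASS_NAME", "FirstPipeline"),
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
settings = load_sink_settings(
    {"WEAVIATE_URL": "https://example.weaviate.network", "WEAVIATE_API_KEY": "example-key"}
)
print(settings.class_name)
```

The resulting values could then be passed to the sink connector as its url, api_key and class_name parameters.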

Finally, we will create a Pipeline object and then use its built-in methods to run it.

The run method is not intended for production scenarios. Take a look at our cloud offering, where we handle large-scale parallelization, logging and monitoring for you!
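
The overall flow of a pipeline run (read chunks from the source, embed each one, write the result to the sink) can be sketched in plain Python. This is a toy model under stated assumptions, not the library's Pipeline: ToyPipeline, InMemorySink, toy_source and toy_embed are hypothetical stand-ins, and toy_embed does not produce a meaningful embedding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Vector = List[float]
Chunk = Tuple[str, Dict[str, str]]  # (text to embed, metadata)


def toy_source() -> Iterator[Chunk]:
    # Stand-in for a source connector: yields already-chunked text with metadata.
    yield "Sources load data.", {"url": "https://example.com/a"}
    yield "Sinks store vectors.", {"url": "https://example.com/b"}


def toy_embed(text: str) -> Vector:
    # Stand-in for a real embedding model: [length, vowel count].
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]


@dataclass
class InMemorySink:
    """Stand-in for a vector database: keeps rows in a list."""
    class_name: str
    rows: List[Tuple[Vector, Dict[str, str]]] = field(default_factory=list)

    def store(self, vector: Vector, metadata: Dict[str, str]) -> None:
        self.rows.append((vector, metadata))


@dataclass
class ToyPipeline:
    source: Callable[[], Iterable[Chunk]]
    embed: Callable[[str], Vector]
    sink: InMemorySink

    def run(self) -> int:
        """Process chunks one at a time and return how many were stored."""
        count = 0
        for text, metadata in self.source():
            self.sink.store(self.embed(text), metadata)
            count += 1
        return count


pipeline = ToyPipeline(
    source=toy_source,
    embed=toy_embed,
    sink=InMemorySink(class_name="FirstPipeline"),
)
stored = pipeline.run()
print(stored)
```

The sketch handles one chunk after another in a single process, which is enough to follow the flow of data but leaves out the parallelization, logging and monitoring mentioned above.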