Basic website scraping. Works with most websites, but might fail with JavaScript-heavy ones. This connector takes the URL of a website and extracts the available text from it. Once extracted, the contents are output as HTML.
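The sketch below illustrates the general fetch-and-extract idea using only Python's standard library. It is not neumai's implementation, and `fetch_text`, `extract_text`, and `TextExtractor` are hypothetical names. It also suggests why JavaScript-heavy sites can fail: a plain HTTP fetch returns only the initial HTML, so text that scripts render in the browser never reaches the parser.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML string, space-separated."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url: str) -> str:
    """Download a page (no JavaScript execution) and extract its text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_text(html)
```

For example, `extract_text('<div id="app"></div><script>render()</script>')` returns an empty string: a single-page app whose content is built by `render()` yields no text without a browser to run the script.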

Properties

Required properties:

  • url: URL of the website to scrape

Available metadata

  • url: URL of the website

Compatible loaders:

  • AutoLoader
  • HTMLLoader

Usage

from neumai.DataConnectors import WebsiteConnector
from neumai.Shared import Selector

# Scrape a single page and attach its URL to the output as metadata
website_connector = WebsiteConnector(
    url="https://www.neum.ai/post/retrieval-augmented-generation-at-scale",
    selector=Selector(
        to_metadata=['url']
    )
)
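To make the role of `Selector(to_metadata=['url'])` concrete, the following is a hypothetical illustration, not neumai code: it assumes the selector simply keeps the requested fields from the connector's available metadata (here only `url`). `ScrapedDocument` and `apply_metadata_selector` are invented names, and how neumai handles a requested field that is not available is an assumption (this sketch silently drops it).

```python
from dataclasses import dataclass, field


@dataclass
class ScrapedDocument:
    """Hypothetical stand-in for one document produced by the connector."""
    content: str
    available_metadata: dict[str, str] = field(default_factory=dict)


def apply_metadata_selector(doc: ScrapedDocument, to_metadata: list[str]) -> dict[str, str]:
    """Keep only the requested fields that the connector exposes (assumed behavior)."""
    return {
        key: doc.available_metadata[key]
        for key in to_metadata
        if key in doc.available_metadata
    }


doc = ScrapedDocument(
    content="<html>...</html>",
    available_metadata={"url": "https://www.neum.ai/post/retrieval-augmented-generation-at-scale"},
)
selected = apply_metadata_selector(doc, ["url"])
```

Here `selected` holds only the `url` entry, which mirrors the single field listed under Available metadata.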