Basic website scraping. Works with most websites, but might fail with JavaScript-heavy ones. This connector takes the URL of a website and extracts the available text from it. Once extracted, the contents are output as HTML.
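To illustrate what "basic" scraping means, and why JavaScript-heavy sites can fail, here is a minimal standard-library sketch. It is not the WebsiteConnector's actual implementation: `extract_text` and `scrape` are hypothetical names, and the choice of which tags to skip is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects non-empty text nodes, skipping <script>, <style> and <noscript>."""

    _SKIP = {"script", "style", "noscript"}  # assumption: tags treated as non-visible

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the text present in the static HTML; no JavaScript is executed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def scrape(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: fetch a page over HTTP and extract its text."""
    req = Request(url, headers={"User-Agent": "basic-scraper-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_text(html)
```

Because the parser only sees the HTML the server sends, a page whose content is rendered client-side by scripts (for example an empty `<div id="root">` filled in by JavaScript) yields little or no text.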

Properties

Required properties:
  • url: URL of the website
Available metadata:
  • url: URL of the website
Compatible loaders:
  • AutoLoader
  • HTMLLoader

Usage

from neumai.DataConnectors import WebsiteConnector
from neumai.Shared import Selector

website_connector = WebsiteConnector(
    url="https://www.neum.ai/post/retrieval-augmented-generation-at-scale",
    # Copy the website URL into the metadata of the extracted content
    selector=Selector(
        to_metadata=['url']
    )
)