Basic website scraping. Works with most websites, but may fail on JavaScript-heavy ones. This connector takes a website URL and extracts the available text from it. Once extracted, the contents are output as HTML.
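
To illustrate why JavaScript-heavy pages can come back empty, here is a minimal sketch of text extraction from static HTML using only the Python standard library. It is not the connector's actual implementation; the `TextExtractor` class, the `extract_text` helper, and both sample pages are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# A static page carries its text directly in the markup.
static_page = "<html><body><h1>Docs</h1><p>Hello, world.</p></body></html>"

# A JavaScript-rendered page ships an empty container; the text only
# appears after a browser runs the script, which a plain fetch never does.
js_page = '<html><body><div id="root"></div><script>render("Hello")</script></body></html>'

print(repr(extract_text(static_page)))
print(repr(extract_text(js_page)))
```

In this sketch the static page yields its heading and paragraph, while the script-rendered page yields nothing, because its text is never present in the markup that a plain HTTP fetch returns.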


Required properties:

  • url: URL for website

Available metadata

Compatible loaders:

  • AutoLoader
  • HTMLLoader


from neumai.DataConnectors import WebsiteConnector
from neumai.Shared import Selector

website_connector = WebsiteConnector(
    url = "",
    selector = Selector(