⚠️ Deprecated ⚠️

This feature is deprecated and will be removed in the future.

It is not recommended for use.

  • Import from "@langchain/community/document_loaders/web/cheerio" instead. This entrypoint will be removed in 0.3.0.

A document loader class that extends BaseDocumentLoader and implements the DocumentLoader interface. It fetches web-based documents and parses them with Cheerio.

Example

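import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";

// Fetch the page and extract its text into Document instances.
const loader = new CheerioWebBaseLoader("https://exampleurl.com");
const docs = await loader.load();
console.log({ docs });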

Hierarchy

Implements

Constructors

Properties

caller: AsyncCaller
timeout: number
webPath: string
selector?: SelectorType
textDecoder?: TextDecoder

Methods

  • load(): Extracts the text content that matches the selector from the loaded document and wraps it, together with its metadata, in a Document instance. It returns an array of Document instances.

    Returns Promise<Document[]>

    A Promise that resolves to an array of Document instances.
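
As a rough illustration of the step described above, the sketch below models how selected text and a source URL might be wrapped into a document array. It uses neither Cheerio nor LangChain: `SketchDocument`, `buildDocuments`, and the `selectText` callback are hypothetical stand-ins (the callback plays the role of a Cheerio query such as `$(selector).text()`), and storing the URL under a `source` metadata key is an assumption.

```typescript
// Local stand-in for a document type (assumption: text content plus metadata).
interface SketchDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: `selectText` stands in for a Cheerio query
// such as $(selector).text() on the parsed page.
function buildDocuments(
  selectText: (selector: string) => string,
  selector: string,
  webPath: string
): SketchDocument[] {
  const text = selectText(selector);
  // Assumption: the page URL is recorded in metadata under "source".
  return [{ pageContent: text, metadata: { source: webPath } }];
}

// Usage with an in-memory lookup table instead of a parsed page.
const fakePage: Record<string, string> = { body: "Hello world", h1: "Hello" };
const sketchDocs = buildDocuments(
  (sel) => fakePage[sel] ?? "",
  "h1",
  "https://example.com"
);
console.log(sketchDocs);
```

A narrower selector yields a smaller pageContent, which is the main reason to set selector at all.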

  • scrape(): Fetches the web document at webPath and parses it with Cheerio. It returns a CheerioAPI instance.

    Returns Promise<CheerioAPI>

    A Promise that resolves to a CheerioAPI instance.
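
The fetch half of this step can be sketched without Cheerio. The example below is a minimal sketch, assuming the timeout property is a duration in milliseconds and that textDecoder, when provided, decodes the raw response bytes; `decodeHtml` and `fetchHtml` are hypothetical helper names, not part of the library. The resulting HTML string is what would then be handed to Cheerio's load function.

```typescript
// Decode raw bytes with a caller-supplied decoder, falling back to UTF-8.
function decodeHtml(bytes: Uint8Array, textDecoder?: TextDecoder): string {
  return (textDecoder ?? new TextDecoder("utf-8")).decode(bytes);
}

// Hypothetical helper: fetch a page, aborting after `timeoutMs` milliseconds.
async function fetchHtml(
  webPath: string,
  timeoutMs: number,
  textDecoder?: TextDecoder
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(webPath, { signal: controller.signal });
    const bytes = new Uint8Array(await response.arrayBuffer());
    return decodeHtml(bytes, textDecoder);
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}

// Offline check of the decoding step: byte 0xE9 is "é" in Latin-1.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
const decodedText = decodeHtml(latin1Bytes, new TextDecoder("latin1"));
console.log(decodedText);
```

Passing an explicit decoder matters mainly for pages that are not served as UTF-8.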

  • imports(): A static method that dynamically imports the Cheerio library and returns its load function. It throws an error if the import fails.

    Returns Promise<{
        load: ((content, options?, isDocument?) => CheerioAPI);
    }>

    A Promise that resolves to an object containing the load function from the Cheerio library.
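
The dynamic-import pattern described above, where an optional dependency is loaded on demand and a failed import is turned into a readable error, can be sketched generically. `importOptional`, its parameters, and the error wording are hypothetical; this is not the library's own code.

```typescript
// Hypothetical helper: load an optional dependency at call time and
// replace a failed import with an error that says how to fix it.
async function importOptional<T>(
  moduleName: string,
  installHint: string
): Promise<T> {
  try {
    return (await import(moduleName)) as T;
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(
      `Could not import "${moduleName}". ${installHint} (${reason})`
    );
  }
}

// Usage: a missing package produces the friendlier message.
importOptional("a-package-that-is-not-installed", "Install it with your package manager.")
  .catch((error: Error) => console.log(error.message));
```

Deferring the import this way keeps the dependency optional: code that never calls the loader never needs the package installed.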

  • scrapeAll(): A static method that fetches the web documents at the given URLs and parses each of them with Cheerio. It returns an array of CheerioAPI instances.

    Parameters

    • urls: string[]

      An array of URLs to fetch and load.

    • caller: AsyncCaller
    • timeout: undefined | number
    • Optional textDecoder: TextDecoder
    • Optional options: CheerioOptions

    Returns Promise<CheerioAPI[]>

    A Promise that resolves to an array of CheerioAPI instances.
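
One plausible shape for this kind of multi-URL fetch is a concurrent fan-out with `Promise.all`. The sketch below assumes that approach; `scrapeAllSketch` and the stub scraper are hypothetical, and the stub returns strings rather than CheerioAPI instances so that the example runs without network access.

```typescript
// Hypothetical fan-out helper: start one scrape per URL concurrently.
// Promise.all keeps results in input order and rejects if any scrape rejects.
async function scrapeAllSketch<T>(
  urls: string[],
  scrapeOne: (url: string) => Promise<T>
): Promise<T[]> {
  return Promise.all(urls.map((url) => scrapeOne(url)));
}

// Stub scraper (no network): "slow" URLs resolve later than the others.
const stubScrape = (url: string): Promise<string> =>
  new Promise((resolve) => {
    const delayMs = url.includes("slow") ? 20 : 0;
    setTimeout(() => resolve(`page:${url}`), delayMs);
  });

const sampleUrls = ["https://example.com/slow", "https://example.com/fast"];
scrapeAllSketch(sampleUrls, stubScrape).then((pages) => console.log(pages));
```

Even though the second stub finishes first, the results come back in the same order as the input URLs.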