A document retriever that splits input documents into smaller chunks while storing the original documents separately. Only the small chunks are embedded; at retrieval time, the search runs over the chunks and the original "parent" documents they came from are returned.

This balances the more precisely targeted matching that small chunks allow against the richer context that larger documents provide.
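The idea described above can be illustrated with a small, dependency-free sketch. It does not use the library: `ToyDoc`, `ToyIndex`, `splitText`, `overlapScore`, `buildIndex` and `retrieveParents` are hypothetical names, a fixed-size character splitter stands in for a real text splitter, and a word-overlap score stands in for embedding similarity. Only the `doc_id` metadata key and the `childK`/`parentK` names are taken from this page.

```typescript
// Conceptual sketch only; every name below is hypothetical, not the library's API.
interface ToyDoc {
  pageContent: string;
  metadata: Record<string, string>;
}

interface ToyIndex {
  parents: Map<string, ToyDoc>; // plays the role of the docstore
  children: ToyDoc[]; // plays the role of the vector store
}

// Fixed-size character splitter (assumption: stands in for a real text splitter).
function splitText(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Counts query words that occur in the text (assumption: stands in for embedding similarity).
function overlapScore(query: string, text: string): number {
  const textWords = text.toLowerCase().split(/\W+/);
  let score = 0;
  for (const word of query.toLowerCase().split(/\W+/)) {
    if (word !== "" && textWords.indexOf(word) !== -1) score += 1;
  }
  return score;
}

// Stores each parent whole and tags every child chunk with its parent's id.
function buildIndex(docs: ToyDoc[], childChunkSize: number, idKey = "doc_id"): ToyIndex {
  const index: ToyIndex = { parents: new Map(), children: [] };
  for (const doc of docs) {
    const id = `parent-${index.parents.size}`;
    index.parents.set(id, doc);
    for (const chunk of splitText(doc.pageContent, childChunkSize)) {
      index.children.push({ pageContent: chunk, metadata: { ...doc.metadata, [idKey]: id } });
    }
  }
  return index;
}

// Searches the child chunks, then returns the distinct parents they point to.
function retrieveParents(index: ToyIndex, query: string, childK: number, parentK: number, idKey = "doc_id"): ToyDoc[] {
  const topChildren = index.children
    .map((child) => ({ child, score: overlapScore(query, child.pageContent) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, childK);

  const parentIds: string[] = [];
  for (const entry of topChildren) {
    const id = entry.child.metadata[idKey];
    if (id !== undefined && parentIds.indexOf(id) === -1) parentIds.push(id);
  }

  const result: ToyDoc[] = [];
  for (const id of parentIds.slice(0, parentK)) {
    const parentDoc = index.parents.get(id);
    if (parentDoc !== undefined) result.push(parentDoc);
  }
  return result;
}

const toyIndex = buildIndex(
  [
    { pageContent: "Justice Breyer served on the Supreme Court for many years before retiring.", metadata: { source: "a" } },
    { pageContent: "The committee discussed infrastructure spending and new bridges.", metadata: { source: "b" } },
  ],
  30
);
const hits = retrieveParents(toyIndex, "justice breyer", 20, 5);
console.log(hits.map((doc) => doc.metadata["source"])); // only the first document matches
```

The match is found in a 30-character chunk, but the caller receives the full parent text, which is the trade-off the description refers to.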

Example

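const retriever = new ParentDocumentRetriever({
  vectorstore: new MemoryVectorStore(new OpenAIEmbeddings()),
  byteStore: new InMemoryStore<Uint8Array>(),
  // Splits each input document into the "parent" documents that are returned.
  parentSplitter: new RecursiveCharacterTextSplitter({
    chunkOverlap: 0,
    chunkSize: 500,
  }),
  // Splits each parent into the small chunks that are embedded and searched.
  childSplitter: new RecursiveCharacterTextSplitter({
    chunkOverlap: 0,
    chunkSize: 50,
  }),
  childK: 20, // number of child chunks to fetch from the vector store
  parentK: 5, // maximum number of parent documents to return
});

// getDocuments() is not defined here; it stands for any function that loads Document[].
const parentDocuments = await getDocuments();
await retriever.addDocuments(parentDocuments);
const retrievedDocs = await retriever.getRelevantDocuments("justice breyer");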

Constructors

Properties

childDocumentRetriever: any
docstore: BaseStoreInterface<string, Document>
documentCompressor: undefined | BaseDocumentCompressor
vectorstore: VectorStoreInterface
documentCompressorFilteringFn?: ((docs) => SubDocs)

childSplitter: TextSplitter
idKey: string = "doc_id"
childK?: number
parentK?: number
parentSplitter?: any

Methods

  • addDocuments(docs, config?): Adds documents to the docstore and the vector store. If a child document retriever (childDocumentRetriever) is provided, it is used to add the documents instead of the vector store.

    Parameters

    • docs: Document[]

      The documents to add

    • Optional config: {
          addToDocstore?: boolean;
          childDocChunkHeaderOptions?: any;
          ids?: string[];
      }
      • Optional addToDocstore?: boolean

        Whether to add the documents to the docstore. This can be false only if ids are provided. Set it to false when the documents are already in the docstore and you don't want to re-add them.

      • Optional childDocChunkHeaderOptions?: any
      • Optional ids?: string[]

        Optional list of ids for the documents. If provided, it should have the same length as the list of documents. Provide ids when the parent documents are already in the docstore and you don't want to re-add them. If not provided, random UUIDs are used as ids.

    Returns Promise<void>
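
The interaction between `ids` and `addToDocstore` described above can be sketched as a small standalone helper. This is an illustration of the documented rules, not the library's implementation: `AddDocumentsConfig`, `resolveParentIds` and `makePlaceholderId` are hypothetical names, the placeholder id generator stands in for random UUIDs, the default of `true` for `addToDocstore` is an assumption, and throwing on a length mismatch is this sketch's choice (the page only says the lengths should match).

```typescript
// Hypothetical helper mirroring the documented rules; not the library's code.
interface AddDocumentsConfig {
  ids?: string[];
  addToDocstore?: boolean;
}

let idCounter = 0;

// Stand-in id generator (assumption); the documented behaviour is random UUIDs.
function makePlaceholderId(): string {
  idCounter += 1;
  return `id-${idCounter}-${Math.random().toString(36).slice(2, 10)}`;
}

// Returns one parent id per document, enforcing the documented constraints.
function resolveParentIds(docCount: number, config: AddDocumentsConfig = {}): string[] {
  // Assumption: addToDocstore defaults to true when omitted.
  const { ids, addToDocstore = true } = config;

  if (ids === undefined) {
    // Without ids there is nothing to link chunks to, so the parents must be stored.
    if (!addToDocstore) {
      throw new Error("addToDocstore may be false only when ids are provided");
    }
    const generatedIds: string[] = [];
    for (let i = 0; i < docCount; i += 1) generatedIds.push(makePlaceholderId());
    return generatedIds;
  }

  // Sketch's choice: reject a mismatch instead of guessing which ids apply.
  if (ids.length !== docCount) {
    throw new Error(`expected ${docCount} ids, got ${ids.length}`);
  }
  return ids;
}

// Parents already stored under known ids: reuse the ids and skip the docstore write.
const reusedIds = resolveParentIds(2, { ids: ["a", "b"], addToDocstore: false });

// No ids given: fresh ids are generated and the parents are stored.
const freshIds = resolveParentIds(3);
console.log(reusedIds, freshIds.length);
```

With the real retriever, the corresponding call would pass the same `ids` and `addToDocstore: false` in the `config` argument when the parents are already present in the docstore.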