Hello everyone! Could someone please provide some clarification on IngestionPipeline? I am noticing that when I apply multiple transformations, the original document's ID is lost after the SentenceSplitter transformation. This ends up inserting new rows into the vector store, since the embedding's doc ID is the ID of the nodes produced by the MarkdownNodeParser transformation rather than the ID of the original document.

Is this not the intended usage? My goal is to split the markdown sections into chunks after parsing, so that long sections in my document are broken down while the original document's ID is preserved.

TIA!

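# Import paths assume the llama-index >= 0.10 package layout; adjust for older versions.
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import MarkdownNodeParser, SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

# pg_vector_store, docstore, and documents are created elsewhere (not shown).
pipeline = IngestionPipeline(
    transformations=[
        MarkdownNodeParser(),  # split each document into markdown sections
        SentenceSplitter(chunk_size=200, chunk_overlap=0),  # split long sections into chunks
        OpenAIEmbedding(),  # embed each chunk
    ],
    vector_store=pg_vector_store,
    docstore=docstore,
)
pipeline.run(documents=documents)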
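
To make the behaviour I'm describing concrete, here is a toy model in plain Python. It is not LlamaIndex code and says nothing about the library's internals: `Node`, `split_sections`, `split_sentences`, `run`, and the `keep_root` flag are all hypothetical names made up for illustration. It only sketches the difference between a second splitter that points each chunk at its immediate parent (what I seem to be getting) and one that carries the root document's ID through (what I would like).

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for a document or chunk; NOT a LlamaIndex class."""
    text: str
    ref_doc_id: Optional[str] = None  # ID of the document this node came from
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def _source_id(parent: Node, keep_root: bool) -> str:
    """Pick the ID a child should record as its source document."""
    # Root-preserving: reuse the parent's own source ID when it already has one.
    if keep_root and parent.ref_doc_id is not None:
        return parent.ref_doc_id
    # Naive: always point at the immediate parent.
    return parent.node_id


def split_sections(node: Node, keep_root: bool) -> List[Node]:
    """Split on blank lines, standing in for a markdown section parser."""
    parts = [p for p in node.text.split("\n\n") if p.strip()]
    return [Node(text=p, ref_doc_id=_source_id(node, keep_root)) for p in parts]


def split_sentences(node: Node, keep_root: bool) -> List[Node]:
    """Split on '. ', standing in for a sentence splitter."""
    parts = [s.strip() for s in node.text.split(". ") if s.strip()]
    return [Node(text=p, ref_doc_id=_source_id(node, keep_root)) for p in parts]


def run(doc: Node, keep_root: bool) -> List[Node]:
    """Chain the two splitters, like two transformations in a pipeline."""
    sections = split_sections(doc, keep_root)
    return [chunk for s in sections for chunk in split_sentences(s, keep_root)]


doc = Node(text="Intro one. Intro two.\n\nBody one. Body two.", node_id="doc-1")
naive_chunks = run(doc, keep_root=False)  # chunks point at section nodes
fixed_chunks = run(doc, keep_root=True)   # chunks point at the original document
```

In this sketch, the chunks from the `keep_root=False` run reference the intermediate section IDs, so a store keyed on the document ID would treat them as belonging to new documents on every run. The chunks from the `keep_root=True` run all reference `doc-1`.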
