
Updated 3 months ago


Hello, is there a recommended way to rerun the ingestion pipeline after a failure? 10K documents were inserted into the docstore, but there was a failure during embedding, and now rerunning the pipeline will skip them because they are considered duplicates.

Is the solution to delete everything from the docstore, or is there a better way?
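The toy model below, which is plain Python and not LlamaIndex code, sketches why the rerun skips everything: if documents are recorded in the docstore before they are embedded, a crash during embedding leaves them marked as already seen. All names here (`ToyDocstore`, `run_pipeline`, `failing_embed`) are hypothetical, `len` stands in for a real embedding model, and the real pipeline's deduplication details may differ.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple


def doc_hash(text: str) -> str:
    """Content hash used to detect duplicates (toy stand-in, not LlamaIndex's)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyDocstore:
    """Hypothetical in-memory docstore that maps doc_id -> content hash."""

    def __init__(self) -> None:
        self.hashes: Dict[str, str] = {}

    def is_duplicate(self, doc_id: str, text: str) -> bool:
        return self.hashes.get(doc_id) == doc_hash(text)

    def add(self, doc_id: str, text: str) -> None:
        self.hashes[doc_id] = doc_hash(text)

    def delete(self, doc_id: str) -> None:
        self.hashes.pop(doc_id, None)


def run_pipeline(docs: Sequence[Tuple[str, str]], docstore: ToyDocstore,
                 embed: Callable[[str], object]) -> List[object]:
    """Record new docs in the docstore first, then embed them.

    Because registration happens before embedding, an exception in `embed`
    leaves the docs recorded, so a rerun treats them as duplicates.
    """
    new_docs = [(i, t) for i, t in docs if not docstore.is_duplicate(i, t)]
    for doc_id, text in new_docs:
        docstore.add(doc_id, text)
    return [embed(text) for _, text in new_docs]


def failing_embed(text: str) -> object:
    raise RuntimeError("embedding service unavailable")


docs = [("a", "first doc"), ("b", "second doc")]
store = ToyDocstore()
try:
    run_pipeline(docs, store, failing_embed)
except RuntimeError:
    pass  # the crash from the question: docs are in the store, no embeddings

# `len` is a stand-in "embedding" so the example has visible output.
print(run_pipeline(docs, store, len))  # [] (everything skipped as a duplicate)

# The fix suggested in the thread: delete the affected docs, then rerun.
for doc_id, _ in docs:
    store.delete(doc_id)
print(run_pipeline(docs, store, len))  # [9, 10]
```

With a real docstore, the deletion step would use that docstore's own delete methods; check the API for your version rather than relying on the names used here.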
If I were running that many documents, I might run them in smaller batches to limit how much work a single failure costs.

But yeah, in this case you would have to delete them from the docstore.
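A minimal sketch of the smaller-batches advice, assuming a hypothetical dict-based docstore (`seen`, mapping doc_id to a content hash) and a plain embed function rather than the LlamaIndex API. Each batch is recorded, embedded, and, if embedding raises, removed from the store again. That rollback plays the role of deleting the failed documents from the docstore, so a rerun only reprocesses the batch that failed.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Tuple[str, str]  # (doc_id, text)


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest_in_batches(
    docs: Sequence[Doc],
    seen: Dict[str, str],
    embed: Callable[[str], object],
    batch_size: int = 100,
) -> Tuple[Dict[str, object], List[str]]:
    """Ingest docs batch by batch, rolling back only a batch that fails.

    `seen` is a hypothetical stand-in for the docstore (doc_id -> content hash).
    Returns the embeddings produced in this run and the ids whose batch failed.
    """
    embeddings: Dict[str, object] = {}
    failed: List[str] = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # Skip docs already recorded with the same content (the dedup step).
        todo = [(i, t) for i, t in batch if seen.get(i) != content_hash(t)]
        for doc_id, text in todo:
            seen[doc_id] = content_hash(text)
        try:
            results = {doc_id: embed(text) for doc_id, text in todo}
        except Exception:
            # Forget this batch so a rerun re-embeds just these docs.
            for doc_id, _ in todo:
                seen.pop(doc_id, None)
            failed.extend(doc_id for doc_id, _ in todo)
            continue
        embeddings.update(results)
    return embeddings, failed


docs = [(f"d{i}", f"text {i}") for i in range(5)]
seen: Dict[str, str] = {}


def flaky_embed(text: str) -> object:
    """Toy embedder: fails on one document, otherwise returns len(text)."""
    if text == "text 3":
        raise RuntimeError("transient embedding failure")
    return len(text)


done, failed = ingest_in_batches(docs, seen, flaky_embed, batch_size=2)
print(sorted(done), failed)  # ['d0', 'd1', 'd4'] ['d2', 'd3']

# Rerun with a working embedder: only the failed batch is processed again.
done2, failed2 = ingest_in_batches(docs, seen, len, batch_size=2)
print(sorted(done2), failed2)  # ['d2', 'd3'] []
```

The batch size is the trade-off knob: smaller batches mean less rework after a failure, at the cost of more round trips to the embedding service.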
I see, I'll try running in smaller batches. Also, is it possible to exclude the metadata from being embedded? I saw that the call in BaseEmbedding includes it without a way to exclude it.
Metadata can be excluded from embeddings (or LLM calls) at the document/node level:

document.excluded_embed_metadata_keys = ["key", ...]
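To make the idea concrete, here is a simplified, hypothetical model (plain Python, not LlamaIndex's implementation) of how node metadata ends up in the text sent to the embedding model, and how an exclusion list like `excluded_embed_metadata_keys` drops keys from it. The `key: value` layout is an assumption for illustration. In LlamaIndex itself, recent versions let you inspect the actual embedding text with `document.get_content(metadata_mode=MetadataMode.EMBED)` (`MetadataMode` is in `llama_index.core.schema`), and the LLM-side setting is `excluded_llm_metadata_keys`.

```python
from typing import Dict, Iterable


def embed_text(content: str, metadata: Dict[str, object],
               excluded_keys: Iterable[str] = ()) -> str:
    """Build the string an embedding model would see (simplified model).

    Metadata is rendered as "key: value" lines above the content, skipping
    any key listed in `excluded_keys`. This only approximates the idea behind
    `excluded_embed_metadata_keys`; LlamaIndex's real templates may differ.
    """
    excluded = set(excluded_keys)
    meta_lines = [f"{k}: {v}" for k, v in metadata.items() if k not in excluded]
    if not meta_lines:
        return content
    return "\n".join(meta_lines) + "\n\n" + content


meta = {"file_name": "report.pdf", "author": "alice"}
print(embed_text("Quarterly results were strong.", meta))
print(embed_text("Quarterly results were strong.", meta, excluded_keys=["file_name"]))
print(embed_text("Quarterly results were strong.", meta, excluded_keys=meta.keys()))
```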
Awesome, thanks! I'll try this.