valentine-rabbit


Hi All, I'm using the ingestion pipeline

Hi all, I'm using the ingestion pipeline with pgvector. When I attach a docstore, caching works as expected within a single process and the pipeline doesn't insert duplicate entries. In other words, in a REPL I can re-run the pipeline as many times as I want and I only ever see one entry in my DB. However, if I stop the process and start a new REPL, the pipeline seems to generate a new document ID and the same content is inserted again. It looks like the pipeline knows the content hash and could use it to do an UPSERT, but it doesn't seem to. Am I going about this incorrectly? I'm happy to post example code.
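To make the symptom concrete, here's a minimal pure-Python sketch of one possible explanation. It assumes (1) the docstore lives only in memory and (2) each fresh process assigns documents a new random ID. `HashDocstore`, `simulate` and the `notes.txt` ID are all hypothetical stand-ins, not the actual LlamaIndex or pgvector classes. Hash-based dedup keyed on a document ID can only skip a document if both the stored hashes and the ID survive the restart.

```python
import hashlib
import json
import os
import tempfile
import uuid


class HashDocstore:
    """Hypothetical docstore mapping doc_id -> content hash.

    If `path` is given, the mapping is loaded from and saved to a JSON file,
    so it survives a process restart; otherwise it lives only in memory.
    """

    def __init__(self, path=None):
        self.path = path
        self.hashes = {}
        if path is not None and os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.hashes = json.load(f)

    def upsert(self, doc_id, text):
        """Return "skip", "update" or "insert" for this document."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self.hashes.get(doc_id)
        if previous == digest:
            return "skip"  # same ID, same content: nothing to write
        self.hashes[doc_id] = digest
        if self.path is not None:
            with open(self.path, "w", encoding="utf-8") as f:
                json.dump(self.hashes, f)
        return "insert" if previous is None else "update"


def simulate():
    text = "some document text"
    results = {}

    # One REPL session: the same in-memory store and the same ID are reused.
    session = HashDocstore()
    doc_id = str(uuid.uuid4())
    results["first_run"] = session.upsert(doc_id, text)
    results["rerun_same_session"] = session.upsert(doc_id, text)

    # "Restart": the in-memory store is gone and a fresh random ID is generated,
    # so nothing matches and the same content is inserted again.
    restarted = HashDocstore()
    results["after_restart_random_id"] = restarted.upsert(str(uuid.uuid4()), text)

    # Persisted store plus a stable ID (a hypothetical file name here):
    # a new process reloads the hashes and recognises the document.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "docstore.json")
        results["persisted_first_process"] = HashDocstore(path).upsert("notes.txt", text)
        results["persisted_new_process"] = HashDocstore(path).upsert("notes.txt", text)
        results["persisted_changed_text"] = HashDocstore(path).upsert("notes.txt", text + "!")

    return results


if __name__ == "__main__":
    for step, outcome in simulate().items():
        print(f"{step}: {outcome}")
```

In this toy model, only the combination of a persisted store and a stable document ID turns the second process's run into a skip (or an update when the text changes). Whether the real pipeline behaves this way is exactly the open question above.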
