valentine-rabbit
Joined September 25, 2024
Hi all, I'm using the ingestion pipeline with pgvector. I noticed that when I attach a docstore, deduplication works within a single process and the pipeline does not insert multiple entries until I restart. In other words, if I work in a REPL, I can re-run the pipeline as many times as I want and I only see one entry in my DB. However, if I stop the process and start a new REPL, it seems to generate a new document ID and I end up with a duplicate entry. It looks like the pipeline knows the hash of the content and could do an UPSERT, but it doesn't seem to use it. Am I going about this incorrectly? I'm happy to post example code.
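The behaviour described above is consistent with deduplication keyed on the document ID, with the content hash only used to detect changes to an ID already seen. The sketch below is a simplified, hypothetical model in plain Python, not LlamaIndex's actual code. The docstore maps each document ID to a hash of its content. A known ID with an unchanged hash is skipped, and anything else is written. If each new process builds its documents with fresh random IDs, the stored hash is never matched, so every restart adds a row. `Doc`, `run_pipeline`, `simulate`, and the `data/notes.txt` ID are illustrative names, and the sketch assumes the docstore itself survives the restart.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class Doc:
    text: str
    # Hypothetical: a document gets a random UUID unless an explicit ID is supplied.
    doc_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_pipeline(docs: list[Doc], docstore: dict, vector_db: dict) -> None:
    """Toy upsert: the docstore maps doc_id -> content hash."""
    for doc in docs:
        new_hash = content_hash(doc.text)
        old_hash = docstore.get(doc.doc_id)  # lookup is by ID, not by hash
        if old_hash == new_hash:
            continue  # same ID and same content: skip
        # Unknown ID or changed content: (re)write the row keyed by doc_id.
        vector_db[doc.doc_id] = doc.text
        docstore[doc.doc_id] = new_hash


def simulate(stable_ids: bool, restarts: int = 3) -> int:
    docstore: dict = {}   # assumed to persist across restarts
    vector_db: dict = {}  # stands in for the pgvector table
    for _ in range(restarts):
        # Each "new process" rebuilds the documents from the same source file.
        if stable_ids:
            docs = [Doc("hello world", doc_id="data/notes.txt")]
        else:
            docs = [Doc("hello world")]
        run_pipeline(docs, docstore, vector_db)
    return len(vector_db)


print(simulate(stable_ids=False))  # 3: a new random ID on every restart
print(simulate(stable_ids=True))   # 1: same ID, same hash, so later runs skip
```

Re-running inside one REPL with the same `Doc` objects reuses their IDs, which matches the single-entry behaviour observed. In the real pipeline the analogous things to check would be whether documents get deterministic IDs (for example, derived from the source file path) and whether the docstore is persisted between processes; the LlamaIndex ingestion pipeline documentation describes the available options.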