I'm currently deploying Celery as ETL

I'm currently deploying Celery as an ETL management system. What do you think about using ingestion pipeline workers?
I think that makes sense! I did another project where I just had workers running in EKS that pulled work from RabbitMQ to process, so it makes a lot of sense.
As to your other point: assuming your loaded documents have a consistent doc_id, you can attach a docstore + vector store to the ingestion pipeline (Redis, MongoDB, Firestore, Postgres) and then manage upserts that way.
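(A minimal sketch of that combination, assuming a Celery worker fed from RabbitMQ as described above, Redis as the docstore, and a Chroma server as the vector store; the broker URL, hostnames, collection/namespace names, and embedding model below are placeholders, and exact constructor arguments can vary between LlamaIndex versions.)

```python
import chromadb
from celery import Celery
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import DocstoreStrategy, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.vector_stores.chroma import ChromaVectorStore

app = Celery("etl", broker="amqp://guest@rabbitmq//")


def build_pipeline() -> IngestionPipeline:
    # Remote docstore (Redis) tracks doc_id -> content hash across worker runs,
    # so re-ingesting an unchanged document is a no-op and a changed one is upserted.
    docstore = RedisDocumentStore.from_host_and_port(
        host="localhost", port=6379, namespace="etl_docstore"
    )
    chroma = chromadb.HttpClient(host="localhost", port=8000)
    vector_store = ChromaVectorStore(
        chroma_collection=chroma.get_or_create_collection("etl_docs")
    )
    return IngestionPipeline(
        transformations=[
            SentenceSplitter(chunk_size=512, chunk_overlap=64),
            HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
        ],
        docstore=docstore,
        vector_store=vector_store,
        docstore_strategy=DocstoreStrategy.UPSERTS,
    )


@app.task(bind=True, max_retries=3)
def ingest_batch(self, file_paths: list[str]) -> int:
    """Celery worker task: load files and push them through the pipeline."""
    docs = SimpleDirectoryReader(input_files=file_paths).load_data()
    nodes = build_pipeline().run(documents=docs)
    return len(nodes)
```

With DocstoreStrategy.UPSERTS, the pipeline compares each incoming document's hash against what the docstore already holds for that doc_id, so unchanged documents are skipped and changed ones are re-embedded and replaced.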
Just browsing your repo! Amazing work, man. The only thing I really need to take care of is the pulling rate, to stay within concurrency limits (the HF inference server caps it at 512 concurrent requests). Do you have any feedback on this end?
Have you managed this issue?
hmm, my guess is to wrap requests with tenacity, using exponential backoff
could do this with a custom transformation that wraps your embeddings call in the pipeline
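(As a rough sketch of that idea: a custom TransformComponent that batches the embedding calls and retries them with exponential backoff via tenacity. The class name, batch size, and retry settings are illustrative, and `embed_model` is assumed to be whatever LlamaIndex embedding instance you already use.)

```python
from typing import Any, List

from tenacity import retry, stop_after_attempt, wait_exponential
from llama_index.core.schema import BaseNode, MetadataMode, TransformComponent


class RetryingEmbedTransform(TransformComponent):
    """Embed nodes in small batches, backing off when the server throttles."""

    embed_model: Any = None  # e.g. a HuggingFaceEmbedding instance
    batch_size: int = 64     # keep in-flight requests well under the 512 limit

    @retry(
        wait=wait_exponential(multiplier=1, min=2, max=60),
        stop=stop_after_attempt(6),
        reraise=True,
    )
    def _embed(self, texts: List[str]) -> List[List[float]]:
        # Retried with exponential backoff if the inference server errors out.
        return self.embed_model.get_text_embedding_batch(texts)

    def __call__(self, nodes: List[BaseNode], **kwargs: Any) -> List[BaseNode]:
        for start in range(0, len(nodes), self.batch_size):
            batch = nodes[start : start + self.batch_size]
            texts = [n.get_content(metadata_mode=MetadataMode.EMBED) for n in batch]
            for node, emb in zip(batch, self._embed(texts)):
                node.embedding = emb
        return nodes
```

You'd drop this into the pipeline's transformations list in place of the bare embedding model; capping batch_size (and num_workers when calling pipeline.run) keeps the total number of in-flight requests under the server's limit.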
Thanks again! One last question, I swear. If my documents are continuously added to the vector DB as they're processed in the ingestion pipeline, is there any way to also update/refresh the associated index on a live basis? This is needed to smoothly allow RAG over new documents too, but I'm not able to find anything like this in the LlamaIndex docs; it seems like indexing is kind of a static concept at the moment.
If you are using a remote vector store, it should be automatically synced
Really? VectorStoreIndexes connected to, for instance, ChromaDB are auto-synced? Wow!!
If it's a remote server, then yes 🙂 since the index is just an API connection.
It's a Docker container at the moment, so I think that counts as a remote server. Cool!
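(For reference, a sketch of querying on top of that setup, assuming the Dockerized Chroma is reachable on localhost:8000 and the collection name matches whatever the ingestion pipeline writes to; the embedding model should be the same one used at ingestion time.)

```python
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("etl_docs")

# The index is just a view over the remote collection, so whatever the
# ingestion workers write becomes visible to new queries without a rebuild.
index = VectorStoreIndex.from_vector_store(
    ChromaVectorStore(chroma_collection=collection),
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
)
query_engine = index.as_query_engine()
print(query_engine.query("What do the newest documents say?"))
```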