
I'm having an issue getting the ingestion pipeline to work with Weaviate + a Redis cache (so that I can ingest new documents later).

I had to add index = VectorStoreIndex(nodes, storage_context=storage_context) to get it to load into Weaviate, but now it seems the cache is not taken into account. Without the index line it seemed like it was processing data but never loading.

When using Chroma I was able to get the ingestion pipeline + cache to work (everything the same minus the index = line).

Plain Text
pipeline = IngestionPipeline(
    transformations=[
        SimpleNodeParser(),
#        SentenceSplitter(chunk_size=512, chunk_overlap=20),
#        TitleExtractor(nodes=5),
#        SummaryExtractor(summaries=["prev", "self"]),
#        KeywordExtractor(keywords=10),
#        OpenAIEmbedding(),
    ],
    vector_store=vector_store,
    cache=ingest_cache,
)

nodes = pipeline.run(documents=documents, storage_context=storage_context)

index = VectorStoreIndex(nodes, storage_context=storage_context)
I think the issue is you've commented out the embeddings

Plain Text
pipeline = IngestionPipeline(
    transformations=[
        SimpleNodeParser(),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store,
    cache=ingest_cache,
)

nodes = pipeline.run(documents=documents)


This should work
Thanks Logan. I think I tried this, but I will try exactly your recommendation to confirm.

I was confused about whether to let LlamaIndex do the embeddings or let Weaviate do the embeddings, and whether one was preferred over the other.

I think I recall reading that if Weaviate receives embeddings, it is smart enough to not "re-embed".
Then you can create your index with

index = VectorStoreIndex.from_vector_store(vector_store)
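Putting that together, here is a minimal sketch of building an index (and from it a query engine) over the already-populated Weaviate store, using the pre-0.10 llama_index import layout this thread is based on. The client URL and index name are assumptions for illustration; this is a configuration sketch, not a definitive setup:

```python
import weaviate
from llama_index import VectorStoreIndex
from llama_index.vector_stores import WeaviateVectorStore

# Assumed local Weaviate instance and index name -- adjust for your setup.
client = weaviate.Client("http://localhost:8080")
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Documents")

# Wrap the existing vector store; nothing is re-ingested or re-embedded here.
index = VectorStoreIndex.from_vector_store(vector_store)

# The index gives easy access to query engines, chat engines, etc.
query_engine = index.as_query_engine()
response = query_engine.query("What are these documents about?")
print(response)
```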
LlamaIndex handles the embeddings, for all vector stores
All the vector stores assume you have embeddings attached when calling vector_store.add() under the hood
But if I'm just loading the data into the vector store at this point, then I don't have to use VectorStoreIndex, correct?

Only when I go to query?
with Chroma it was loading with just pipeline.run
Yea you are right, it will work the same as Chroma, assuming your pipeline has embeddings
I just put the index thing above since it gives easy access to a bunch of stuff like query engines, chat engines, etc.
Understood. Thanks for the help. I'll try again and reconfirm.

Is my assumption correct that Weaviate will not try to embed on its side if it receives embeddings?
Yea that's right. Specifically, the code for inserting into Weaviate is here. It (probably) won't work if there are no embeddings

https://github.com/run-llama/llama_index/blob/50b9f75461ec7a5baa625126d071f76ea5dc5a5d/llama_index/vector_stores/weaviate_utils.py#L141
okay. Ill try your recommendations.
I can confirm it does work; the code up above is what I am using and it works.

Just not the caching part.

The Weaviate docker is set up with
ENABLE_MODULES: 'text2vec-openai,reranker-transformers,generative-openai'
Appreciate your help Logan
How are you expecting the caching to work?

How it DOES work is that it caches a hash of each transform step + its input nodes. If there is a cache hit, that transform step is skipped and the cached results are used.

It will still fully run the pipeline, inserting nodes into your vector db
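That cache-hit behavior can be illustrated in plain Python (hypothetical names, not LlamaIndex's actual internals): each transform's result is stored under a hash of the transform's identity plus its input, so re-running the same step on the same input skips the work, while the rest of the pipeline still proceeds.

```python
import hashlib
import json

cache = {}  # stands in for the Redis-backed cache


def run_transform(name, fn, nodes):
    # Key the cache on the transform's identity plus its input nodes.
    key = hashlib.sha256(json.dumps([name, nodes]).encode()).hexdigest()
    if key in cache:
        return cache[key]   # cache hit: skip the work, reuse cached results
    result = fn(nodes)      # cache miss: actually run the transform
    cache[key] = result
    return result


split = lambda nodes: [word for n in nodes for word in n.split()]

first = run_transform("split", split, ["hello world"])
second = run_transform("split", split, ["hello world"])  # hit, no recompute
print(first == second)  # True
```

Note that a cache hit only avoids recomputing a transform; it does not stop nodes from being inserted into the vector store again, which is why deduplication needs the docstore instead.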
If you are trying to avoid duplicate inserts, look into attaching a docstore
https://docs.llamaindex.ai/en/stable/examples/ingestion/document_management_pipeline.html
That makes sense . I'm still learning
yea no worries πŸ™‚
Hey Logan - Quick Q after reading about document stores.

So if I go this route, is it recommended to just drop the RedisCache method? Would there be any purpose for it if I was just using it to ensure duplicate documents were not loaded?
ah.. I think this is an example I should follow
https://docs.llamaindex.ai/en/stable/examples/ingestion/redis_ingestion_pipeline.html

Docstore + cache in Redis, and Weaviate as the vector store.
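Following that linked example, a hedged sketch of combining a Redis docstore (to skip duplicate documents), a Redis-backed ingestion cache (to skip repeated transform steps), and Weaviate as the vector store, again under the pre-0.10 llama_index layout. Hostnames, ports, namespaces, and the index name are assumptions; treat this as a configuration sketch requiring running Redis and Weaviate instances:

```python
import weaviate
from llama_index.ingestion import IngestionPipeline, IngestionCache
from llama_index.ingestion.cache import RedisCache
from llama_index.storage.docstore import RedisDocumentStore
from llama_index.node_parser import SentenceSplitter
from llama_index.embeddings import OpenAIEmbedding
from llama_index.vector_stores import WeaviateVectorStore

client = weaviate.Client("http://localhost:8080")  # assumed local Weaviate
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Documents")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=20),
        OpenAIEmbedding(),  # embeddings stay in the pipeline, per above
    ],
    vector_store=vector_store,
    # Docstore: deduplicates, so already-ingested documents are skipped.
    docstore=RedisDocumentStore.from_host_and_port(
        "localhost", 6379, namespace="document_store"
    ),
    # Cache: skips re-running transform steps whose inputs haven't changed.
    cache=IngestionCache(cache=RedisCache.from_host_and_port("localhost", 6379)),
)

nodes = pipeline.run(documents=documents)  # documents loaded elsewhere
```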
Yea the cache and docstore have slightly different use cases. But sounds like you got it πŸ™‚