Sorry, I'm a bit confused about the usage of IngestionPipeline along with self-created Documents:

Plain Text
embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")
documents = [Document(text="...", metadata={...}), Document(text="...", metadata={...})]

pipeline = [
    TokenTextSplitter(
        chunk_size=512,
        chunk_overlap=20,
        separator=" "
    ),
    embed_model
]

pipeline.run(documents=documents)


As I understand it, pipeline.run should use the embed_model to create embeddings for the documents and populate them automatically?

With the code above, I can see that my embeddings are empty in my local Qdrant.

This is from the Qdrant UI:

Plain Text
{"id_": "04fd0f75-0641-4ff9-96df-da768e99922c", "embedding": null, ...}
It should be populating the embeddings, yes, but your code seems a little odd.

Should be something like this

Plain Text
pipeline = IngestionPipeline(
  transformations=[
    TokenTextSplitter(...),
    embed_model
  ],
  vector_store=vector_store,
)

pipeline.run(documents=documents)

index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
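
For context, a fuller runnable version of that corrected snippet might look like this. The imports assume a llama-index 0.9.x-style layout, and the local Qdrant path and collection name are made up for illustration, not taken from the thread:

Plain Text
import qdrant_client
from llama_index import Document, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import TokenTextSplitter
from llama_index.vector_stores import QdrantVectorStore

# Local Qdrant; path and collection name are placeholders.
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="my_collection")

embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")

pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=512, chunk_overlap=20, separator=" "),
        embed_model,  # runs last, so each chunk is embedded before storage
    ],
    vector_store=vector_store,
)

# Transformations are applied in order; with a vector_store attached,
# the embedded nodes are written straight into Qdrant.
pipeline.run(documents=[Document(text="...", metadata={})])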
Yes, I forgot to add vector_store=vector_store in the code above, but it is in my original code.

One question: why do I need an index if I've already populated the vector store?
Or is having an index a must?
An index is basically an easy gateway to query engines, retrievers, and chat engines.
There's not really a way to use any of those features without an index.
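
A quick sketch of that gateway role, reusing vector_store and service_context from above (the question string is just an example):

Plain Text
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

query_engine = index.as_query_engine(similarity_top_k=3)  # one-shot Q&A
retriever = index.as_retriever(similarity_top_k=3)        # raw node retrieval
chat_engine = index.as_chat_engine()                      # multi-turn chat

response = query_engine.query("What is this collection about?")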
I see. So after I create the index, I need to store it somewhere using StorageContext and then reload it using load_index_from_storage. Is that the correct way of doing it?
@Logan M Can I just do this?

Plain Text
    query_engine = VectorStoreIndex.from_vector_store(
        vector_store=vector_store,
        service_context=service_context,
    ).as_query_engine(similarity_top_k=3)

    response = query_engine.query(question)


Data storing and question answering are two different endpoints in my app.
No load_index_from_storage needed if you're using a vector db integration; just use from_vector_store().
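
Putting the two endpoints together, the split might look roughly like this, reusing embed_model, vector_store, and service_context from the snippets above (the function names are just illustrative):

Plain Text
def ingest(documents):
    # Write path: chunk, embed, and push nodes straight into Qdrant.
    pipeline = IngestionPipeline(
        transformations=[
            TokenTextSplitter(chunk_size=512, chunk_overlap=20, separator=" "),
            embed_model,
        ],
        vector_store=vector_store,
    )
    pipeline.run(documents=documents)


def answer(question):
    # Read path: rebuild the index view over the same Qdrant collection;
    # no StorageContext or load_index_from_storage involved.
    index = VectorStoreIndex.from_vector_store(
        vector_store=vector_store, service_context=service_context
    )
    return index.as_query_engine(similarity_top_k=3).query(question)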