Sorry, I'm a bit confused about the usage of IngestionPipeline along with self-created Documents:

Plain Text
embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")
documents = [Document(text="...", metadata={...}), Document(text="...", metadata={...})]

pipeline = [
    TokenTextSplitter(
        chunk_size=512,
        chunk_overlap=20,
        separator=" "
    ),
    embed_model
]

pipeline.run(documents=documents)


As I understand it, pipeline.run should use the embed_model to create embeddings for the documents and populate them automatically?

With the code above, I can see that my embeddings are empty in my local Qdrant.

This is from the Qdrant UI:

Plain Text
{"id_": "04fd0f75-0641-4ff9-96df-da768e99922c", "embedding": null, ...}
It should be populating the embeddings, yes, but your code seems a little odd.

Should be something like this

Plain Text
pipeline = IngestionPipeline(
  transformations=[
    TokenTextSplitter(...),
    embed_model
  ],
  vector_store=vector_store,
)

pipeline.run(documents=documents)

index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
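
For context, a fuller runnable version of that corrected snippet might look like this. The imports assume a llama-index 0.9.x-style layout, and the local Qdrant path and collection name are made up for illustration, not taken from the thread:

Plain Text
import qdrant_client
from llama_index import Document, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import TokenTextSplitter
from llama_index.vector_stores import QdrantVectorStore

# Local Qdrant; path and collection name are placeholders.
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="my_collection")

embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")

pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=512, chunk_overlap=20, separator=" "),
        embed_model,  # runs last, so each chunk is embedded before storage
    ],
    vector_store=vector_store,
)

# Transformations are applied in order; with a vector_store attached,
# the embedded nodes are written straight into Qdrant.
pipeline.run(documents=[Document(text="...", metadata={})])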
Yes, I forgot to add vector_store=vector_store in the code above, but it is in my original code.

One question: why do I need an index if I've already populated the vector store?
Or is having an index a must?
An index is basically an easy gateway to query engines, retrievers, and chat engines.
There's not really a way to use any of those features without an index.
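
A quick sketch of that gateway role, reusing vector_store and service_context from above (the question string is just an example):

Plain Text
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

query_engine = index.as_query_engine(similarity_top_k=3)  # one-shot Q&A
retriever = index.as_retriever(similarity_top_k=3)        # raw node retrieval
chat_engine = index.as_chat_engine()                      # multi-turn chat

response = query_engine.query("What is this collection about?")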
I see. So after I create the index, I need to store it somewhere using StorageContext and then reload it using load_index_from_storage. Is that the correct way of doing it?
@Logan M Can I just do this?

Plain Text
    query_engine = VectorStoreIndex.from_vector_store(
        vector_store=vector_store,
        service_context=service_context,
    ).as_query_engine(similarity_top_k=3)

    response = query_engine.query(question)


Data storing and question answering are two different endpoints in my app.
No load_index_from_storage needed if you're using a vector db integration; just use from_vector_store().
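
Putting the two endpoints together, the split might look roughly like this, reusing embed_model, vector_store, and service_context from the snippets above (the function names are just illustrative):

Plain Text
def ingest(documents):
    # Write path: chunk, embed, and push nodes straight into Qdrant.
    pipeline = IngestionPipeline(
        transformations=[
            TokenTextSplitter(chunk_size=512, chunk_overlap=20, separator=" "),
            embed_model,
        ],
        vector_store=vector_store,
    )
    pipeline.run(documents=documents)


def answer(question):
    # Read path: rebuild the index view over the same Qdrant collection;
    # no StorageContext or load_index_from_storage involved.
    index = VectorStoreIndex.from_vector_store(
        vector_store=vector_store, service_context=service_context
    )
    return index.as_query_engine(similarity_top_k=3).query(question)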