Find answers from the community

Behlal
Joined September 25, 2024
Hi guys, just a quick question here because I'm not sure if I should be raising this here.

I'm currently trying to insert embeddings and other data into Qdrant as part of LlamaIndex's IngestionPipeline.
Plain Text
# inside a helper that creates the local Qdrant client and collection
client = qdrant_client.QdrantClient(path=vector_store_path)
client.create_collection(
    collection_name=VECTOR_COLLECTION_NAME,
    vectors_config=models.VectorParams(size=param_size, distance=models.Distance.COSINE),
)
return client

# other code here...
pipeline = IngestionPipeline(
    transformations=[
        TitleExtractor(nodes=3, llm=llm, num_workers=1),
        QuestionsAnsweredExtractor(questions=3, llm=llm, num_workers=1),
        SummaryExtractor(summaries=["prev", "self", "next"], llm=llm, num_workers=1),
        KeywordExtractor(llm=llm, num_workers=1),
        SentenceSplitter(chunk_size=2048, chunk_overlap=512),
        # TokenTextSplitter(chunk_size=1024, chunk_overlap=256),
        HuggingFaceEmbedding(model_name=embed_model),
    ],
    vector_store=vector_store,
)
nodes = pipeline.run(documents=docs, show_progress=True)


While everything works well, the Qdrant vector store that gets created has a .lock text file containing
Plain Text
tmp lock file

and the lock is not released even after everything has finished running. (The same problem occurs in both .py and .ipynb files.)

Is there any way for me to get Qdrant to release the lock after all the data has been inserted?
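
One thing I was planning to try is explicitly closing the client once the pipeline finishes (a rough, untested sketch reusing the client/pipeline variables from above):
Plain Text
# Rough sketch (untested): explicitly close the local Qdrant client after
# ingestion, in the hope that the on-disk .lock gets released.
nodes = pipeline.run(documents=docs, show_progress=True)
client.close()  # QdrantClient.close() shuts the client and its local storage down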

I'm unsure if this is an issue with the IngestionPipeline, or whether I should be raising this with Qdrant instead?

Thanks!
1 comment
Just a question I have; I would be glad if anyone could clarify my doubts.

Am I correct in my understanding that the difference between a query engine and a chat engine is that the chat engine stores the chat history? If so, why would anyone use the query engine over the chat engine and forgo the chat history?
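
Just to make sure I'm talking about the right thing, this is roughly the comparison I mean (a minimal sketch, assuming an existing VectorStoreIndex called index and Settings.llm already configured):
Plain Text
# Minimal sketch of the two engines I'm comparing (assumes `index` is a
# VectorStoreIndex and Settings.llm is already set).
query_engine = index.as_query_engine()            # stateless: every query stands alone
print(query_engine.query("What does chapter 1 cover?"))

chat_engine = index.as_chat_engine()              # stateful: keeps the chat history
print(chat_engine.chat("What does chapter 1 cover?"))
print(chat_engine.chat("And the next chapter?"))  # relies on the previous turn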

Thanks!
1 comment
Does anyone know how I can get the last token of stream_chat()?
E.g. I am sending the results of stream_chat() as JSON over FastAPI to a React frontend, and would like to have a flag for is_done. However, when checking for the is_done flag in the StreamingChatResponse, the flag is set to True when it is NOT the last token, while I am still iterating and sending the response. I'm guessing that this is because of the lag between the time when Ollama finishes its response and sets the flag and when I am actually checking the flag.

Is there any way I can check for the last token/response generated?

code extracts as follows:
Plain Text
async def astreamer(response,model_used):
    try:
        for i in response.response_gen:
            if response._is_done:
                print("IS DONE!")
            else:
                print("IS NOT DONE!")
            yield json.dumps(i)
            create_json_response()
            await asyncio.sleep(.1)
    except asyncio.CancelledError as e:
        print('cancelled')

Plain Text
@app.post("/chat")
async def chat(request:Request):
  ...
  response = chat_engine_dict["engine"].stream_chat(query)
  return StreamingResponse(astreamer(response,model_used=model_used),media_type="text/event-stream")
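
The workaround I'm leaning towards is to treat exhaustion of response_gen as the end-of-stream signal and emit an explicit done event myself after the loop, instead of polling _is_done. A rough sketch of that idea:
Plain Text
import asyncio
import json

async def astreamer(response, model_used):
    try:
        # Stream every token; response_gen only stops once the LLM has finished.
        for token in response.response_gen:
            yield json.dumps({"token": token, "is_done": False, "model": model_used})
            await asyncio.sleep(.1)
        # Reaching this point means the generator is exhausted, i.e. the last
        # token has already been sent, so emit one final "done" event.
        yield json.dumps({"token": "", "is_done": True, "model": model_used})
    except asyncio.CancelledError:
        print("cancelled")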
7 comments
Is there a way to chunk or break down metadata into smaller chunks to save in LlamaIndex?

I'm having an issue where my metadata is too long for the chunk size:

Plain Text
ValueError: Metadata length (379349) is longer than chunk size (2048). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.
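
One workaround I'm considering (not sure if it's the intended approach) is keeping the large values in metadata but excluding those keys from the text that gets embedded and sent to the LLM, roughly like this ("raw_table" is just a hypothetical key standing in for my long field):
Plain Text
# Rough idea (untested): keep the big metadata, but stop it from being injected
# into the embed/LLM text, so the chunk-size check no longer trips on it.
for doc in docs:
    doc.excluded_embed_metadata_keys = ["raw_table"]
    doc.excluded_llm_metadata_keys = ["raw_table"]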
3 comments
Hi guys, I have a question regarding the input array/tensor size in LlamaIndex with VectorStoreIndex/StorageContext, where I'm getting the following error:
Plain Text
ValueError: shapes (0,512) and (384,) not aligned: 512 (dim 1) != 384 (dim 0)

with the following code:

Plain Text
# We will be using local storage instead of a hosted Qdrant server
client = qdrant_client.QdrantClient(path="./sfa_test")
client.create_collection(collection_name="SFA", vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE))

vector_store = QdrantVectorStore(client=client, collection_name="SFA")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

from llama_index.core import ServiceContext, Document
docs = SimpleDirectoryReader("./data/").load_data()
# docs = docs[150:160]
docs = [Document(text="Hello world"), Document(text="Hello there")]
Settings.embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="mistral")
embed = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
llm = Ollama(model="mistral")

SERVICE_CONTEXT = ServiceContext.from_defaults(embed_model=embed, llm=llm)

pipeline = IngestionPipeline(
    transformations=[
        KeywordExtractor(llm=llm),
        TokenTextSplitter(chunk_size=512, chunk_overlap=256)
    ],
    vector_store=vector_store
)
nodes = pipeline.run(documents=docs, num_workers=16)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store, embed_model=embed)

query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("Give me a random example")
print(response)


I've tested this, and the 512 in (0,512) seems to come from the size of models.VectorParams in the line

Plain Text
client.create_collection(collection_name="SFA", vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE))

but where is the 384 in (384,) coming from?
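
For reference, this is how I was planning to double-check the dimension the embed model actually produces (quick sanity-check sketch):
Plain Text
# Quick sanity check (sketch): print the dimension the embed model returns.
embed = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
vec = embed.get_text_embedding("dimension check")
print(len(vec))  # if this prints 384, it doesn't match the collection's size=512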
2 comments