Find answers from the community

Behlal
Joined September 25, 2024
Hi guys, just a quick question here because I'm not sure if I should be raising this here.

I'm currently trying to insert embeddings and other data into Qdrant as part of LlamaIndex's IngestionPipeline.
Plain Text
# inside a helper that creates the local Qdrant client and collection
client = qdrant_client.QdrantClient(path=vector_store_path)
client.create_collection(
    collection_name=VECTOR_COLLECTION_NAME,
    vectors_config=models.VectorParams(size=param_size, distance=models.Distance.COSINE),
)
return client

# other code here...
pipeline = IngestionPipeline(
    transformations=[
        TitleExtractor(nodes=3, llm=llm, num_workers=1),
        QuestionsAnsweredExtractor(questions=3, llm=llm, num_workers=1),
        SummaryExtractor(summaries=["prev", "self", "next"], llm=llm, num_workers=1),
        KeywordExtractor(llm=llm, num_workers=1),
        SentenceSplitter(chunk_size=2048, chunk_overlap=512),
        # TokenTextSplitter(chunk_size=1024, chunk_overlap=256),
        HuggingFaceEmbedding(model_name=embed_model),
    ],
    vector_store=vector_store,
)
nodes = pipeline.run(documents=docs, show_progress=True)


While everything works well, the Qdrant vector store that gets created has a .lock text file containing
Plain Text
tmp lock file

and the lock is not released even after everything has finished running. (The same problem occurs in both .py and .ipynb files.)

Is there any way for me to get Qdrant to release the lock after all the data has been inserted?
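
One thing I was planning to try is explicitly closing the client once the pipeline finishes (a rough, untested sketch reusing the client/pipeline variables from above):
Plain Text
# Rough sketch (untested): explicitly close the local Qdrant client after
# ingestion, in the hope that the on-disk .lock gets released.
nodes = pipeline.run(documents=docs, show_progress=True)
client.close()  # QdrantClient.close() shuts the client and its local storage down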

I'm unsure if this is an issue with the IngestionPipeline, or whether I should be raising this with Qdrant instead?

Thanks!
1 comment
Just a question I have; I would be glad if anyone could clarify my doubts.

Am I correct in my understanding that the difference between a query engine and a chat engine is that the chat engine stores the chat history? If so, why would anyone use the query engine over the chat engine and forgo the chat history?
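
Just to make sure I'm talking about the right thing, this is roughly the comparison I mean (a minimal sketch, assuming an existing VectorStoreIndex called index and Settings.llm already configured):
Plain Text
# Minimal sketch of the two engines I'm comparing (assumes `index` is a
# VectorStoreIndex and Settings.llm is already set).
query_engine = index.as_query_engine()            # stateless: every query stands alone
print(query_engine.query("What does chapter 1 cover?"))

chat_engine = index.as_chat_engine()              # stateful: keeps the chat history
print(chat_engine.chat("What does chapter 1 cover?"))
print(chat_engine.chat("And the next chapter?"))  # relies on the previous turn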

Thanks!
1 comment
Does anyone know how I can get the last token of stream_chat()?
E.g. I am sending the results of stream_chat() as JSON over FastAPI to a React frontend, and would like to have a flag for is_done. However, when checking for the is_done flag in the StreamingChatResponse, the flag is set to True when it is NOT the last token, while I am still iterating and sending the response. I'm guessing that this is because of the lag between the time when Ollama finishes its response and sets the flag and when I am actually checking the flag.

Is there any way I can check for the last token/response generated?

code extracts as follows:
Plain Text
async def astreamer(response,model_used):
    try:
        for i in response.response_gen:
            if response._is_done:
                print("IS DONE!")
            else:
                print("IS NOT DONE!")
            yield json.dumps(i)
            create_json_response()
            await asyncio.sleep(.1)
    except asyncio.CancelledError as e:
        print('cancelled')

Plain Text
@app.post("/chat")
async def chat(request:Request):
  ...
  response = chat_engine_dict["engine"].stream_chat(query)
  return StreamingResponse(astreamer(response,model_used=model_used),media_type="text/event-stream")
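
The workaround I'm leaning towards is to treat exhaustion of response_gen as the end-of-stream signal and emit an explicit done event myself after the loop, instead of polling _is_done. A rough sketch of that idea:
Plain Text
import asyncio
import json

async def astreamer(response, model_used):
    try:
        # Stream every token; response_gen only stops once the LLM has finished.
        for token in response.response_gen:
            yield json.dumps({"token": token, "is_done": False, "model": model_used})
            await asyncio.sleep(.1)
        # Reaching this point means the generator is exhausted, i.e. the last
        # token has already been sent, so emit one final "done" event.
        yield json.dumps({"token": "", "is_done": True, "model": model_used})
    except asyncio.CancelledError:
        print("cancelled")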
7 comments
Is there a way to chunk or break down metadata into smaller chunks to save in LlamaIndex?

I'm having an issue where my metadata is too long for the chunk size:

Plain Text
ValueError: Metadata length (379349) is longer than chunk size (2048). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.
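
One workaround I'm considering (not sure if it's the intended approach) is keeping the large values in metadata but excluding those keys from the text that gets embedded and sent to the LLM, roughly like this ("raw_table" is just a hypothetical key standing in for my long field):
Plain Text
# Rough idea (untested): keep the big metadata, but stop it from being injected
# into the embed/LLM text, so the chunk-size check no longer trips on it.
for doc in docs:
    doc.excluded_embed_metadata_keys = ["raw_table"]
    doc.excluded_llm_metadata_keys = ["raw_table"]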
3 comments
Hi guys, I have a question regarding the input array/tensor size in LlamaIndex with VectorStoreIndex/StorageContext, where I'm getting the following error:
Plain Text
ValueError: shapes (0,512) and (384,) not aligned: 512 (dim 1) != 384 (dim 0)

with the following code:

Plain Text
# We will be using local storage instead of a hosted Qdrant server
client = qdrant_client.QdrantClient(path="./sfa_test")
client.create_collection(collection_name="SFA", vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE))

vector_store = QdrantVectorStore(client=client, collection_name="SFA")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

from llama_index.core import ServiceContext, Document
docs = SimpleDirectoryReader("./data/").load_data()
# docs = docs[150:160]
docs = [Document(text="Hello world"), Document(text="Hello there")]
Settings.embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="mistral")
embed = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
llm = Ollama(model="mistral")

SERVICE_CONTEXT = ServiceContext.from_defaults(embed_model=embed, llm=llm)

pipeline = IngestionPipeline(
    transformations=[
        KeywordExtractor(llm=llm),
        TokenTextSplitter(chunk_size=512, chunk_overlap=256)
    ],
    vector_store=vector_store
)
nodes = pipeline.run(documents=docs, num_workers=16)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store, embed_model=embed)

query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("Give me a random example")
print(response)


I've tested this, and the 512 in (0,512) seems to come from the size of models.VectorParams in the line

Plain Text
client.create_collection(collection_name="SFA", vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE))

but where is the 384 in (384,) coming from?
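
For reference, this is how I was planning to double-check the dimension the embed model actually produces (quick sanity-check sketch):
Plain Text
# Quick sanity check (sketch): print the dimension the embed model returns.
embed = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
vec = embed.get_text_embedding("dimension check")
print(len(vec))  # if this prints 384, it doesn't match the collection's size=512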
2 comments