Find answers from the community

Baygon
Offline, last seen 3 months ago
Joined September 25, 2024
Hi, using llm.stream_complete("blablabla") in a FastAPI endpoint, I manage to stream the response properly; however, it seems that line breaks are not rendered in the streamed response. Any idea how to handle line breaks in a streaming response?

Here is the code:
response = llm.stream_complete(fmt_qa_prompt)

async def generate_tokens():
    for r in response:
        try:
            yield r.delta
            await asyncio.sleep(.05)
        except asyncio.CancelledError as e:
            _logger.error("Cancelled")

return StreamingResponse(generate_tokens(), media_type="text/event-stream")
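For context on the media type used above: with text/event-stream the client expects Server-Sent Events framing, where a message is one or more "data:" lines terminated by a blank line, so raw newlines inside a delta can be lost or mis-parsed by an SSE client. A minimal sketch of one possible workaround (not from this thread), assuming the frontend actually parses SSE and reusing the response generator from the snippet above:

import asyncio
from fastapi.responses import StreamingResponse

# Sketch only: emit each delta as an explicit SSE event so embedded newlines
# survive as multiple "data:" lines of the same event.
async def generate_tokens():
    for r in response:
        delta = r.delta or ""
        payload = "".join(f"data: {line}\n" for line in delta.split("\n"))
        yield payload + "\n"  # the blank line terminates the SSE event
        await asyncio.sleep(.05)

return StreamingResponse(generate_tokens(), media_type="text/event-stream")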
2 comments
Baygon
·

Hi,

Hi,
I've built 2 simple RAG scripts. One in Langchain, one in Llamaindex:

Llamaindex:
query_engine = index.as_query_engine()

Langchain:
chain = load_qa_chain(llm, chain_type="stuff")
res = chain.run(input_documents=docs, question=prompt)

Then I have a ChromaDB where the docs have been indexed via Langchain with the same embedding function.
Finally, in another script, I pass 100 questions, store the retrieved context and responses, and use a custom prompt to evaluate the faithfulness of each response given the question and context.
I got very disturbing results: 80% faithfulness when using the Langchain retriever, but only 20% when using Llamaindex.
I would assume it could be because of the document structure in Chroma, and I'm trying to reindex everything, but the corpus is big and I need to wait a day or two before having a replica DB indexed with Llamaindex.
Has anyone experienced the same and could point me in the right direction to get proper faithfulness from Llamaindex? I'm trying to migrate away from Langchain, but these results do not help.
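As a hedged aside (this is not the custom-prompt setup described above): Llamaindex also ships a built-in FaithfulnessEvaluator that could serve as a cross-check on both pipelines. A minimal sketch, assuming a pre-0.10 install and reusing the service_context and query_engine from the scripts; questions stands in for the 100 test questions:

from llama_index.evaluation import FaithfulnessEvaluator

# Sketch: grade each response against its retrieved context with the built-in evaluator.
evaluator = FaithfulnessEvaluator(service_context=service_context)
passed = 0
for question in questions:
    response = query_engine.query(question)
    result = evaluator.evaluate_response(query=question, response=response)
    passed += int(result.passing)
print(f"faithfulness: {passed / len(questions):.0%}")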
7 comments
Would Llamaindex retrieval be as accurate if the data in ChromaDB had been added by Langchain?
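For what it's worth, a minimal sketch of pointing Llamaindex at an existing Chroma collection (the path and collection name are placeholders, and service_context stands for whatever ServiceContext / embedding setup the scripts already use). Note that Langchain and Llamaindex do not necessarily serialize documents into Chroma the same way, so data written by one may not load cleanly through the other:

import chromadb
from llama_index import VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

# Sketch: attach to a collection that already exists on disk.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)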
2 comments
Baygon
·

Agents

I am building an agent that decomposes queries about internal documentation. The agent works fine, and I am now looking at pushing this to production. All my inputs go through a FastAPI/Gunicorn instance, with Nginx in front as a reverse proxy.
However, I will have quite a few users and anticipate simultaneous queries. What is the best practice to parallelize agents? Does Gunicorn do that by specifying the number of workers?
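For reference, a minimal sketch of a Gunicorn config along those lines (the worker count, address and timeout are placeholders, not a verified production setting): each worker is a separate process handling requests independently, and async workers let one process interleave several in-flight agent calls while they wait on the LLM.

# gunicorn_conf.py -- sketch only; tune the numbers for your hardware and load.
workers = 4                                     # separate processes serving requests in parallel
worker_class = "uvicorn.workers.UvicornWorker"  # async worker so each process can interleave requests
bind = "127.0.0.1:8000"                         # address Nginx proxies to
timeout = 300                                   # generous timeout for long-running agent calls

Launched with something like gunicorn -c gunicorn_conf.py main:app, where main:app points at the FastAPI application.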
1 comment
Hi, does anyone know what the api_version is for GPT-4 Turbo in Azure OpenAI?
5 comments
Baygon
·

Router

Using a router, I would like it to understand whether a request is for one of the retriever_tools or just a standard ChatGPT query. How can I do that?
Current code:

c_tool = RetrieverTool.from_defaults(
    retriever=retriever_c,
    description="Useful to retrieve any information related to c",
)
router_retriever = RouterRetriever(
    selector=LLMSingleSelector.from_defaults(service_context=service_context),
    retriever_tools=[a_tool, b_tool, c_tool],
)
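A possible direction, sketched loosely (not from this thread): do the routing at the query-engine level, where the same selector can also pick a catch-all tool for questions that don't match any retriever. general_engine below is a placeholder for whatever engine answers non-document queries, index_c stands in for the index behind retriever_c, and LLMSingleSelector / service_context are reused from the snippet above:

from llama_index.query_engine import RouterQueryEngine
from llama_index.tools import QueryEngineTool

# Sketch: one tool per document topic plus a catch-all tool for general questions.
general_tool = QueryEngineTool.from_defaults(
    query_engine=general_engine,
    description="Useful for general questions not related to the indexed documents",
)
c_query_tool = QueryEngineTool.from_defaults(
    query_engine=index_c.as_query_engine(),
    description="Useful to retrieve any information related to c",
)
router_query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(service_context=service_context),
    query_engine_tools=[general_tool, c_query_tool],
)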
8 comments
And one more question: I'm following the FAISS vector store documentation and managed to index documents.

However, for retrieval I'm encountering some issues:

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=base_embeddings, chunk_size=512
)
vector_store = FaissVectorStore.from_persist_dir('./storage')
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir='./storage'
)
index = load_index_from_storage(
    storage_context=storage_context, service_context=service_context
)
query_engine = index.as_query_engine()
response = query_engine.query(query)

And I get the following error:

self._index.index_struct.nodes_dict[idx] for idx in query_result.ids
KeyError: '1'

Any idea how to fix that?
14 comments
Baygon
·

Faiss

I'm storing my data in a FAISS vector store, and when I look at the docstore or index_store files, I see the embeddings as null. Is that normal? Shouldn't they be stored in these files too?

Here is the way I generate the data:

for node in nodes:
    node_embedding = base_embeddings.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

vector_store.add(nodes)
index.storage_context.persist()
4 comments
Baygon
·

Hi there,

Hi there,
I'm working with agents and it can take quite a few minutes until you get the final result. How would you typically handle that in terms of UI/UX?
The way I see it, you could either use streaming (but then the UX is fully blocked until the result arrives), or store intermediary results in a DB, show them each time the user looks back at that agent's action page, and send a notification to check that page once it's completed.
I wonder what best practices or good ideas there would be for that?
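A minimal sketch of the second option (storing intermediate results and letting the UI poll), with illustrative names only: an in-memory dict stands in for the DB, and agent_steps() is a placeholder for whatever yields the agent's intermediate outputs.

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
runs = {}  # stand-in for a real DB table keyed by run_id

def run_agent(run_id: str, question: str):
    runs[run_id] = {"status": "running", "steps": []}
    for step in agent_steps(question):  # placeholder: yields the agent's intermediate steps
        runs[run_id]["steps"].append(step)
    runs[run_id]["status"] = "done"

@app.post("/runs/{run_id}")
async def start_run(run_id: str, question: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(run_agent, run_id, question)
    return {"run_id": run_id, "status": "started"}

@app.get("/runs/{run_id}")
async def get_run(run_id: str):
    return runs.get(run_id, {"status": "unknown"})

The UI can then poll the GET endpoint (or be notified some other way) and show whatever steps have accumulated so far.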
2 comments
SubQuestionQueryEngine: is there any way to stream the intermediate steps like for agents?
8 comments
I'm trying to implement Hybrid Search with Qdrant and I've successfully set up my collection.
Then in my code I set up the vector store with enable_hybrid=True.
I'm then building the nodes manually, adding them to the vector store, and persisting:

vector_store = QdrantVectorStore(index_name, client=client, enable_hybrid=True, batch_size=20)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
...
vector_store.add(nodes)
index.storage_context.persist()

Would the add method also generate both the dense and the sparse vector?
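For reference, a hybrid-enabled store is queried with the hybrid query mode so both dense and sparse retrieval are used; a minimal sketch along the lines of the Llamaindex Qdrant hybrid example, reusing vector_store from the snippet above and with the top-k values as placeholders:

from llama_index import VectorStoreIndex

# Sketch: query the hybrid collection with both dense and sparse retrieval.
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=2,   # dense results
    sparse_top_k=10,      # sparse results, fused with the dense ones
)
response = query_engine.query("example question")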
12 comments
Baygon
·

Text

node_embedding = embed_model.get_text_embedding(
    node.get_content(metadata_mode="all")
)
Does this code also embed the metadata in the resulting vector?
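One way to see exactly what text goes into the vector (a small sketch; MetadataMode is the enum behind the "all" string used above, and node is reused from the snippet):

from llama_index.schema import MetadataMode

# Sketch: print the node content under different metadata modes.
print(node.get_content(metadata_mode=MetadataMode.ALL))   # text plus all metadata, as in the snippet above
print(node.get_content(metadata_mode=MetadataMode.EMBED)) # text plus metadata not excluded via excluded_embed_metadata_keys
print(node.get_content(metadata_mode=MetadataMode.NONE))  # raw text only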
6 comments