Find answers from the community

Baygon
Offline, last seen 3 months ago
Joined September 25, 2024
Hi, using llm.stream_complete("blablabla") in a FastAPI endpoint, I manage to stream the response properly; however, it seems that line breaks are not rendered in the streamed response. Any idea how to handle line breaks in a streaming response?

Here is the code:
response = llm.stream_complete(fmt_qa_prompt)

async def generate_tokens():
    for r in response:
        try:
            yield r.delta
            await asyncio.sleep(.05)
        except asyncio.CancelledError as e:
            _logger.error("Cancelled")

return StreamingResponse(generate_tokens(), media_type="text/event-stream")
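For context on the media type used above: with text/event-stream the client expects Server-Sent Events framing, where a message is one or more "data:" lines terminated by a blank line, so raw newlines inside a delta can be lost or mis-parsed by an SSE client. A minimal sketch of one possible workaround (not from this thread), assuming the frontend actually parses SSE and reusing the response generator from the snippet above:

import asyncio
from fastapi.responses import StreamingResponse

# Sketch only: emit each delta as an explicit SSE event so embedded newlines
# survive as multiple "data:" lines of the same event.
async def generate_tokens():
    for r in response:
        delta = r.delta or ""
        payload = "".join(f"data: {line}\n" for line in delta.split("\n"))
        yield payload + "\n"  # the blank line terminates the SSE event
        await asyncio.sleep(.05)

return StreamingResponse(generate_tokens(), media_type="text/event-stream")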
2 comments
Baygon
·

Hi,

Hi,
I've built 2 simple RAG scripts. One in Langchain, one in Llamaindex:

Llamaindex:
query_engine = index.as_query_engine()

Langchain:
chain = load_qa_chain(llm, chain_type="stuff")
res = chain.run(input_documents=docs, question=prompt)

Then I have a ChromaDB where the docs have been indexed via Langchain with the same embedding function.
Finally, in another script, I pass 100 questions, store the retrieved context and responses, and use a custom prompt to evaluate the faithfulness of each response given the question and context.
I got very disturbing results: 80% faithfulness when using the Langchain retriever, but only 20% when using Llamaindex.
I would assume it could be because of the document structure in Chroma, and I'm trying to reindex everything, but the corpus is big and I need to wait a day or two before having a replica DB indexed with Llamaindex.
Has anyone experienced the same and could point me in the right direction to get proper faithfulness from Llamaindex? I'm trying to migrate away from Langchain, but these results do not help.
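As a hedged aside (this is not the custom-prompt setup described above): Llamaindex also ships a built-in FaithfulnessEvaluator that could serve as a cross-check on both pipelines. A minimal sketch, assuming a pre-0.10 install and reusing the service_context and query_engine from the scripts; questions stands in for the 100 test questions:

from llama_index.evaluation import FaithfulnessEvaluator

# Sketch: grade each response against its retrieved context with the built-in evaluator.
evaluator = FaithfulnessEvaluator(service_context=service_context)
passed = 0
for question in questions:
    response = query_engine.query(question)
    result = evaluator.evaluate_response(query=question, response=response)
    passed += int(result.passing)
print(f"faithfulness: {passed / len(questions):.0%}")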
7 comments
Would Llamaindex retrieval be as accurate if the data in ChromaDB had been added by Langchain?
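For what it's worth, a minimal sketch of pointing Llamaindex at an existing Chroma collection (the path and collection name are placeholders, and service_context stands for whatever ServiceContext / embedding setup the scripts already use). Note that Langchain and Llamaindex do not necessarily serialize documents into Chroma the same way, so data written by one may not load cleanly through the other:

import chromadb
from llama_index import VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

# Sketch: attach to a collection that already exists on disk.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)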
2 comments
Baygon
·

Agents

I am building an agent that decomposes queries about internal documentation. The agent works fine, and I am now looking at pushing this to production. All my inputs go through a FastAPI/Gunicorn instance, with Nginx in front as a reverse proxy.
However, I will have quite a few users and anticipate simultaneous queries. What is the best practice to parallelize agents? Does Gunicorn do that by specifying the number of workers?
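For reference, a minimal sketch of a Gunicorn config along those lines (the worker count, address and timeout are placeholders, not a verified production setting): each worker is a separate process handling requests independently, and async workers let one process interleave several in-flight agent calls while they wait on the LLM.

# gunicorn_conf.py -- sketch only; tune the numbers for your hardware and load.
workers = 4                                     # separate processes serving requests in parallel
worker_class = "uvicorn.workers.UvicornWorker"  # async worker so each process can interleave requests
bind = "127.0.0.1:8000"                         # address Nginx proxies to
timeout = 300                                   # generous timeout for long-running agent calls

Launched with something like gunicorn -c gunicorn_conf.py main:app, where main:app points at the FastAPI application.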
1 comment
Hi, does anyone know what the api_version is for GPT-4 Turbo in Azure OpenAI?
5 comments
Baygon
·

Router

Using a router, I would like it to understand whether a request is for one of the retriever_tools or just a standard ChatGPT query. How can I do that?
Current code:

c_tool = RetrieverTool.from_defaults(
    retriever=retriever_c,
    description="Useful to retrieve any information related to c",
)
router_retriever = RouterRetriever(
    selector=LLMSingleSelector.from_defaults(service_context=service_context),
    retriever_tools=[a_tool, b_tool, c_tool],
)
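A possible direction, sketched loosely (not from this thread): do the routing at the query-engine level, where the same selector can also pick a catch-all tool for questions that don't match any retriever. general_engine below is a placeholder for whatever engine answers non-document queries, index_c stands in for the index behind retriever_c, and LLMSingleSelector / service_context are reused from the snippet above:

from llama_index.query_engine import RouterQueryEngine
from llama_index.tools import QueryEngineTool

# Sketch: one tool per document topic plus a catch-all tool for general questions.
general_tool = QueryEngineTool.from_defaults(
    query_engine=general_engine,
    description="Useful for general questions not related to the indexed documents",
)
c_query_tool = QueryEngineTool.from_defaults(
    query_engine=index_c.as_query_engine(),
    description="Useful to retrieve any information related to c",
)
router_query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(service_context=service_context),
    query_engine_tools=[general_tool, c_query_tool],
)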
8 comments
And one more question: I'm following the FAISS vector store documentation and managed to index documents.

However, for retrieval I'm encountering some issues:

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=base_embeddings, chunk_size=512
)
vector_store = FaissVectorStore.from_persist_dir('./storage')
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir='./storage'
)
index = load_index_from_storage(
    storage_context=storage_context, service_context=service_context
)
query_engine = index.as_query_engine()
response = query_engine.query(query)

And I get the following error:

self._index.index_struct.nodes_dict[idx] for idx in query_result.ids
KeyError: '1'

Any idea how to fix that?
14 comments
Baygon
·

Faiss

I'm storing my data in a FAISS vector store, and when I look at the docstore or index_store files, I see the embeddings as null. Is that normal? Shouldn't they be stored in these files too?

Here is the way I generate the data:

for node in nodes:
    node_embedding = base_embeddings.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

vector_store.add(nodes)
index.storage_context.persist()
4 comments
Baygon
·

Hi there,

Hi there,
I'm working with agents and it can take quite a few minutes until you get the final result. How would you typically handle that in terms of UI/UX?
The way I see it, you could either use streaming (but then the UX is fully blocked until the result arrives), or store intermediary results in a DB, show them each time the user looks back at that agent's action page, and send a notification to check that page once it's completed.
I wonder what best practices or good ideas there would be for that?
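A minimal sketch of the second option (storing intermediate results and letting the UI poll), with illustrative names only: an in-memory dict stands in for the DB, and agent_steps() is a placeholder for whatever yields the agent's intermediate outputs.

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
runs = {}  # stand-in for a real DB table keyed by run_id

def run_agent(run_id: str, question: str):
    runs[run_id] = {"status": "running", "steps": []}
    for step in agent_steps(question):  # placeholder: yields the agent's intermediate steps
        runs[run_id]["steps"].append(step)
    runs[run_id]["status"] = "done"

@app.post("/runs/{run_id}")
async def start_run(run_id: str, question: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(run_agent, run_id, question)
    return {"run_id": run_id, "status": "started"}

@app.get("/runs/{run_id}")
async def get_run(run_id: str):
    return runs.get(run_id, {"status": "unknown"})

The UI can then poll the GET endpoint (or be notified some other way) and show whatever steps have accumulated so far.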
2 comments
SubQuestionQueryEngine: is there any way to stream the intermediate steps like for agents?
8 comments
I'm trying to implement Hybrid Search with Qdrant and I've successfully set up my collection.
Then in my code I set up the vector store with enable_hybrid=True.
I'm then building the nodes manually, adding them to the vector store, and persisting:

vector_store = QdrantVectorStore(index_name, client=client, enable_hybrid=True, batch_size=20)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
...
vector_store.add(nodes)
index.storage_context.persist()

Would the add method also generate both the dense and the sparse vector?
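For reference, a hybrid-enabled store is queried with the hybrid query mode so both dense and sparse retrieval are used; a minimal sketch along the lines of the Llamaindex Qdrant hybrid example, reusing vector_store from the snippet above and with the top-k values as placeholders:

from llama_index import VectorStoreIndex

# Sketch: query the hybrid collection with both dense and sparse retrieval.
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=2,   # dense results
    sparse_top_k=10,      # sparse results, fused with the dense ones
)
response = query_engine.query("example question")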
12 comments
Baygon
·

Text

node_embedding = embed_model.get_text_embedding(
    node.get_content(metadata_mode="all")
)
Does this code also embed the metadata in the resulting vector?
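One way to see exactly what text goes into the vector (a small sketch; MetadataMode is the enum behind the "all" string used above, and node is reused from the snippet):

from llama_index.schema import MetadataMode

# Sketch: print the node content under different metadata modes.
print(node.get_content(metadata_mode=MetadataMode.ALL))   # text plus all metadata, as in the snippet above
print(node.get_content(metadata_mode=MetadataMode.EMBED)) # text plus metadata not excluded via excluded_embed_metadata_keys
print(node.get_content(metadata_mode=MetadataMode.NONE))  # raw text only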
6 comments