payload
Offline, last seen 4 weeks ago
Joined September 25, 2024
https://www.llamaindex.ai/blog/one-click-open-source-rag-observability-with-langfuse

I tried to follow this article and got this error:

Plain Text
    global_handler.start_trace_params(user_id=request.email_id, tags=[env.ENVIRONMENT, "support-bot"])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'start_trace_params'
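A likely cause sketch: global_handler stays None until a handler is registered, so set_global_handler must run before any trace params are set. Also note that importing global_handler by value at startup captures the original None; re-read the module attribute after registration, after which the start_trace_params call from the article should find a real handler.

Plain Text
import llama_index.core
from llama_index.core import set_global_handler

# registers the Langfuse handler (requires the langfuse callback package installed)
set_global_handler("langfuse")

# re-read the attribute after registration instead of importing it by value
handler = llama_index.core.global_handler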
3 comments

Agentic

Hey, in the semantic chunking implementation this video is mentioned:
https://youtu.be/8OJC21T2SL4?t=1933 — is the agentic chunking mentioned in the video also implemented?
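For context, the semantic chunking half is available as SemanticSplitterNodeParser; as far as I can tell there is no dedicated agentic-chunking class, so that part would have to be built manually. A minimal sketch of the semantic splitter, assuming an OpenAI embedding model and a hypothetical docs folder:

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader("./docs").load_data()  # hypothetical path

# splits where adjacent-sentence embedding similarity drops, instead of at a
# fixed token count
splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=OpenAIEmbedding()
)
nodes = splitter.get_nodes_from_documents(documents)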
1 comment
Hey,
What is the difference between Azure AI Search and Azure Cosmos DB?
1 comment
And why doesn't LlamaParse support markdown?
3 comments
I have markdown files to be vectorized. The current parser, MarkdownReader, splits the markdown based on headings (e.g. `#`, code blocks). I want to change the strategy for dividing the document into chunks, since in my use case the extracted chunks are so small that they don't carry enough context.
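One hedged alternative: load each markdown file as a single flat document and apply a size-based splitter, so chunk size is controlled directly instead of by heading structure. A sketch, path hypothetical:

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import FlatReader

# FlatReader keeps each markdown file as one document (no heading-based
# splitting), then SentenceSplitter chunks purely by size
documents = SimpleDirectoryReader(
    "./docs", file_extractor={".md": FlatReader()}  # hypothetical path
).load_data()
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=128)
nodes = splitter.get_nodes_from_documents(documents)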
3 comments
documents = SimpleDirectoryReader(
    input_files=pdf_docs, file_extractor=file_extractor, recursive=True
).load_data()

how can I add a single file that failed to parse, after the rest of the documents have been parsed completely?
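A hedged sketch: since load_data just returns a list of Document objects, the failed file can be re-parsed on its own and appended (file path hypothetical, file_extractor reused from above):

Plain Text
# re-run the reader on just the one file that failed, then merge the results
retry_docs = SimpleDirectoryReader(
    input_files=["reports/failed_file.pdf"],  # hypothetical path
    file_extractor=file_extractor,
).load_data()
documents.extend(retry_docs)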
2 comments
how to implement NeMo Guardrails over a chat engine with streaming responses?
6 comments
I am using QueryFusionRetriever with CondensePlusContextChatEngine, with two retrievers (BM25Retriever and VectorStoreIndex.from_vector_store) and Langfuse for traces. When using the condense plus context chat engine, the traces are not well segregated into the multiple retrievers, the multiple queries, and then the fusion nodes, the way they are cleanly separated with index.as_chat_engine.
6 comments
how to vectorize documents (PDF, HTML) that include images, text, and tables for RAG?
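One hedged approach (LlamaParse account assumed, file name hypothetical): parse to markdown so tables survive as markdown tables, then split prose and table elements with MarkdownElementNodeParser; embedded images would need a separate multimodal pipeline on top. A sketch:

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_parse import LlamaParse

documents = LlamaParse(result_type="markdown").load_data("report.pdf")  # hypothetical file

# separates table elements from prose so each can be indexed appropriately
node_parser = MarkdownElementNodeParser(num_workers=4)
nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

index = VectorStoreIndex(nodes=base_nodes + objects)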
14 comments
I am setting up my RAG evaluation pipeline; here is my code:

Plain Text
import os
from dotenv import load_dotenv
import nest_asyncio

load_dotenv()
nest_asyncio.apply()

import qdrant_client
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.evaluation import (
    EmbeddingQAFinetuneDataset,
    RetrieverEvaluator,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.qdrant import QdrantVectorStore

Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002", embed_batch_size=10)
llm = OpenAI(model="gpt-4o")

client = qdrant_client.QdrantClient(
    url=os.getenv("QDRANT_URI"), api_key=os.getenv("QDRANT_API_KEY")
)
vector_store = QdrantVectorStore(client=client, collection_name="mlofo-loan-officer-july")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

qa_dataset = EmbeddingQAFinetuneDataset.from_json("pg_eval_dataset.json")


metrics = ["mrr", "hit_rate"]

retriever_evaluator = RetrieverEvaluator.from_metric_names(
    metrics, retriever=index.as_retriever(similarity_top_k=2)
)

sample_id, sample_query = list(qa_dataset.queries.items())[0]
sample_expected = qa_dataset.relevant_docs[sample_id]

eval_result = retriever_evaluator.evaluate(sample_query, sample_expected)
print(eval_result)

Generating the dataset:
Plain Text
from llama_index.core.evaluation import generate_question_context_pairs

nodes = vector_store.get_nodes()

qa_dataset = generate_question_context_pairs(
    nodes, llm=llm, num_questions_per_chunk=2
)


reference: https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/cohere_retriever_eval.ipynb

The error:

File "/home/payload/miniconda3/envs/mloflo/lib/python3.12/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 184, in _aget_nodes_with_embeddings
query_result = await self._vector_store.aquery(query, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/payload/miniconda3/envs/mloflo/lib/python3.12/site-packages/llama_index/vector_stores/qdrant/base.py", line 927, in aquery
response = await self._aclient.search(
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'search'
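A possible fix sketch: the traceback shows self._aclient is None, meaning only a sync Qdrant client was supplied while the evaluator takes the async query path. Passing an AsyncQdrantClient alongside the sync one should give aquery() a client to call:

Plain Text
import os
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(
    url=os.getenv("QDRANT_URI"), api_key=os.getenv("QDRANT_API_KEY")
)
aclient = qdrant_client.AsyncQdrantClient(
    url=os.getenv("QDRANT_URI"), api_key=os.getenv("QDRANT_API_KEY")
)
vector_store = QdrantVectorStore(
    client=client,
    aclient=aclient,  # enables the async code path used by the evaluator
    collection_name="mlofo-loan-officer-july",
)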
6 comments
how to use QueryFusionRetriever with CondensePlusContextChatEngine with use_async=True

Plain Text
def get_chat_engine() -> "CondensePlusContextChatEngine":

    Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
    
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    retriever = index.as_retriever(similarity_top_k=3)

    retriever = QueryFusionRetriever(
        [retriever],
        similarity_top_k=4,
        num_queries=4,
        mode="reciprocal_rerank",
        use_async=True,
        verbose=True,
        query_gen_prompt=BOT_QUERY_GEN_PROMPT
    )
    chat_engine = CondensePlusContextChatEngine.from_defaults(
        retriever=retriever, system_prompt=SUPPORT_BOT_SYSTEM_PROMPT, streaming=True
    )
    return chat_engine

async def chat(request: ChatRequestBody):
    try:
       
        engine = get_chat_engine()
        response_stream = engine.stream_chat(message, chat_history=history)
        return StreamingResponse(
            stream_generator(response_stream, request.history, request.timezone),
            media_type="application/x-ndjson",
        )

    except Exception as e:
        traceback.print_exc()
        raise HTTPException(
            status_code=500, detail=f"An error occurred while processing the request. {str(e)}"
        ) from e


This is the error:

RuntimeError: Nested async detected. Use async functions where possible (aquery, aretrieve, arun, etc.). Otherwise, use import nest_asyncio; nest_asyncio.apply() to enable nested async or use in a jupyter notebook.
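A hedged fix sketch: since the FastAPI endpoint is already a coroutine on a running event loop, the chat engine's async streaming method avoids starting a nested loop (alternatively, nest_asyncio.apply() at startup, as the error message suggests). Assuming the same message/history variables as above:

Plain Text
async def chat(request: ChatRequestBody):
    engine = get_chat_engine()
    # await the async variant instead of calling stream_chat from a coroutine
    response_stream = await engine.astream_chat(message, chat_history=history)
    return StreamingResponse(
        stream_generator(response_stream, request.history, request.timezone),
        media_type="application/x-ndjson",
    )

Note that the response from astream_chat exposes an async token generator, so stream_generator would need to consume it with async for rather than a plain loop.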
6 comments
Hey everyone,
I have documentation and I want to find the best embedding model for RAG. How can I score / benchmark different embedding models to find the best one?
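A rough benchmarking sketch, reusing the RetrieverEvaluator pattern from the evaluation-pipeline post above (nodes and qa_dataset are assumed to exist as there): build one index per candidate embedding model over the same nodes, then compare hit rate and MRR on the shared dataset.

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator
from llama_index.embeddings.openai import OpenAIEmbedding

# candidate models are assumptions; swap in whatever you want to compare
candidates = {
    "ada-002": OpenAIEmbedding(model="text-embedding-ada-002"),
    "3-small": OpenAIEmbedding(model="text-embedding-3-small"),
}

for name, embed_model in candidates.items():
    index = VectorStoreIndex(nodes, embed_model=embed_model)
    evaluator = RetrieverEvaluator.from_metric_names(
        ["mrr", "hit_rate"], retriever=index.as_retriever(similarity_top_k=2)
    )
    hits, mrrs = [], []
    for qid, query in qa_dataset.queries.items():
        result = evaluator.evaluate(query, qa_dataset.relevant_docs[qid])
        hits.append(result.metric_vals_dict["hit_rate"])
        mrrs.append(result.metric_vals_dict["mrr"])
    print(name, "hit_rate:", sum(hits) / len(hits), "mrr:", sum(mrrs) / len(mrrs))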
14 comments
Secondly,
what is fusion RAG, and how does it compare against BM25s and re-ranking algorithms?
How can I use fusion RAG without BM25s (is it necessary to integrate BM25s)?
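On the last part: QueryFusionRetriever does not require BM25; it fuses the result lists of whatever retrievers it is given, so a single dense retriever works, with fusion happening across the generated query variations. A minimal sketch, assuming an existing index:

Plain Text
from llama_index.core.retrievers import QueryFusionRetriever

# fusion over query rewrites only: one dense retriever, no BM25
retriever = QueryFusionRetriever(
    [index.as_retriever(similarity_top_k=4)],
    similarity_top_k=4,
    num_queries=4,  # generates 3 extra query rewrites plus the original
    mode="reciprocal_rerank",  # fuses the per-query result lists with RRF
)
nodes = retriever.retrieve("what is fusion RAG?")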
21 comments
Hey everyone, I have a few questions:
  • In the BM25s retriever the nodes are loaded in memory; for large documentation sets, won't this increase the memory overhead and delay real-time responses?
11 comments
Hey, how can I use the HyDE query transform with a chat engine? I was unable to find an implementation with a chat engine.
Is it not possible to implement it with a chat engine?

Edit: if there is no chat engine implementation, can I modify a query engine to include chat history?
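One hedged way to get both, combining documented pieces: wrap a query engine in HyDEQueryTransform, then hand it to CondenseQuestionChatEngine, which condenses the chat history into a standalone question before each query. A sketch, assuming an existing index:

Plain Text
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# query engine that expands each question into a hypothetical document first
hyde_engine = TransformQueryEngine(
    index.as_query_engine(similarity_top_k=3),
    HyDEQueryTransform(include_original=True),
)

# chat engine that folds chat history into a standalone question, then routes
# it through the HyDE query engine
chat_engine = CondenseQuestionChatEngine.from_defaults(query_engine=hyde_engine)
print(chat_engine.chat("how do I version my documents?"))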

9 comments
Is it possible to connect the Notion documentation / connector with LlamaParse?
2 comments

Hey everyone,
I am using QdrantVectorStore. I want to modify the metadata received / retrieved from the vector store before I send it to the LLM to generate the response. Any tips on how I can do that?
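One hedged option: a custom node postprocessor, which runs on the retrieved nodes after retrieval and before they are formatted into the LLM context. A sketch (the rewrite rule itself is hypothetical):

Plain Text
from typing import List, Optional

from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle


class MetadataRewriter(BaseNodePostprocessor):
    """Edits node metadata after retrieval, before response synthesis."""

    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None
    ) -> List[NodeWithScore]:
        for n in nodes:
            # hypothetical rewrite: keep only the fields the LLM should see
            n.node.metadata = {"source": n.node.metadata.get("file_name", "unknown")}
        return nodes


query_engine = index.as_query_engine(node_postprocessors=[MetadataRewriter()])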
2 comments

Chat

https://github.com/run-llama/llama_index/issues/14273#issuecomment-2181149146

Hey, I have the following questions:

  • What is the difference between an OpenAI agent and a ReAct agent, and which should I use?
  • Using PromptTemplates provided more controlled and consistent output compared to system prompts.
  • In the case of an agent, AzureOpenAI is very slow compared to OpenAI; there is about a 10x delay in response generation. I have tried with both ReActAgent and OpenAIAgent.
Plain Text
llm = AzureOpenAI(
    model=os.getenv("AOAI_COMPLETION_MODEL"),
    deployment_name=os.getenv("AOAI_DEPLOYMENT_NAME_COMPLETION"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AOAI_ENDPOINT"),
    api_version=os.getenv("AOAI_API_VERSION"),
)

  • Lastly, how do I use a prompt template with a chat engine? (A sketch follows below.)
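On that last point, a hedged sketch: CondensePlusContextChatEngine.from_defaults accepts prompt overrides such as context_prompt, which acts as the template for how retrieved context is presented to the LLM (the template text here is hypothetical):

Plain Text
from llama_index.core.chat_engine import CondensePlusContextChatEngine

# hypothetical template; {context_str} is filled with the retrieved nodes
CONTEXT_PROMPT = (
    "You are a support bot. Answer using only this context:\n"
    "{context_str}\n"
    "If the context is insufficient, say so."
)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(similarity_top_k=3),
    context_prompt=CONTEXT_PROMPT,
)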
6 comments
WARNING:root:Batch upload failed 1 times. Retrying...
WARNING:root:Batch upload failed 2 times. Retrying...
WARNING:root:Batch upload failed 3 times. Retrying...


    804 if "Content-Type" not in headers:
    805     headers["Content-Type"] = "application/json"
--> 806 return self.api_client.request(
    807     type_=m.InlineResponse2007,
    ...
    813     content=body,
    814 )

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:79, in ApiClient.request(self, type_, method, url, path_params, **kwargs)
     77     kwargs["timeout"] = int(kwargs["params"]["timeout"])
     78 request = self._client.build_request(method, url, **kwargs)
---> 79 return self.send(request, type_)

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:96, in ApiClient.send(self, request, type_)
     95 def send(self, request: Request, type_: Type[T]) -> T:
---> 96     response = self.middleware(request, self.send_inner)
     97     if response.status_code in [200, 201, 202]:
     98         try:

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:205, in BaseMiddleware.__call__(self, request, call_next)
    204 def __call__(self, request: Request, call_next: Send) -> Response:
--> 205     return call_next(request)

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:108, in ApiClient.send_inner(self, request)
    106     response = self._client.send(request)
    107 except Exception as e:
--> 108     raise ResponseHandlingException(e)
    109 return response

ResponseHandlingException: The write operation timed out

@kapa.ai
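A hedged mitigation sketch: the retries end in a client-side write timeout, so raising the Qdrant client's timeout and shrinking the upload batches are the usual first knobs to try (batch_size is the QdrantVectorStore parameter; exact defaults may differ by version):

Plain Text
import os
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(
    url=os.getenv("QDRANT_URI"),
    api_key=os.getenv("QDRANT_API_KEY"),
    timeout=60,  # seconds; give slow batch writes more headroom
)
vector_store = QdrantVectorStore(
    client=client,
    collection_name="mlofo-loan-officer-july",
    batch_size=32,  # smaller upload batches are less likely to time out
)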
9 comments
https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/ in this,

Plain Text
query_engine = index.as_query_engine(
    similarity_top_k=2, sparse_top_k=12, vector_store_query_mode="hybrid"
)

what kind of hybrid retrieval is being used?
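As far as I can tell from that example's setup, "hybrid" there means a dense vector search plus a sparse (SPLADE-style) term search run side by side, with the two result lists fused; sparse_top_k sizes the sparse leg and similarity_top_k the final fused list. A sketch of the store configuration it presumes (collection name hypothetical; requires the hybrid extras installed):

Plain Text
from llama_index.vector_stores.qdrant import QdrantVectorStore

# stores both dense vectors and sparse term vectors per node; queries with
# vector_store_query_mode="hybrid" then fuse the two result lists
vector_store = QdrantVectorStore(
    client=client,
    collection_name="hybrid_collection",  # hypothetical name
    enable_hybrid=True,
)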
6 comments
using “content-aware” chunking in LlamaIndex over markdown & PDF documents
2 comments
I am using QueryFusionRetriever with CondensePlusContextChatEngine, where I have two retrievers, BM25Retriever and VectorStoreIndex.from_vector_store, and I am using Langfuse for traces. When using the condense plus context chat engine, the traces are not well segregated into the multiple retrievers, multiple queries, and then the fusion nodes, the way they are cleanly separated with index.as_chat_engine.

Plain Text
def get_chat_engine() -> "CondensePlusContextChatEngine":
    Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

    retriever = QueryFusionRetriever(
        [
            index.as_retriever(similarity_top_k=3),
            BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=2, verbose=True),
        ],
        similarity_top_k=2,
        num_queries=2,
        mode="reciprocal_rerank",
        use_async=False,
        verbose=True,
        query_gen_prompt=BOT_QUERY_GEN_PROMPT,
    )
    chat_engine = CondensePlusContextChatEngine.from_defaults(
        retriever=retriever, system_prompt=SUPPORT_BOT_SYSTEM_PROMPT, streaming=True
    )
    return chat_engine
3 comments
How can I version my documents (Notion / PDF etc.) for a RAG pipeline? Let's say there is an update in the documentation; will I then have to vectorize the complete data again?
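A hedged sketch of one approach, assuming a local index that keeps its docstore (the default setup): give each document a stable id via filename_as_id=True, then refresh_ref_docs re-embeds only documents whose content changed instead of re-vectorizing everything.

Plain Text
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# stable doc ids let the index detect which files changed between runs
documents = SimpleDirectoryReader("./docs", filename_as_id=True).load_data()
index = VectorStoreIndex.from_documents(documents)

# ... later, after the documentation is updated ...
updated = SimpleDirectoryReader("./docs", filename_as_id=True).load_data()
refreshed = index.refresh_ref_docs(updated)  # one bool per doc: True if re-ingested
print(sum(refreshed), "documents re-embedded")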
4 comments