payload
Offline, last seen 4 weeks ago
Joined September 25, 2024
https://www.llamaindex.ai/blog/one-click-open-source-rag-observability-with-langfuse

I tried to follow this article and got this error:

Plain Text
    global_handler.start_trace_params(user_id=request.email_id, tags=[env.ENVIRONMENT, "support-bot"])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'start_trace_params'
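A likely cause sketch: global_handler stays None until a handler is registered, so set_global_handler must run before any trace params are set. Also note that importing global_handler by value at startup captures the original None; re-read the module attribute after registration, after which the start_trace_params call from the article should find a real handler.

Plain Text
import llama_index.core
from llama_index.core import set_global_handler

# registers the Langfuse handler (requires the langfuse callback package installed)
set_global_handler("langfuse")

# re-read the attribute after registration instead of importing it by value
handler = llama_index.core.global_handler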
3 comments

Agentic

Hey, in the semantic chunking implementation this video is mentioned:
https://youtu.be/8OJC21T2SL4?t=1933 — is the agentic chunking mentioned in the video also implemented?
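For context, the semantic chunking half is available as SemanticSplitterNodeParser; as far as I can tell there is no dedicated agentic-chunking class, so that part would have to be built manually. A minimal sketch of the semantic splitter, assuming an OpenAI embedding model and a hypothetical docs folder:

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader("./docs").load_data()  # hypothetical path

# splits where adjacent-sentence embedding similarity drops, instead of at a
# fixed token count
splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=OpenAIEmbedding()
)
nodes = splitter.get_nodes_from_documents(documents)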
1 comment
Hey,
What is the difference between Azure AI Search and Azure Cosmos DB?
1 comment
And why doesn't LlamaParse support markdown?
3 comments
I have markdown files to be vectorized. The current parser, MarkdownReader, splits the markdown based on headings (e.g. `#`, code blocks). I want to change the strategy for dividing the document into chunks, since in my use case the extracted chunks are so small that they don't carry enough context.
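One hedged alternative: load each markdown file as a single flat document and apply a size-based splitter, so chunk size is controlled directly instead of by heading structure. A sketch, path hypothetical:

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import FlatReader

# FlatReader keeps each markdown file as one document (no heading-based
# splitting), then SentenceSplitter chunks purely by size
documents = SimpleDirectoryReader(
    "./docs", file_extractor={".md": FlatReader()}  # hypothetical path
).load_data()
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=128)
nodes = splitter.get_nodes_from_documents(documents)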
3 comments
documents = SimpleDirectoryReader(
    input_files=pdf_docs, file_extractor=file_extractor, recursive=True
).load_data()

how can I add a single file that failed to parse, after the rest of the documents have been parsed completely?
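A hedged sketch: since load_data just returns a list of Document objects, the failed file can be re-parsed on its own and appended (file path hypothetical, file_extractor reused from above):

Plain Text
# re-run the reader on just the one file that failed, then merge the results
retry_docs = SimpleDirectoryReader(
    input_files=["reports/failed_file.pdf"],  # hypothetical path
    file_extractor=file_extractor,
).load_data()
documents.extend(retry_docs)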
2 comments
how to implement NeMo Guardrails over a chat engine with streaming responses?
6 comments
I am using QueryFusionRetriever with CondensePlusContextChatEngine, with two retrievers (BM25Retriever and VectorStoreIndex.from_vector_store) and Langfuse for traces. When using the condense plus context chat engine, the traces are not well segregated into the multiple retrievers, the multiple queries, and then the fusion nodes, the way they are cleanly separated with index.as_chat_engine.
6 comments
how to vectorize documents (PDF, HTML) that include images, text, and tables for RAG?
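One hedged approach (LlamaParse account assumed, file name hypothetical): parse to markdown so tables survive as markdown tables, then split prose and table elements with MarkdownElementNodeParser; embedded images would need a separate multimodal pipeline on top. A sketch:

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_parse import LlamaParse

documents = LlamaParse(result_type="markdown").load_data("report.pdf")  # hypothetical file

# separates table elements from prose so each can be indexed appropriately
node_parser = MarkdownElementNodeParser(num_workers=4)
nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

index = VectorStoreIndex(nodes=base_nodes + objects)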
14 comments
I am setting up my RAG evaluation pipeline; here is my code:

Plain Text
import os
from dotenv import load_dotenv
import nest_asyncio

load_dotenv()
nest_asyncio.apply()

import qdrant_client
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.evaluation import (
    EmbeddingQAFinetuneDataset,
    RetrieverEvaluator,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.qdrant import QdrantVectorStore

Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002", embed_batch_size=10)
llm = OpenAI(model="gpt-4o")

client = qdrant_client.QdrantClient(
    url=os.getenv("QDRANT_URI"), api_key=os.getenv("QDRANT_API_KEY")
)
vector_store = QdrantVectorStore(client=client, collection_name="mlofo-loan-officer-july")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

qa_dataset = EmbeddingQAFinetuneDataset.from_json("pg_eval_dataset.json")


metrics = ["mrr", "hit_rate"]

retriever_evaluator = RetrieverEvaluator.from_metric_names(
    metrics, retriever=index.as_retriever(similarity_top_k=2)
)

sample_id, sample_query = list(qa_dataset.queries.items())[0]
sample_expected = qa_dataset.relevant_docs[sample_id]

eval_result = retriever_evaluator.evaluate(sample_query, sample_expected)
print(eval_result)

Generating the dataset:
Plain Text
from llama_index.core.evaluation import generate_question_context_pairs

nodes = vector_store.get_nodes()

qa_dataset = generate_question_context_pairs(
    nodes, llm=llm, num_questions_per_chunk=2
)


reference: https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/cohere_retriever_eval.ipynb

The error:

File "/home/payload/miniconda3/envs/mloflo/lib/python3.12/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 184, in _aget_nodes_with_embeddings
query_result = await self._vector_store.aquery(query, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/payload/miniconda3/envs/mloflo/lib/python3.12/site-packages/llama_index/vector_stores/qdrant/base.py", line 927, in aquery
response = await self._aclient.search(
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'search'
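A possible fix sketch: the traceback shows self._aclient is None, meaning only a sync Qdrant client was supplied while the evaluator takes the async query path. Passing an AsyncQdrantClient alongside the sync one should give aquery() a client to call:

Plain Text
import os
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(
    url=os.getenv("QDRANT_URI"), api_key=os.getenv("QDRANT_API_KEY")
)
aclient = qdrant_client.AsyncQdrantClient(
    url=os.getenv("QDRANT_URI"), api_key=os.getenv("QDRANT_API_KEY")
)
vector_store = QdrantVectorStore(
    client=client,
    aclient=aclient,  # enables the async code path used by the evaluator
    collection_name="mlofo-loan-officer-july",
)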
6 comments
how to use QueryFusionRetriever with CondensePlusContextChatEngine with use_async=True

Plain Text
def get_chat_engine() -> "CondensePlusContextChatEngine":

    Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
    
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
    retriever = index.as_retriever(similarity_top_k=3)

    retriever = QueryFusionRetriever(
        [retriever],
        similarity_top_k=4,
        num_queries=4,
        mode="reciprocal_rerank",
        use_async=True,
        verbose=True,
        query_gen_prompt=BOT_QUERY_GEN_PROMPT
    )
    chat_engine = CondensePlusContextChatEngine.from_defaults(
        retriever=retriever, system_prompt=SUPPORT_BOT_SYSTEM_PROMPT, streaming=True
    )
    return chat_engine

async def chat(request: ChatRequestBody):
    try:
       
        engine = get_chat_engine()
        response_stream = engine.stream_chat(message, chat_history=history)
        return StreamingResponse(
            stream_generator(response_stream, request.history, request.timezone),
            media_type="application/x-ndjson",
        )

    except Exception as e:
        traceback.print_exc()
        raise HTTPException(
            status_code=500, detail=f"An error occurred while processing the request. {str(e)}"
        ) from e


This is the error:

RuntimeError: Nested async detected. Use async functions where possible (aquery, aretrieve, arun, etc.). Otherwise, use import nest_asyncio; nest_asyncio.apply() to enable nested async or use in a jupyter notebook.
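A hedged fix sketch: since the FastAPI endpoint is already a coroutine on a running event loop, the chat engine's async streaming method avoids starting a nested loop (alternatively, nest_asyncio.apply() at startup, as the error message suggests). Assuming the same message/history variables as above:

Plain Text
async def chat(request: ChatRequestBody):
    engine = get_chat_engine()
    # await the async variant instead of calling stream_chat from a coroutine
    response_stream = await engine.astream_chat(message, chat_history=history)
    return StreamingResponse(
        stream_generator(response_stream, request.history, request.timezone),
        media_type="application/x-ndjson",
    )

Note that the response from astream_chat exposes an async token generator, so stream_generator would need to consume it with async for rather than a plain loop.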
6 comments
Hey everyone,
I have documentation and I want to find the best embedding model for RAG. How can I score / benchmark different embedding models to find the best one?
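A rough benchmarking sketch, reusing the RetrieverEvaluator pattern from the evaluation-pipeline post above (nodes and qa_dataset are assumed to exist as there): build one index per candidate embedding model over the same nodes, then compare hit rate and MRR on the shared dataset.

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator
from llama_index.embeddings.openai import OpenAIEmbedding

# candidate models are assumptions; swap in whatever you want to compare
candidates = {
    "ada-002": OpenAIEmbedding(model="text-embedding-ada-002"),
    "3-small": OpenAIEmbedding(model="text-embedding-3-small"),
}

for name, embed_model in candidates.items():
    index = VectorStoreIndex(nodes, embed_model=embed_model)
    evaluator = RetrieverEvaluator.from_metric_names(
        ["mrr", "hit_rate"], retriever=index.as_retriever(similarity_top_k=2)
    )
    hits, mrrs = [], []
    for qid, query in qa_dataset.queries.items():
        result = evaluator.evaluate(query, qa_dataset.relevant_docs[qid])
        hits.append(result.metric_vals_dict["hit_rate"])
        mrrs.append(result.metric_vals_dict["mrr"])
    print(name, "hit_rate:", sum(hits) / len(hits), "mrr:", sum(mrrs) / len(mrrs))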
14 comments
Secondly,
what is fusion RAG, and how does it compare against BM25s and re-ranking algorithms?
How can I use fusion RAG without BM25s (is it necessary to integrate BM25s)?
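On the last part: QueryFusionRetriever does not require BM25; it fuses the result lists of whatever retrievers it is given, so a single dense retriever works, with fusion happening across the generated query variations. A minimal sketch, assuming an existing index:

Plain Text
from llama_index.core.retrievers import QueryFusionRetriever

# fusion over query rewrites only: one dense retriever, no BM25
retriever = QueryFusionRetriever(
    [index.as_retriever(similarity_top_k=4)],
    similarity_top_k=4,
    num_queries=4,  # generates 3 extra query rewrites plus the original
    mode="reciprocal_rerank",  # fuses the per-query result lists with RRF
)
nodes = retriever.retrieve("what is fusion RAG?")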
21 comments
Hey everyone, I have a few questions:
  • In the BM25s retriever the nodes are loaded in memory; for large documentation sets, won't this increase the memory overhead and delay real-time responses?
11 comments
Hey, how can I use the HyDE query transform with a chat engine? I was unable to find an implementation with a chat engine.
Is it not possible to implement it with a chat engine?

Edit: if there is no chat engine implementation, can I modify a query engine to include chat history?
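One hedged way to get both, combining documented pieces: wrap a query engine in HyDEQueryTransform, then hand it to CondenseQuestionChatEngine, which condenses the chat history into a standalone question before each query. A sketch, assuming an existing index:

Plain Text
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# query engine that expands each question into a hypothetical document first
hyde_engine = TransformQueryEngine(
    index.as_query_engine(similarity_top_k=3),
    HyDEQueryTransform(include_original=True),
)

# chat engine that folds chat history into a standalone question, then routes
# it through the HyDE query engine
chat_engine = CondenseQuestionChatEngine.from_defaults(query_engine=hyde_engine)
print(chat_engine.chat("how do I version my documents?"))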

9 comments
Is it possible to connect the Notion documentation / connector with LlamaParse?
2 comments

Hey everyone,
I am using QdrantVectorStore. I want to modify the metadata received / retrieved from the vector store before I send it to the LLM to generate the response. Any tips on how I can do that?
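One hedged option: a custom node postprocessor, which runs on the retrieved nodes after retrieval and before they are formatted into the LLM context. A sketch (the rewrite rule itself is hypothetical):

Plain Text
from typing import List, Optional

from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle


class MetadataRewriter(BaseNodePostprocessor):
    """Edits node metadata after retrieval, before response synthesis."""

    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None
    ) -> List[NodeWithScore]:
        for n in nodes:
            # hypothetical rewrite: keep only the fields the LLM should see
            n.node.metadata = {"source": n.node.metadata.get("file_name", "unknown")}
        return nodes


query_engine = index.as_query_engine(node_postprocessors=[MetadataRewriter()])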
2 comments

Chat

https://github.com/run-llama/llama_index/issues/14273#issuecomment-2181149146

Hey, I have the following questions:

  • What is the difference between an OpenAI agent and a ReAct agent, and which should I use?
  • Using PromptTemplates provided more controlled and consistent output compared to system prompts.
  • In the case of an agent, AzureOpenAI is very slow compared to OpenAI; there is about a 10x delay in response generation. I have tried with both ReActAgent and OpenAIAgent.
Plain Text
llm = AzureOpenAI(
    model=os.getenv("AOAI_COMPLETION_MODEL"),
    deployment_name=os.getenv("AOAI_DEPLOYMENT_NAME_COMPLETION"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AOAI_ENDPOINT"),
    api_version=os.getenv("AOAI_API_VERSION"),
)

  • Lastly, how do I use a prompt template with a chat engine? (A sketch follows below.)
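On that last point, a hedged sketch: CondensePlusContextChatEngine.from_defaults accepts prompt overrides such as context_prompt, which acts as the template for how retrieved context is presented to the LLM (the template text here is hypothetical):

Plain Text
from llama_index.core.chat_engine import CondensePlusContextChatEngine

# hypothetical template; {context_str} is filled with the retrieved nodes
CONTEXT_PROMPT = (
    "You are a support bot. Answer using only this context:\n"
    "{context_str}\n"
    "If the context is insufficient, say so."
)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(similarity_top_k=3),
    context_prompt=CONTEXT_PROMPT,
)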
6 comments
WARNING:root:Batch upload failed 1 times. Retrying...
WARNING:root:Batch upload failed 2 times. Retrying...
WARNING:root:Batch upload failed 3 times. Retrying...


    804 if "Content-Type" not in headers:
    805     headers["Content-Type"] = "application/json"
--> 806 return self.api_client.request(
    807     type_=m.InlineResponse2007,
    ...
    813     content=body,
    814 )

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:79, in ApiClient.request(self, type_, method, url, path_params, **kwargs)
     77     kwargs["timeout"] = int(kwargs["params"]["timeout"])
     78 request = self._client.build_request(method, url, **kwargs)
---> 79 return self.send(request, type_)

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:96, in ApiClient.send(self, request, type_)
     95 def send(self, request: Request, type_: Type[T]) -> T:
---> 96     response = self.middleware(request, self.send_inner)
     97     if response.status_code in [200, 201, 202]:
     98         try:

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:205, in BaseMiddleware.__call__(self, request, call_next)
    204 def __call__(self, request: Request, call_next: Send) -> Response:
--> 205     return call_next(request)

File ~/miniconda3/envs/mloflo/lib/python3.12/site-packages/qdrant_client/http/api_client.py:108, in ApiClient.send_inner(self, request)
    106     response = self._client.send(request)
    107 except Exception as e:
--> 108     raise ResponseHandlingException(e)
    109 return response

ResponseHandlingException: The write operation timed out

@kapa.ai
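A hedged mitigation sketch: the retries end in a client-side write timeout, so raising the Qdrant client's timeout and shrinking the upload batches are the usual first knobs to try (batch_size is the QdrantVectorStore parameter; exact defaults may differ by version):

Plain Text
import os
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(
    url=os.getenv("QDRANT_URI"),
    api_key=os.getenv("QDRANT_API_KEY"),
    timeout=60,  # seconds; give slow batch writes more headroom
)
vector_store = QdrantVectorStore(
    client=client,
    collection_name="mlofo-loan-officer-july",
    batch_size=32,  # smaller upload batches are less likely to time out
)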
9 comments
https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/ in this,

Plain Text
query_engine = index.as_query_engine(
    similarity_top_k=2, sparse_top_k=12, vector_store_query_mode="hybrid"
)

what kind of hybrid retrieval is being used?
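As far as I can tell from that example's setup, "hybrid" there means a dense vector search plus a sparse (SPLADE-style) term search run side by side, with the two result lists fused; sparse_top_k sizes the sparse leg and similarity_top_k the final fused list. A sketch of the store configuration it presumes (collection name hypothetical; requires the hybrid extras installed):

Plain Text
from llama_index.vector_stores.qdrant import QdrantVectorStore

# stores both dense vectors and sparse term vectors per node; queries with
# vector_store_query_mode="hybrid" then fuse the two result lists
vector_store = QdrantVectorStore(
    client=client,
    collection_name="hybrid_collection",  # hypothetical name
    enable_hybrid=True,
)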
6 comments
using “content-aware” chunking in LlamaIndex over markdown & PDF documents
2 comments
I am using QueryFusionRetriever with CondensePlusContextChatEngine, where I have two retrievers, BM25Retriever and VectorStoreIndex.from_vector_store, and I am using Langfuse for traces. When using the condense plus context chat engine, the traces are not well segregated into the multiple retrievers, multiple queries, and then the fusion nodes, the way they are cleanly separated with index.as_chat_engine.

Plain Text
def get_chat_engine() -> "CondensePlusContextChatEngine":
    Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

    retriever = QueryFusionRetriever(
        [
            index.as_retriever(similarity_top_k=3),
            BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=2, verbose=True),
        ],
        similarity_top_k=2,
        num_queries=2,
        mode="reciprocal_rerank",
        use_async=False,
        verbose=True,
        query_gen_prompt=BOT_QUERY_GEN_PROMPT,
    )
    chat_engine = CondensePlusContextChatEngine.from_defaults(
        retriever=retriever, system_prompt=SUPPORT_BOT_SYSTEM_PROMPT, streaming=True
    )
    return chat_engine
3 comments
How can I version my documents (Notion / PDF etc.) for a RAG pipeline? Let's say there is an update in the documentation; will I then have to vectorize the complete data again?
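A hedged sketch of one approach, assuming a local index that keeps its docstore (the default setup): give each document a stable id via filename_as_id=True, then refresh_ref_docs re-embeds only documents whose content changed instead of re-vectorizing everything.

Plain Text
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# stable doc ids let the index detect which files changed between runs
documents = SimpleDirectoryReader("./docs", filename_as_id=True).load_data()
index = VectorStoreIndex.from_documents(documents)

# ... later, after the documentation is updated ...
updated = SimpleDirectoryReader("./docs", filename_as_id=True).load_data()
refreshed = index.refresh_ref_docs(updated)  # one bool per doc: True if re-ingested
print(sum(refreshed), "documents re-embedded")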
4 comments