Find answers from the community

rini
Offline, last seen 3 months ago
Joined September 25, 2024
What types of documents are supported by the create-llama tool? Will it work with docs containing tables and images?
1 comment
rini · Packs

So I am using "EmbeddedTablesUnstructuredRetrieverPack" for my tables use case. Is there any way to stream the response?
embedded_tables_unstructured_pack.run() is returning the right response, but how do I stream it?
Also, with this pack I can only ask questions and get an answer, right? I can't really "chat"?
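A minimal streaming sketch with a plain llama-index query engine, assuming the pack's underlying index is reachable (llama packs generally expose their components via get_modules()); nothing here is specific to this pack, and the variable names are illustrative:

# build a streaming query engine and iterate over the token generator
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Summarise the revenue table.")
for token in streaming_response.response_gen:
    print(token, end="", flush=True)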
5 comments
W
r
L
Getting the error "ImportError: cannot import name 'VectorStoreIndex' from 'llama_index.core' (unknown location)" when running:
"
from llama_index.core.llama_pack import download_llama_pack

EmbeddedTablesUnstructuredRetrieverPack = download_llama_pack(
    "EmbeddedTablesUnstructuredRetrieverPack",
    "./embedded_tables_unstructured_pack",
)
"
Known error? I am blocked on this. A fast resolution would be great.
6 comments
Guys, I am getting this error when calling the completions endpoint via llama-index, even though my account has enough credits and the rate limit quota has not been reached at all -> "
RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
"
Does anybody have any clue here?
2 comments
Need help! 🥹
So my llm.stream_complete() call throws this error sometimes -> "requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))"

The llm I am using is gpt-4. Any idea how to avoid this error?
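A rough retry sketch, assuming the failure is a transient dropped connection; the wrapper below and its names are illustrative, not a llama-index API:

from requests.exceptions import ChunkedEncodingError

def stream_complete_with_retry(llm, prompt, max_retries=3):
    # re-issues the whole stream if the HTTP connection drops mid-response;
    # note: a failed attempt may already have yielded partial text downstream.
    for attempt in range(max_retries):
        try:
            for chunk in llm.stream_complete(prompt):
                yield chunk
            return
        except ChunkedEncodingError:
            if attempt == max_retries - 1:
                raise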
1 comment
Need guidance again. 🥹
So my use case is that I am providing an interface for students to evaluate themselves by solving a past question paper on a topic.
They answer a question, and if they answer wrongly I send the required data to the LLM to generate feedback -> this is pretty straightforward, I am directly calling llm.complete().

The second use case is that at the end of the entire test I want to generate combined feedback on how they fared in the test.
So I need to send the entire list of questions in the paper, their answers, and the correct answers.
To achieve the second use case, should I create a node per (question, answer, correct_answer) combination and apply tree_summarize, OR just send the entire lists themselves (does that even work)?
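A minimal sketch of the node-per-question option, assuming the llama_index.core APIs and hypothetical questions / student_answers / correct_answers lists:

from llama_index.core import SummaryIndex
from llama_index.core.schema import TextNode

# one node per (question, student answer, correct answer) triple
nodes = [
    TextNode(text=f"Question: {q}\nStudent answer: {a}\nCorrect answer: {c}")
    for q, a, c in zip(questions, student_answers, correct_answers)
]

index = SummaryIndex(nodes)  # a summary/list index visits every node
query_engine = index.as_query_engine(response_mode="tree_summarize")
feedback = query_engine.query("Give the student overall feedback on how they fared in this test.")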
1 comment
rini · Evaluations

Do you still recommend following this article for evaluation purposes? - https://blog.llamaindex.ai/building-and-evaluating-a-qa-system-with-llamaindex-3f02e9d87ce1
I see newer additions like FaithfulnessEvaluator etc.
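For comparison, a minimal sketch of the newer evaluator API, assuming llama_index.core.evaluation; the query engine and LLM here are placeholders:

from llama_index.core.evaluation import FaithfulnessEvaluator

evaluator = FaithfulnessEvaluator(llm=llm)  # llm acts as the judge
response = query_engine.query("What did the author work on before college?")
eval_result = evaluator.evaluate_response(response=response)
print(eval_result.passing, eval_result.feedback)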
3 comments
rini · Questions

Needed some explanation on a few concepts in the LlamaIndex documentation:

  1. https://gpt-index.readthedocs.io/en/latest/end_to_end_tutorials/dev_practices/production_rag.html - what do you mean by "Decoupling chunks used for retrieval vs. chunks used for synthesis"? Doesn't synthesis happen only on the retrieved chunks? How can they be decoupled? (See the sketch after this list.)
  2. https://gpt-index.readthedocs.io/en/latest/examples/retrievers/auto_vs_recursive_retriever.html - How is VectorStoreInfo working here? I can see we have set metadata for every node at the top of the article. How is it getting connected to the vector_store_info property?
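On the first point, a rough sketch of the usual "small-to-big" decoupling, assuming llama_index.core APIs and an existing documents list: retrieval runs over small chunks, but each small chunk is an IndexNode pointing at a larger parent chunk, and it is the parent that gets handed to the LLM for synthesis.

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.schema import IndexNode

# large "synthesis" chunks
big_nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

# small "retrieval" chunks, each pointing back to its parent via index_id
small_splitter = SentenceSplitter(chunk_size=128)
small_nodes = []
for big in big_nodes:
    for piece in small_splitter.split_text(big.text):
        small_nodes.append(IndexNode(text=piece, index_id=big.node_id))

vector_index = VectorStoreIndex(small_nodes)  # embeddings over the small chunks only
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_index.as_retriever(similarity_top_k=2)},
    node_dict={n.node_id: n for n in big_nodes},  # resolved parents go to synthesis
)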
4 comments
rini · URL reader

So my use case is that I need to extract the text from some URL links, e.g. https://www.datastax.com/guides/what-is-retrieval-augmented-generation?filter=%7B%7D

When I asked this question earlier you suggested using the BeautifulSoup reader. However, doing this:
loader = BeautifulSoupWebReader()
documents = loader.load_data(urls=['https://research.ibm.com/blog/retrieval-augmented-generation-RAG'])

is just returning the header and footer text.
How do I go about this? Please help.
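A hedged workaround sketch, assuming plain requests + BeautifulSoup and wrapping the result in a llama-index Document yourself; the tags to keep are site-specific guesses:

import requests
from bs4 import BeautifulSoup
from llama_index.core import Document

def load_body_text(url: str) -> Document:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    # keep only likely body containers; adjust the selectors per site
    container = soup.find("article") or soup.find("main") or soup.body
    text = "\n".join(p.get_text(" ", strip=True) for p in container.find_all("p"))
    return Document(text=text, metadata={"url": url})

documents = [load_body_text("https://research.ibm.com/blog/retrieval-augmented-generation-RAG")]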
2 comments
Need a little info please.
What is the difference between Node and IndexNode? Why do we specifically use IndexNode when creating DocumentAgents?
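A rough sketch of the distinction, assuming the llama_index.core schema (the ids and text below are made up): a TextNode just holds content, while an IndexNode additionally carries an index_id pointer to another object (e.g. a per-document agent or query engine) that a RecursiveRetriever can follow at query time.

from llama_index.core.schema import TextNode, IndexNode

text_node = TextNode(text="A plain chunk of content; retrieval stops here.")

index_node = IndexNode(
    text="Use this tool for questions about the Lyft 10-K.",  # what gets embedded/matched
    index_id="lyft_doc_agent",  # key that maps to the underlying document agent
)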
2 comments
So my use case is that I need to extract only the body text from some URL links, e.g. https://www.datastax.com/guides/what-is-retrieval-augmented-generation?filter=%7B%7D
Now this link has info in the footer and other data as well. How do I selectively get the relevant body data only?

I looked at the following llama-hub loaders.

Which do you think works the best for my use case?
2 comments
So in my llama-index FastAPI app I have built an API which returns FastAPI's StreamingResponse type. I am doing:
response = llm.stream_complete(prompt)
return StreamingResponse(response)
This is returning the error - AttributeError: 'CompletionResponse' object has no attribute 'encode'.
However, returning a stream from a query engine works completely fine. For example, this code runs without issue:
response_stream = query_engine.query(query_text)
return StreamingResponse(response_stream.response_gen)
(I have used streaming=True when creating the response_synthesizer object)
Going through the llama-index code I realised "llm.stream_complete" is returning a CompletionResponseGen generator, which is initialised as
CompletionResponseGen = Generator[CompletionResponse, None, None]

and response_synthesizer.synthesize() [via llm_predictor.stream()] is returning a TokenGen generator, which is initialised as TokenGen = Generator[str, None, None].

Hence the encoding issue.
How do I fix this? Is this a llama-index limitation? Should I change the implementation itself and not use llm.stream_complete()? If not, which query engine should I use? I am not creating an index here - I am just running a query against a text!
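A minimal sketch of one way around it, assuming you keep llm.stream_complete(): adapt the CompletionResponse generator into a plain string generator before handing it to StreamingResponse (the route and names below are illustrative).

from fastapi.responses import StreamingResponse

def token_stream(prompt: str):
    # each CompletionResponse carries the newly generated text in .delta (full text so far in .text)
    for completion in llm.stream_complete(prompt):
        yield completion.delta or ""

@app.post("/complete")  # hypothetical route
def complete(prompt: str):
    return StreamingResponse(token_stream(prompt), media_type="text/plain")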
2 comments
Needed some guidance/help.
My use case is that I am given a question paper, and for each question paper there's a corresponding marking scheme. I need to read the questions from the question paper PDF. The LLM shouldn't create its own questions. Same for the marking scheme. I feel it's a good use case for OpenAIPydanticProgram. What do you think?
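A minimal sketch of that idea, assuming the OpenAIPydanticProgram import path of recent llama-index releases; the schema fields and the raw_pdf_text variable are placeholders:

from typing import List
from pydantic import BaseModel
from llama_index.program.openai import OpenAIPydanticProgram

class Question(BaseModel):
    number: str
    text: str
    marks: int

class QuestionPaper(BaseModel):
    questions: List[Question]

program = OpenAIPydanticProgram.from_defaults(
    output_cls=QuestionPaper,
    prompt_template_str=(
        "Extract every question exactly as written from this question paper. "
        "Do not invent questions.\n\n{paper_text}"
    ),
)
paper = program(paper_text=raw_pdf_text)  # raw_pdf_text: text extracted from the PDF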
8 comments
Need your help again!! 🥺

So I have built this llamaindex based app and used FastAPI to create the relevant APIs.

My app has a feature to chat with the video transcript. For this I have exposed 2 APIs - one to return the ContextChatEngine object, and the other to return a response (using this object) whenever the user types a query, by calling query_engine.query().

But I am not able to return this ContextChatEngine object because the ContextChatEngine class is not serializable/deserializable. Calling the API throws "TypeError: cannot pickle 'builtins.CoreBPE' object", and creating a custom response class throws an error too. Any idea how to fix this?
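A common workaround sketch, assuming the engine can stay server-side: keep the ContextChatEngine in memory keyed by a session id and return only the id to the client (all names and routes below are illustrative).

from uuid import uuid4

chat_engines = {}  # in-memory store; swap for something persistent in production

@app.post("/chat/start")
def start_chat():
    session_id = str(uuid4())
    chat_engines[session_id] = index.as_chat_engine(chat_mode="context")
    return {"session_id": session_id}

@app.post("/chat/message")
def chat_message(session_id: str, message: str):
    response = chat_engines[session_id].chat(message)
    return {"response": str(response)}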
9 comments
rini · Performance

So I have built this LLM based application using llamaindex wherein we take a YouTube video URL from the end user and summarise it for them. I am using the YoutubeTranscriptLoader to generate its transcript, creating nodes using the SimpleNodeParser, and creating a VectorStoreIndex on top of these nodes.

This is the code for the summarisation operation, after creating the index:

def summarize_transcript():
    retriever = VectorIndexRetriever(
        index=index, similarity_top_k=len(index.docstore.docs))
    response_synthesizer = get_response_synthesizer(
        response_mode='tree_summarize')

    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )

    query_text = """
You are an upbeat and friendly tutor with an encouraging tone.\
Provide Key Insights from the context information ONLY.
For each key insight, provide relevant summary in the form of bullet points.
Use no more than 500 words in your summary.
"""

    response = query_engine.query(query_text)
    return response

Running this function for a video a mere 1:41 minutes long is taking 9 seconds, which is unacceptable in production. I tried the use_async option, using a list index instead of a vector index, and using just the TreeSummariser with no index, but the performance didn't improve by much.

Can you please help me out here? For your reference, the app is hosted on Streamlit Cloud and can be accessed here - https://llm-steamlit-poc.streamlit.app/
9 comments
Slightly off topic maybe, but since I am building a full-stack app using LlamaIndex, I was wondering: should a request to an API that creates a transcript from a YouTube video URL and then returns a summary be a "GET" or a "POST" request?
1 comment
When we create a chat engine, how is the "chat history" persisted? Just like we have so many DB integrations available for the vector store, what about the conversation history? How do we handle building a chatbot for a large organisation on a dataset and deploying it on Azure?
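A minimal persistence sketch, assuming the chat-store/memory APIs of recent llama-index releases (SimpleChatStore is the simplest backend; Redis and other stores exist as integrations):

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore

chat_store = SimpleChatStore()
memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=chat_store,
    chat_store_key="user_123",  # one key per user/conversation
)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

chat_engine.chat("What is this video about?")
chat_store.persist(persist_path="chat_store.json")  # reload later with SimpleChatStore.from_persist_path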
1 comment
So my use case is that I need to generate questions and answers from a large PDF file. I have created a vector index on top of this document and am using "tree_summarize" as the response mode. But let's say I don't initialize a retriever - will it by default get all the nodes and then apply the "tree_summarize" response mode?
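For reference, a small sketch assuming a VectorStoreIndex named index: the default retriever only fetches similarity_top_k nodes, so to run tree_summarize over everything you can force top_k up to the docstore size.

query_engine = index.as_query_engine(
    similarity_top_k=len(index.docstore.docs),  # pull every node, not just the default top-k
    response_mode="tree_summarize",
)
qa_pairs = query_engine.query("Generate 5 question-answer pairs covering this document.")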
1 comment
Hi, I am working on creating a chat_engine wherein the engine is supposed to generate questions from the context information, ask the student the question, and then also evaluate their answer for correctness. So here, instead of just chatting with the context info, we have to create the questions as well.

So either we create a list of questions first and keep going through it while evaluating the answers (basically 2 different query engines and 2 different prompts) - is this a use case for using agents? (See the sketch after this post.)

OR

We put everything in a single prompt like this - """Perform the following actions:
1 - Introduce yourself to the students.
2 - Wait for a response.
3 - Then ask a meaningful [Question] from the context information provided to assess the student's knowledge of the text.
4 - Wait for a response.
5 - Assess the student's response in the context of the text provided only.\
Evaluate the response on each of the [parameters] and provide a line of feedback.
[parameters]
  • Does the response answer all sub-questions in the question?
  • Does the response answer all sub-questions correctly?
  • Is the answer elaborate enough, or is it in need of more explanation?
6 - Continue these actions until the student types "Exit".
""" - Not sure how to frame a common retriever and response_mode here though!?
Can you please help me here and give a direction.
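A hedged sketch of the agent route, assuming llama_index.core tool/agent APIs: wrap two query engines (question generation and answer evaluation) as tools and let the agent drive the back-and-forth. Tool names and descriptions are illustrative.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

question_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="generate_question",
    description="Generate one exam question from the study material.",
)
evaluation_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="evaluate_answer",
    description="Evaluate a student's answer against the study material.",
)

agent = ReActAgent.from_tools([question_tool, evaluation_tool], llm=llm, verbose=True)
print(agent.chat("Start the quiz."))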
1 comment
Why don't we have the provision of providing a response_synthesizer object with chat engines? It threw an error when I passed one. Let's say I want to specify a text_qa_template for a chat_engine object - how do I do that?
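One hedged sketch for the template part, assuming llama_index.core: build the query engine with the custom text_qa_template first, then wrap it in a CondenseQuestionChatEngine so the template still applies underneath the chat layer.

from llama_index.core import PromptTemplate
from llama_index.core.chat_engine import CondenseQuestionChatEngine

qa_prompt = PromptTemplate(
    "Context information:\n{context_str}\n\n"
    "Using only the context, answer the question: {query_str}\n"
)
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
chat_engine = CondenseQuestionChatEngine.from_defaults(query_engine=query_engine)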
3 comments
I am looking for reliable documentation around "async" usage in LlamaIndex. Can you point me to anything? Currently I can only find it in bits and pieces throughout the documentation.
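For reference, a tiny sketch of the async pattern most llama-index objects expose through a-prefixed methods (aquery, achat, etc.); the query engine here is assumed to already exist.

import asyncio

async def main():
    response = await query_engine.aquery("What is this document about?")
    print(response)

asyncio.run(main())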
1 comment
rini · Chat engine

What is the point of using chat_mode="openai" when creating a chat engine? The whole point of this chat_mode is to be able to use tools, query them, and then provide an appropriate answer, right? How do I create tools when creating a chat engine this way?
I'm referring to this article - https://gpt-index.readthedocs.io/en/latest/examples/chat_engine/chat_engine_openai.html
5 comments
Hi all, can people who have experience with llamaindex help me figure out how you decide on the "chunk_size" and "chunk_overlap" field values? And similarly the "context_window" and "num_output" fields in prompt_helper?
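For orientation, a sketch of where these knobs live, assuming the llama_index.core Settings style of recent releases; the numbers are illustrative starting points, not recommendations:

from llama_index.core import Settings
from llama_index.core.indices.prompt_helper import PromptHelper

Settings.chunk_size = 512      # tokens per indexed chunk; smaller = more precise retrieval, more nodes
Settings.chunk_overlap = 50    # overlap so sentences aren't cut off between chunks

prompt_helper = PromptHelper(
    context_window=4096,  # must match the real context window of your LLM
    num_output=256,       # tokens reserved for the model's answer when packing the prompt
)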
10 comments
To be able to provide correct answers to questions asked on my PDF repository, shouldn't I first spend time understanding which index to build on top of which type of documents, and then research the vector store? Should I go ahead with VectorStoreIndex? Can someone please help clarify these doubts?
2 comments
For ListIndex and TreeIndex, does defining a vector_store even make sense? Because the embeddings are generated at query time, they aren't stored in the vector_store defined via storage_context, right? The embeddings will be generated every time we query, right?
1 comment