Find answers from the community

yaya90
Offline, last seen 4 months ago
Joined September 25, 2024
Trying to run this: PyMuPDFReader = download_loader("PyMuPDFReader")

But I get the error:
InvalidURL: Invalid URL 'https:/raw.githubusercontent.com/run-llama/llama-hub/main/llama_hub/library.json': No host supplied

Never happened before. Any idea what could be wrong?
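Not an answer, but for what it's worth: the URL in that traceback has only one slash after `https:`, so no host can be parsed out of it, which is exactly what "No host supplied" means. A quick stdlib check (URLs copied from the traceback):

```python
from urllib.parse import urlparse

# With a single slash after "https:", the host ends up in the path and
# the netloc (host) component comes back empty -- hence "No host supplied".
broken = "https:/raw.githubusercontent.com/run-llama/llama-hub/main/llama_hub/library.json"
fixed = "https://raw.githubusercontent.com/run-llama/llama-hub/main/llama_hub/library.json"

print(urlparse(broken).netloc)  # -> "" (no host)
print(urlparse(fixed).netloc)   # -> "raw.githubusercontent.com"
```

Since that URL is built inside `download_loader` itself, upgrading `llama-index` (or refreshing the loader cache, if your version exposes that) may be the actual fix; I'm not sure which version introduced the bad URL.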
10 comments
If I use LlamaIndex, I know I can view the top-k nodes that were retrieved, e.g. with response.source_nodes[0]. Is there any way to actually figure out which nodes ended up being used in the output? e.g. only 3 out of 5 nodes (e.g. 1, 2, 5) were used in the answer.
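As far as I know there's no built-in flag that reports this, so one crude workaround is to check which retrieved nodes share verbatim phrases with the final answer. A minimal sketch in plain Python (`nodes_used_in_answer` and `min_overlap` are names I made up):

```python
def nodes_used_in_answer(answer: str, node_texts: list[str], min_overlap: int = 3) -> list[int]:
    """Crude heuristic: a node counts as 'used' if the answer shares at
    least `min_overlap` consecutive words with the node's text."""
    answer_words = answer.lower().split()
    used = []
    for i, text in enumerate(node_texts):
        node_str = " ".join(text.lower().split())
        for j in range(len(answer_words) - min_overlap + 1):
            phrase = " ".join(answer_words[j:j + min_overlap])
            if phrase in node_str:
                used.append(i)
                break
    return used

# Toy example: only the first node contributed a phrase to the answer.
print(nodes_used_in_answer(
    "revenue grew 10 percent in 2023",
    ["total revenue grew 10 percent last year", "the board met twice"],
))  # -> [0]
```

With real responses you'd pass `response.response` and the texts of `response.source_nodes` (attribute names vary across LlamaIndex versions). A citation-style query engine, if your version ships one, is the more principled route since it makes the model cite sources explicitly.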
5 comments
yaya90

Chunks

I'm trying to break the text into chunks of 500 tokens with a 20-token overlap each, but the following doesn't seem to be working. Any suggestions on what I'm doing wrong here?

Plain Text
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0,
                                        model_name=llama_model,
                                        max_tokens=512))

# prompt helper
context_window = 4096
num_output = 512            # number of output tokens
chunk_overlap_ratio = 0.04  # 0.04 of 500 = 20-token overlap
chunk_size_limit = 500      # chunk size limit

prompt_helper = PromptHelper(context_window, num_output, chunk_overlap_ratio, chunk_size_limit)

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

index = VectorStoreIndex.from_documents(documents, service_context=service_context, show_progress=True)


Suggestions are welcome!
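For what it's worth, my understanding is that `PromptHelper` mostly governs how retrieved text is repacked into the prompt at query time, not how documents are split when the index is built, so the index may still be chunking with its defaults. In LlamaIndex versions of this vintage, passing `chunk_size_limit=500` directly to `ServiceContext.from_defaults` is, if I remember the API correctly, what controls the splitter. Independently of the library, here is what "chunks of 500 with a 20 overlap" means, as a plain sliding-window sketch (all names are mine):

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500, overlap: int = 20) -> list[list[str]]:
    """Sliding window: each chunk holds up to `chunk_size` tokens and
    repeats the last `overlap` tokens of the previous chunk."""
    step = chunk_size - overlap  # 480: each chunk starts 480 tokens after the last
    return [tokens[i:i + chunk_size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Each chunk starts `chunk_size - overlap = 480` tokens after the previous one, so consecutive chunks share exactly their 20 boundary tokens, which is the behaviour the settings above are trying to get.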
5 comments
I somehow get the following issue when trying to run ServiceContext:

AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64'

This has never happened before. Any ideas how to fix it? The Stack Overflow answers I've found haven't helped :/
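For what it's worth, this error usually points at a stale or partially upgraded NumPy install (an old compiled extension left behind) rather than anything in your code. A clean reinstall, sketched below, is the usual first thing to try; adjust for your environment (conda, virtualenv, ...):

```shell
# Remove numpy (and scipy, which links against it), then reinstall fresh.
pip uninstall -y numpy scipy
pip install numpy scipy
```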
1 comment
yaya90

Metadata

+++ Identifying pages after splitting into chunks +++

I want to analyse a sustainability report using GPT. However, the report's length prevents me from feeding it into GPT in its entirety. Therefore I want to use LlamaIndex to divide the text into chunks, fetch the relevant chunks through a query, and feed those into GPT.

However, I have one important issue I need to resolve. I want to always be able to identify on which page a certain statement was made. For instance, if I query for what the report mentions about allergens, I want to see the page number where the information is contained.

For instance, in the following report I'd want to know that this information about allergens is on Page 8. https://www.mcdonalds.co.jp/newcommon/sustainability_report2022/pdf/Sustainability%20Report%202022-en.pdf

How can I achieve this? Please let me know if my explanation doesn't make sense.
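One approach that should work (hedged, since APIs shift between versions): load the PDF page by page and attach the page number as metadata to each per-page document before indexing. LlamaIndex copies a document's metadata onto the chunks cut from it, so the page number then shows up on every retrieved source node. The idea, stripped down to plain Python with made-up page texts (`find_pages` is my name, not a library call):

```python
# Hypothetical per-page texts; in practice these would come from a PDF
# reader such as PyMuPDF, one string per page.
pages = ["... allergen information for menu items ...",
         "... water usage and packaging ..."]

# Build one "document" per page, carrying the page number as metadata.
documents = [{"text": text, "metadata": {"page": i + 1}}
             for i, text in enumerate(pages)]

def find_pages(query_word: str) -> list[int]:
    """Toy retrieval: return the page numbers whose text mentions the word."""
    return [d["metadata"]["page"] for d in documents if query_word in d["text"]]

print(find_pages("allergen"))  # -> [1]
```

With the real library you'd do the same thing via the `extra_info` (in later versions `metadata`) argument when constructing one `Document` per PDF page; the query response's source nodes then carry the page number along.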
5 comments
When I read in a document in markdown format (originally an annual report in .pdf format) using the following, it turns it into ~100 documents.
documents = SimpleDirectoryReader(directory).load_data()

Any idea why this is happening? Some of the documents end up being two words long; others end up being a hundred.
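If those are `.md` files, SimpleDirectoryReader hands them (in the versions I've used) to a Markdown-specific reader that emits roughly one document per header section, which would explain both the count and the two-word fragments. If you just want fewer, larger documents, you can merge consecutive fragments back together before indexing; a sketch (names are mine):

```python
def merge_documents(texts: list[str], min_words: int = 50) -> list[str]:
    """Greedily merge consecutive text fragments until each merged
    document holds at least `min_words` words."""
    merged, buf = [], []
    for t in texts:
        buf.append(t)
        if sum(len(x.split()) for x in buf) >= min_words:
            merged.append("\n\n".join(buf))
            buf = []
    if buf:  # flush any leftover short tail
        merged.append("\n\n".join(buf))
    return merged
```

You'd apply it to the texts of the ~100 loaded documents and rebuild `Document` objects from the merged strings; alternatively, some reader versions accept an option to keep a file as a single document, which is worth checking first.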
13 comments
yaya90

Gpt 3.5

Hi. I've been running into a problem, and I'm not sure how to resolve it. Hoping one of you knows the answer 🙂

I've been running the following (to analyse an annual report of a company):
Plain Text
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

response = index.query("What does the report mention about stakeholder engagement? Does it provide specific examples?",
                       similarity_top_k=3)

When I use 'gpt-3.5-turbo' I get this type of response:
The existing answer is still relevant and provides a comprehensive list of specific examples of stakeholder engagement [...]
Why does it talk about an existing answer? What is happening here?

When I use 'text-davinci-003', I receive a response that is more to my liking, and it doesn't talk about an "existing answer".
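What you're seeing is almost certainly the index's "create and refine" strategy leaking through: when the top-k retrieved chunks don't all fit in one prompt, it answers from the first chunk and then asks the model to refine the existing answer with each further chunk. gpt-3.5-turbo has a known habit of echoing that refine template ("The existing answer is still relevant...") instead of silently rewriting, while text-davinci-003 follows it. The loop, mimicked in plain Python (`ask` stands in for the LLM call; the prompt wording is illustrative, not LlamaIndex's exact template):

```python
def create_and_refine(chunks: list[str], ask) -> str:
    """Mimic the refine strategy: answer from the first chunk, then
    refine the running answer with each remaining chunk."""
    answer = ask(f"Answer the question from this context:\n{chunks[0]}")
    for chunk in chunks[1:]:
        answer = ask(
            f"The existing answer is: {answer}\n"
            f"Refine it (or keep it as-is) given this new context:\n{chunk}"
        )
    return answer

# Toy stand-in LLM that just records the prompts it was given.
calls = []
def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer-{len(calls)}"

final = create_and_refine(["chunk 1", "chunk 2", "chunk 3"], fake_llm)
```

Lowering `similarity_top_k` so everything fits in one prompt, or a response mode that packs chunks together (e.g. `response_mode="compact"`, if your version supports it), tends to avoid the leak.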
1 comment