Find answers from the community

yaya90
Offline, last seen 4 months ago
Joined September 25, 2024
Trying to run this: PyMuPDFReader = download_loader("PyMuPDFReader")

But I get the error:
InvalidURL: Invalid URL 'https:/raw.githubusercontent.com/run-llama/llama-hub/main/llama_hub/library.json': No host supplied

Never happened before. Any idea what could be wrong?
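Not an answer, but for what it's worth: the URL in that traceback has only one slash after `https:`, so no host can be parsed out of it, which is exactly what "No host supplied" means. A quick stdlib check (URLs copied from the traceback):

```python
from urllib.parse import urlparse

# With a single slash after "https:", the host ends up in the path and
# the netloc (host) component comes back empty -- hence "No host supplied".
broken = "https:/raw.githubusercontent.com/run-llama/llama-hub/main/llama_hub/library.json"
fixed = "https://raw.githubusercontent.com/run-llama/llama-hub/main/llama_hub/library.json"

print(urlparse(broken).netloc)  # -> "" (no host)
print(urlparse(fixed).netloc)   # -> "raw.githubusercontent.com"
```

Since that URL is built inside `download_loader` itself, upgrading `llama-index` (or refreshing the loader cache, if your version exposes that) may be the actual fix; I'm not sure which version introduced the bad URL.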
10 comments
If I use LlamaIndex, I know I can view the top-k nodes that were retrieved, e.g. with response.source_nodes[0]. Is there any way to actually figure out which nodes ended up being used in the output? e.g. only 3 out of 5 nodes (e.g. 1, 2, 5) were used in the answer.
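As far as I know there's no built-in flag that reports this, so one crude workaround is to check which retrieved nodes share verbatim phrases with the final answer. A minimal sketch in plain Python (`nodes_used_in_answer` and `min_overlap` are names I made up):

```python
def nodes_used_in_answer(answer: str, node_texts: list[str], min_overlap: int = 3) -> list[int]:
    """Crude heuristic: a node counts as 'used' if the answer shares at
    least `min_overlap` consecutive words with the node's text."""
    answer_words = answer.lower().split()
    used = []
    for i, text in enumerate(node_texts):
        node_str = " ".join(text.lower().split())
        for j in range(len(answer_words) - min_overlap + 1):
            phrase = " ".join(answer_words[j:j + min_overlap])
            if phrase in node_str:
                used.append(i)
                break
    return used

# Toy example: only the first node contributed a phrase to the answer.
print(nodes_used_in_answer(
    "revenue grew 10 percent in 2023",
    ["total revenue grew 10 percent last year", "the board met twice"],
))  # -> [0]
```

With real responses you'd pass `response.response` and the texts of `response.source_nodes` (attribute names vary across LlamaIndex versions). A citation-style query engine, if your version ships one, is the more principled route since it makes the model cite sources explicitly.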
5 comments
yaya90

Chunks

I'm trying to break the text into chunks of 500 tokens with a 20-token overlap each, but the following doesn't seem to be working. Any suggestions on what I'm doing wrong here?

Plain Text
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0,
                                        model_name=llama_model,
                                        max_tokens=512))

# prompt helper
context_window = 4096
num_output = 512            # number of output tokens
chunk_overlap_ratio = 0.04  # 0.04 of 500 = 20-token overlap
chunk_size_limit = 500      # chunk size limit

prompt_helper = PromptHelper(context_window, num_output, chunk_overlap_ratio, chunk_size_limit)

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

index = VectorStoreIndex.from_documents(documents, service_context=service_context, show_progress=True)


Suggestions are welcome!
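For what it's worth, my understanding is that `PromptHelper` mostly governs how retrieved text is repacked into the prompt at query time, not how documents are split when the index is built, so the index may still be chunking with its defaults. In LlamaIndex versions of this vintage, passing `chunk_size_limit=500` directly to `ServiceContext.from_defaults` is, if I remember the API correctly, what controls the splitter. Independently of the library, here is what "chunks of 500 with a 20 overlap" means, as a plain sliding-window sketch (all names are mine):

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500, overlap: int = 20) -> list[list[str]]:
    """Sliding window: each chunk holds up to `chunk_size` tokens and
    repeats the last `overlap` tokens of the previous chunk."""
    step = chunk_size - overlap  # 480: each chunk starts 480 tokens after the last
    return [tokens[i:i + chunk_size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Each chunk starts `chunk_size - overlap = 480` tokens after the previous one, so consecutive chunks share exactly their 20 boundary tokens, which is the behaviour the settings above are trying to get.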
5 comments
I somehow get the following issue when trying to run ServiceContext:

AttributeError: module 'numpy.linalg._umath_linalg' has no attribute '_ilp64'

This has never happened before. Any ideas how to fix it? The Stack Overflow answers I've found haven't helped :/
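For what it's worth, this error usually points at a stale or partially upgraded NumPy install (an old compiled extension left behind) rather than anything in your code. A clean reinstall, sketched below, is the usual first thing to try; adjust for your environment (conda, virtualenv, ...):

```shell
# Remove numpy (and scipy, which links against it), then reinstall fresh.
pip uninstall -y numpy scipy
pip install numpy scipy
```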
1 comment
yaya90

Metadata

+++ Identifying pages after splitting into chunks +++

I want to analyse a sustainability report using GPT. However, the report's length prevents me from feeding it into GPT in its entirety. Therefore I want to use LlamaIndex to divide the text into chunks, fetch the relevant chunks through a query, and feed those into GPT.

However, I have one important issue I need to resolve. I want to always be able to identify on which page a certain statement was made. For instance, if I query for what the report mentions about allergens, I want to see the page number where the information is contained.

For instance, in the following report I'd want to know that this information about allergens is on Page 8. https://www.mcdonalds.co.jp/newcommon/sustainability_report2022/pdf/Sustainability%20Report%202022-en.pdf

How can I achieve this? Please let me know if my explanation doesn't make sense.
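One approach that should work (hedged, since APIs shift between versions): load the PDF page by page and attach the page number as metadata to each per-page document before indexing. LlamaIndex copies a document's metadata onto the chunks cut from it, so the page number then shows up on every retrieved source node. The idea, stripped down to plain Python with made-up page texts (`find_pages` is my name, not a library call):

```python
# Hypothetical per-page texts; in practice these would come from a PDF
# reader such as PyMuPDF, one string per page.
pages = ["... allergen information for menu items ...",
         "... water usage and packaging ..."]

# Build one "document" per page, carrying the page number as metadata.
documents = [{"text": text, "metadata": {"page": i + 1}}
             for i, text in enumerate(pages)]

def find_pages(query_word: str) -> list[int]:
    """Toy retrieval: return the page numbers whose text mentions the word."""
    return [d["metadata"]["page"] for d in documents if query_word in d["text"]]

print(find_pages("allergen"))  # -> [1]
```

With the real library you'd do the same thing via the `extra_info` (in later versions `metadata`) argument when constructing one `Document` per PDF page; the query response's source nodes then carry the page number along.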
5 comments
When I read in a document in markdown format (originally an annual report in .pdf format) using the following, it turns it into ~100 documents.
documents = SimpleDirectoryReader(directory).load_data()

Any idea why this is happening? Some of the documents end up being two words long; others end up being a hundred.
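If those are `.md` files, SimpleDirectoryReader hands them (in the versions I've used) to a Markdown-specific reader that emits roughly one document per header section, which would explain both the count and the two-word fragments. If you just want fewer, larger documents, you can merge consecutive fragments back together before indexing; a sketch (names are mine):

```python
def merge_documents(texts: list[str], min_words: int = 50) -> list[str]:
    """Greedily merge consecutive text fragments until each merged
    document holds at least `min_words` words."""
    merged, buf = [], []
    for t in texts:
        buf.append(t)
        if sum(len(x.split()) for x in buf) >= min_words:
            merged.append("\n\n".join(buf))
            buf = []
    if buf:  # flush any leftover short tail
        merged.append("\n\n".join(buf))
    return merged
```

You'd apply it to the texts of the ~100 loaded documents and rebuild `Document` objects from the merged strings; alternatively, some reader versions accept an option to keep a file as a single document, which is worth checking first.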
13 comments
yaya90

Gpt 3.5

Hi. I've been running into a problem, and I'm not sure how to resolve it. Hoping one of you knows the answer 🙂

I've been running the following (to analyse an annual report of a company):
Plain Text
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

response = index.query("What does the report mention about stakeholder engagement? Does it provide specific examples?",
                       similarity_top_k=3)

When I use 'gpt-3.5-turbo' I get this type of response:
The existing answer is still relevant and provides a comprehensive list of specific examples of stakeholder engagement [...]
Why does it talk about an existing answer? What is happening here?

When I use 'text-davinci-003', I receive a response that is more to my liking, and it doesn't talk about an "existing answer".
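What you're seeing is almost certainly the index's "create and refine" strategy leaking through: when the top-k retrieved chunks don't all fit in one prompt, it answers from the first chunk and then asks the model to refine the existing answer with each further chunk. gpt-3.5-turbo has a known habit of echoing that refine template ("The existing answer is still relevant...") instead of silently rewriting, while text-davinci-003 follows it. The loop, mimicked in plain Python (`ask` stands in for the LLM call; the prompt wording is illustrative, not LlamaIndex's exact template):

```python
def create_and_refine(chunks: list[str], ask) -> str:
    """Mimic the refine strategy: answer from the first chunk, then
    refine the running answer with each remaining chunk."""
    answer = ask(f"Answer the question from this context:\n{chunks[0]}")
    for chunk in chunks[1:]:
        answer = ask(
            f"The existing answer is: {answer}\n"
            f"Refine it (or keep it as-is) given this new context:\n{chunk}"
        )
    return answer

# Toy stand-in LLM that just records the prompts it was given.
calls = []
def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer-{len(calls)}"

final = create_and_refine(["chunk 1", "chunk 2", "chunk 3"], fake_llm)
```

Lowering `similarity_top_k` so everything fits in one prompt, or a response mode that packs chunks together (e.g. `response_mode="compact"`, if your version supports it), tends to avoid the leak.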
1 comment