
Updated 2 years ago

Economy of the United States

At a glance
I'm trying to parse this article: https://en.wikipedia.org/wiki/Economy_of_the_United_States#Mergers_and_acquisitions. The section in the attached screenshot has some info about the 2017 GDP per capita in the US. My query asks for the GDP per capita in 2022, but it mistakenly returns the 2017 GDP value as the 2022 value.
Attachment
image.png
13 comments
ahh.. are you using the simple vector index?
I tried simple vector index, but also tried GPTFaissIndex
I'm not sure which index / params are best for this, do you have any suggestions?
Also maybe some specific chunk sizes would be better?
you could try smaller chunk sizes, and play around with similarity_top_k

e.g. GPTSimpleVectorIndex(docs, chunk_size_limit=512) (or 256, etc.). And by default similarity_top_k is 1 but it can be higher
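The interaction between the two knobs can be illustrated without the library. This is a toy sketch, not the gpt_index API: the text, the whitespace "tokenizer", and the overlap score are all stand-ins (real vector indexes use embeddings), but it shows why smaller chunks plus a higher top_k make it likelier that the chunk containing the 2022 figure is retrieved instead of the 2017 one.

```python
# Toy retrieval sketch (NOT the gpt_index API): smaller chunks give
# finer-grained candidates; a higher top_k passes more of them to
# answer synthesis.

def chunk(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def top_k_by_overlap(chunks, query_tokens, k):
    """Rank chunks by word overlap with the query (stand-in for embedding similarity)."""
    scored = sorted(chunks, key=lambda c: len(set(c) & set(query_tokens)), reverse=True)
    return scored[:k]

# Hypothetical article text containing both the 2017 and 2022 figures.
text = ("In 2017 GDP per capita was 59000 . "
        "In 2022 GDP per capita was 76000 .")
tokens = text.split()

small_chunks = chunk(tokens, 8)          # fine granularity, like a small chunk_size_limit
query = "GDP per capita in 2022".split()

best = top_k_by_overlap(small_chunks, query, k=2)
print(best[0])  # the 2022 chunk outranks the 2017 one
```

With one big chunk both years land in the same chunk and the model has to disambiguate them itself; with small chunks the 2022 sentence is a separate, higher-scoring candidate.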
I faced a similar issue. It might be due to the fact that GPT-3 does not perform accurately with numbers.
Can GPT-index parse html pages and tables embedded in html?
the wikipedia page I'm using has a lot of html tables and I don't know if that is affecting the quality I'm seeing?
we currently have a web page parser which will run html2text, though we don't have that inherently in the wikipedia reader
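For the table-heavy Wikipedia page, the html2text step matters because raw `<table>` markup chunks badly. As a minimal stand-in (this is stdlib `html.parser`, not html2text or the gpt_index web reader), one can flatten each table row into a plain-text line so the year/value pairs stay adjacent:

```python
from html.parser import HTMLParser

class TableToText(HTMLParser):
    """Minimal stand-in for html2text: flattens an HTML table into one
    plain-text line per row, cells separated by ' | '."""
    def __init__(self):
        super().__init__()
        self.rows, self.cells, self.in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
        elif tag == "tr":
            self.cells = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.cells:
            self.rows.append(" | ".join(self.cells))

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

# Hypothetical fragment shaped like the Wikipedia GDP tables.
html = ("<table><tr><th>Year</th><th>GDP per capita</th></tr>"
        "<tr><td>2017</td><td>59,000</td></tr>"
        "<tr><td>2022</td><td>76,000</td></tr></table>")
parser = TableToText()
parser.feed(html)
print("\n".join(parser.rows))
```

Keeping each row on one line means a chunk boundary is less likely to separate "2022" from its value.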
Do you know how to overcome this error with GPTTreeIndex: "ValueError: A single term is larger than the allowed chunk size. Term size: 871, Chunk size: 369"
@vkdi5cord which version are you on? I recently added some fixes this past week that should hopefully address that. The error occurs when you have a single token (tokens are split by separator) that is longer than the allowed chunk length.
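The failure mode can be sketched in a few lines. This is a hypothetical splitter, not the actual GPTTreeIndex code: when one separator-delimited term (a long URL, say) exceeds the chunk size, the naive split can never produce a legal chunk, so the fix is to fall back to slicing that term directly:

```python
def split_with_fallback(text, chunk_size, separator=" "):
    """Split text on `separator`; if a single term exceeds chunk_size
    (the condition behind 'A single term is larger than the allowed
    chunk size'), fall back to slicing that term by characters so no
    piece is longer than chunk_size."""
    pieces = []
    for term in text.split(separator):
        if len(term) <= chunk_size:
            pieces.append(term)
        else:
            # oversize term: cut it into chunk_size-character slices
            pieces.extend(term[i:i + chunk_size]
                          for i in range(0, len(term), chunk_size))
    return pieces

long_url = "https://example.org/" + "x" * 30   # a hypothetical 50-char term
pieces = split_with_fallback("see " + long_url, chunk_size=20)
print(pieces)
```

Without the fallback branch, the 50-character term would have to be rejected, which is exactly the ValueError above.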
Thanks, I upgraded to the latest and that now works!