
Updated 2 years ago

Economy of the United States

At a glance
I'm trying to parse this article: https://en.wikipedia.org/wiki/Economy_of_the_United_States#Mergers_and_acquisitions. The section in the attached screenshot has some info about the 2017 GDP per capita in the US. My query asks for the GDP per capita in 2022, but it mistakenly returns the 2017 GDP value as the 2022 value.
Attachment
image.png
13 comments
ahh.. are you using the simple vector index?
I tried simple vector index, but also tried GPTFaissIndex
I'm not sure which index / params are best for this, do you have any suggestions?
Also maybe some specific chunk sizes would be better?
you could try smaller chunk sizes, and play around with similarity_top_k

e.g. GPTSimpleVectorIndex(docs, chunk_size_limit=512) (or 256, etc.). And by default similarity_top_k is 1 but it can be higher
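The interaction between the two knobs can be illustrated without the library. This is a toy sketch, not the gpt_index API: the text, the whitespace "tokenizer", and the overlap score are all stand-ins (real vector indexes use embeddings), but it shows why smaller chunks plus a higher top_k make it likelier that the chunk containing the 2022 figure is retrieved instead of the 2017 one.

```python
# Toy retrieval sketch (NOT the gpt_index API): smaller chunks give
# finer-grained candidates; a higher top_k passes more of them to
# answer synthesis.

def chunk(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def top_k_by_overlap(chunks, query_tokens, k):
    """Rank chunks by word overlap with the query (stand-in for embedding similarity)."""
    scored = sorted(chunks, key=lambda c: len(set(c) & set(query_tokens)), reverse=True)
    return scored[:k]

# Hypothetical article text containing both the 2017 and 2022 figures.
text = ("In 2017 GDP per capita was 59000 . "
        "In 2022 GDP per capita was 76000 .")
tokens = text.split()

small_chunks = chunk(tokens, 8)          # fine granularity, like a small chunk_size_limit
query = "GDP per capita in 2022".split()

best = top_k_by_overlap(small_chunks, query, k=2)
print(best[0])  # the 2022 chunk outranks the 2017 one
```

With one big chunk both years land in the same chunk and the model has to disambiguate them itself; with small chunks the 2022 sentence is a separate, higher-scoring candidate.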
I faced a similar issue. It might be due to the fact that GPT-3 does not perform accurately with numbers.
Can GPT-index parse html pages and tables embedded in html?
the wikipedia page I'm using has a lot of html tables and I don't know if that is affecting the quality I'm seeing?
we currently have a web page parser which will run html2text, though we don't have that inherently in the wikipedia reader
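For the table-heavy Wikipedia page, the html2text step matters because raw `<table>` markup chunks badly. As a minimal stand-in (this is stdlib `html.parser`, not html2text or the gpt_index web reader), one can flatten each table row into a plain-text line so the year/value pairs stay adjacent:

```python
from html.parser import HTMLParser

class TableToText(HTMLParser):
    """Minimal stand-in for html2text: flattens an HTML table into one
    plain-text line per row, cells separated by ' | '."""
    def __init__(self):
        super().__init__()
        self.rows, self.cells, self.in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
        elif tag == "tr":
            self.cells = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.cells:
            self.rows.append(" | ".join(self.cells))

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

# Hypothetical fragment shaped like the Wikipedia GDP tables.
html = ("<table><tr><th>Year</th><th>GDP per capita</th></tr>"
        "<tr><td>2017</td><td>59,000</td></tr>"
        "<tr><td>2022</td><td>76,000</td></tr></table>")
parser = TableToText()
parser.feed(html)
print("\n".join(parser.rows))
```

Keeping each row on one line means a chunk boundary is less likely to separate "2022" from its value.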
Do you know how to overcome this error with GPTTreeIndex: "ValueError: A single term is larger than the allowed chunk size. Term size: 871, Chunk size: 369"
@vkdi5cord which version are you on? I recently added some fixes this past week that should hopefully address that. The error occurs when you have a single token (tokens are split by separator) that is longer than the allowed chunk length.
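The failure mode can be sketched in a few lines. This is a hypothetical splitter, not the actual GPTTreeIndex code: when one separator-delimited term (a long URL, say) exceeds the chunk size, the naive split can never produce a legal chunk, so the fix is to fall back to slicing that term directly:

```python
def split_with_fallback(text, chunk_size, separator=" "):
    """Split text on `separator`; if a single term exceeds chunk_size
    (the condition behind 'A single term is larger than the allowed
    chunk size'), fall back to slicing that term by characters so no
    piece is longer than chunk_size."""
    pieces = []
    for term in text.split(separator):
        if len(term) <= chunk_size:
            pieces.append(term)
        else:
            # oversize term: cut it into chunk_size-character slices
            pieces.extend(term[i:i + chunk_size]
                          for i in range(0, len(term), chunk_size))
    return pieces

long_url = "https://example.org/" + "x" * 30   # a hypothetical 50-char term
pieces = split_with_fallback("see " + long_url, chunk_size=20)
print(pieces)
```

Without the fallback branch, the 50-character term would have to be rejected, which is exactly the ValueError above.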
Thanks, I upgraded to the latest and that now works!