
Updated last year

I have all ICLR 2024 oral papers (~85 papers) downloaded as PDFs and want to ask questions about them. Which app or RAG framework should I look at? I'm not getting good results with existing GPTs and online apps, so I'm now looking at LlamaIndex for solutions.

Context would be around 1M to 2M tokens in total in this case.
This link is not working
Hmm. Weird. Should be easy to fix at least
If you want to query over multiple papers like that, I'd probably either throw it all into a vector index and use a sub question query engine, or use a document summary index (the latter will create a summary for each paper).
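For reference, here's a rough sketch of those two options, assuming llama-index 0.10+ (the `llama_index.core` layout) and an LLM/embedding model already configured. The function names, the `pdf_dir` argument, and the tool name and description are made up for illustration, so treat this as a starting point rather than the recommended setup.

```python
def build_sub_question_engine(pdf_dir):
    """Option 1: one vector index over all papers + a sub question query engine."""
    # Imports are kept inside the function so the sketch can be pasted into
    # a script without requiring llama-index until it is actually called.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.core.query_engine import SubQuestionQueryEngine
    from llama_index.core.tools import QueryEngineTool, ToolMetadata

    docs = SimpleDirectoryReader(pdf_dir).load_data()
    index = VectorStoreIndex.from_documents(docs)
    tool = QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="iclr_2024_orals",  # hypothetical tool name
            description="Full text of the ICLR 2024 oral papers",
        ),
    )
    # Splits a question into sub-questions, answers each, then combines them.
    return SubQuestionQueryEngine.from_defaults(query_engine_tools=[tool])


def build_summary_engine(pdf_dir):
    """Option 2: a document summary index (one LLM summary per document)."""
    from llama_index.core import DocumentSummaryIndex, SimpleDirectoryReader

    docs = SimpleDirectoryReader(pdf_dir).load_data()
    # Assumption: the default PDF reader may return one Document per page,
    # so pages may need merging first to get one summary per paper.
    index = DocumentSummaryIndex.from_documents(docs)
    return index.as_query_engine()
```

Usage would be something like `build_sub_question_engine("./iclr2024").query("Which papers study long-context models?")`, where the directory path is just an example. Building the summary index makes one LLM call per document, so it will probably be slower and more expensive up front across ~85 papers.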
I'm following https://docs.llamaindex.ai/en/stable/use_cases/q_and_a/rag_cli.html
and I get this error:
llama_index/core/ingestion/pipeline.py", line 94, in get_transformation_hash
return sha256((nodes_str + transform_string).encode("utf-8")).hexdigest()
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud835' in position 5602662: surrogates not allowed

Maybe the code needs to be more robust to this issue?
It might happen for some PDFs.
Hmm, weird. The data should already be cast to a string, but it seems like it wasn't?
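For what it's worth, the value in that traceback already is a `str`; the problem is that it contains `'\ud835'`, a surrogate code point that the UTF-8 codec refuses to encode. Here's a minimal workaround sketch using only the standard library. `repair_surrogates` is a hypothetical helper, not part of LlamaIndex, and it assumes you can clean each document's text yourself before it goes into the ingestion pipeline.

```python
def repair_surrogates(text: str) -> str:
    """Return a copy of `text` that can be encoded as UTF-8."""
    # Round-trip through UTF-16: "surrogatepass" lets surrogate code points
    # through on encode, and decoding rejoins valid high/low pairs into a
    # single character. Unpaired surrogates are replaced with U+FFFD.
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


# Two sample strings: a split surrogate pair and a lone high surrogate,
# as might come out of a PDF that uses mathematical alphabet symbols.
paired = "x = \ud835\udc00"
lone = "x = \ud835 + y"

for sample in (paired, lone):
    try:
        sample.encode("utf-8")
    except UnicodeEncodeError as err:
        print("before:", err.reason)
    fixed = repair_surrogates(sample)
    fixed.encode("utf-8")  # no longer raises
    print("after: encodes cleanly,", len(fixed), "characters")
```

A simpler alternative is `text.encode("utf-8", errors="replace").decode("utf-8")`, which turns every surrogate into `?`. The UTF-16 round trip keeps characters that were only split into two halves, which matters for papers full of math symbols.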