
Updated 9 months ago

@Logan M have been a bit out of touch

@Logan M I've been a bit out of touch with LlamaIndex, so I need your help getting started again, please!
I have a RAG use case where I need to build a chatbot over a 472-page PDF containing text, images, tables, PowerShell code, and MySQL code.
Could you give me some quick pointers on which text splitters and other specific concepts to use for this use case?
5 comments
Hey rini, the most important thing is going to be parsing the PDF into a format that is readable by an LLM (garbage in, garbage out).
Hey, I do remember the basic concepts of RAG in LlamaIndex. I'm just a bit rusty on the types of query engines and text-splitting options. I remember the "Unstructured" Python library; will it suffice for my use case?
Unstructured is a good start.

Then you can use a query pipeline to set up text splitters, retrievers, etc.

This might be good to check out: https://docs.llamaindex.ai/en/stable/examples/agent/agent_runner/agent_around_query_pipeline_with_HyDE_for_PDFs
It might be a little advanced, though.
Yeah, really any splitter will be fine. The most important part is getting a readable format, as @bmax mentioned.

LlamaParse or Unstructured is a good starting point.
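
For anyone rusty on what a text splitter actually does once the PDF has been parsed to clean text, here is a minimal sketch in plain Python. It is not the LlamaIndex API; `split_into_chunks`, `max_words`, and `overlap_paragraphs` are hypothetical names chosen for illustration, and the word-count limit is an assumption standing in for a real token count.

```python
from typing import List


def split_into_chunks(
    text: str, max_words: int = 200, overlap_paragraphs: int = 1
) -> List[str]:
    """Greedily pack paragraphs into chunks of roughly max_words words.

    The last `overlap_paragraphs` paragraphs of each chunk are repeated at
    the start of the next chunk so context is not lost at the boundary.
    A single paragraph longer than max_words is kept whole, not cut, which
    helps keep a code listing or a table row group together.
    """
    # Treat blank lines as paragraph boundaries and drop empty pieces.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks: List[str] = []
    current: List[str] = []
    current_words = 0

    for para in paragraphs:
        n_words = len(para.split())
        # Close the current chunk if adding this paragraph would exceed the limit.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            # Carry the tail of the closed chunk over as overlap.
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
            current_words = sum(len(p.split()) for p in current)
        current.append(para)
        current_words += n_words

    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "A one two\n\nB three four\n\nC five six"
    for i, chunk in enumerate(split_into_chunks(sample, max_words=6)):
        print(f"--- chunk {i} ---")
        print(chunk)
```

Splitting on paragraph boundaries instead of a fixed character count is one reasonable choice for a document that mixes prose with PowerShell and MySQL snippets, since it avoids cutting a snippet in half. A library splitter does the same job with proper token counting.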