Updated 10 months ago

@Logan M have been a bit out of touch

@Logan M, I have been a bit out of touch with LlamaIndex, so I need your help getting started again, please!
I have a RAG use case where I need to build a chatbot over a 472-page PDF consisting of text, images, tables, PowerShell code, and MySQL code.
Could you give me some quick pointers on what kind of text splitters and other specific concepts to use for this use case?

5 comments

bmax

Hey rini, the most important thing is going to be parsing the PDF into a format that is readable by an LLM (garbage in, garbage out).

rini

Hey, I do remember the basic concepts of RAG around LlamaIndex. I'm just a bit rusty on the types of query engines and text-splitting options. I remember the "Unstructured" Python library; will this suffice for my use case?

bmax

Unstructured is a good start.

Then you can use a query pipeline to set up text splitters, retrievers, etc.

This might be good to check out: https://docs.llamaindex.ai/en/stable/examples/agent/agent_runner/agent_around_query_pipeline_with_HyDE_for_PDFs

bmax

It might be a little advanced, though.

Logan M

Yeah, really any splitter will be fine. The most important part is getting a readable format, as @bmax mentioned.

LlamaParse or Unstructured is a good starting point.
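
To make the "readable format first, any splitter after" advice concrete, here is a minimal sketch in plain Python. It is not LlamaIndex's or Unstructured's API: it assumes the parser has already turned the PDF into Markdown, and `split_markdown` and `max_chars` are hypothetical names made up for illustration. The idea is only that a simple splitter works reasonably well as long as it never cuts through a fenced code block (PowerShell, MySQL) or a table.

```python
import re

# Matches an opening or closing Markdown code fence (three backticks or tildes).
FENCE = re.compile(r"^(`{3}|~{3})")


def split_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split parsed Markdown into chunks without cutting inside a block.

    Assumption: the PDF parser already produced Markdown. A "block" is a
    run of non-blank lines (a paragraph or a table) or a fenced code block.
    """
    # Pass 1: group lines into blocks.
    blocks, current, in_fence = [], [], False
    for line in text.splitlines():
        if FENCE.match(line.strip()):
            in_fence = not in_fence
            current.append(line)
            if not in_fence:  # a closing fence ends the code block
                blocks.append("\n".join(current))
                current = []
            continue
        if in_fence or line.strip():
            # Inside a fence, blank lines are kept so code stays intact.
            current.append(line)
        elif current:  # a blank line outside a fence ends the block
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))

    # Pass 2: greedily pack whole blocks into chunks of up to max_chars.
    # A single block longer than max_chars becomes its own oversized chunk.
    chunks, buf = [], ""
    for block in blocks:
        candidate = f"{buf}\n\n{block}" if buf else block
        if buf and len(candidate) > max_chars:
            chunks.append(buf)
            buf = block
        else:
            buf = candidate
    if buf:
        chunks.append(buf)
    return chunks
```

Each returned chunk could then be wrapped in whatever document or node type your framework expects. This sketch does not handle nested fences or images; for a 472-page PDF, the quality of the parsed Markdown will matter far more than the splitter details.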
