
Updated 3 months ago

Hey @Logan M, how are you? I am working on building a side project chatbot to interact with one of my ancestor's memoirs. The original was a PDF, and the system that digitized the text introduced errors. Do you have any thoughts on ways to clean up this text before I process the data, so that it is done in a sane way?
Attachment
image.png
3 comments
Hmm, for typos there might be some Python libraries that fix them automatically.
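As a rough illustration of the automatic typo-fixing idea, here is a minimal sketch that uses only Python's standard library (`difflib`) rather than any particular spell-checking package. The tiny `VOCAB` set and the helper names `fix_word` and `fix_text` are made up for this example; a real run would need a full word list plus the names and places that appear in the memoir.

```python
import difflib
import re

# Hypothetical toy vocabulary; in practice, load a full word list and add
# family names and place names so they are not "corrected" by mistake.
VOCAB = {"the", "family", "moved", "to", "farm", "in", "spring", "of", "that", "year"}


def fix_word(word, vocab=VOCAB, cutoff=0.75):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    lower = word.lower()
    if lower in vocab:
        return word
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(lower, sorted(vocab), n=1, cutoff=cutoff)
    if not matches:
        return word  # nothing similar enough: leave it alone
    fixed = matches[0]
    # Preserve a leading capital letter from the original word.
    return fixed.capitalize() if word[0].isupper() else fixed


def fix_text(text, vocab=VOCAB):
    """Apply fix_word to every alphabetic token, leaving punctuation untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_word(m.group(0), vocab), text)


print(fix_text("The famly movd to the farm"))
```

Words with no close match in the vocabulary are left as they are, which matters for a memoir full of proper nouns. Raising `cutoff` makes the corrections more conservative.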

Another idea is prompting an LLM to rewrite the text, asking it to fix spelling and grammatical errors.
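The LLM rewrite idea could be sketched roughly as below. Everything here is an assumption for illustration: the prompt wording, the chunk size, and the helper names `chunk_paragraphs` and `clean_text` are hypothetical, and `complete` stands in for whatever function sends a prompt to your LLM of choice and returns its reply. No specific LLM API is assumed.

```python
from typing import Callable, Iterator

# Hypothetical prompt; adjust the wording to suit the memoir.
PROMPT_TEMPLATE = (
    "The following text was extracted from a scanned PDF and contains OCR errors.\n"
    "Rewrite it, fixing spelling and grammatical errors only. "
    "Do not add, remove, or reorder content, and keep names and dates as written.\n\n"
    "Text:\n{chunk}\n\nCorrected text:"
)


def chunk_paragraphs(text: str, max_chars: int = 2000) -> Iterator[str]:
    """Yield groups of whole paragraphs, starting a new group once max_chars would be exceeded."""
    current, size = [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and size + len(para) > max_chars:
            yield "\n\n".join(current)
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        yield "\n\n".join(current)


def clean_text(text: str, complete: Callable[[str], str], max_chars: int = 2000) -> str:
    """Send each chunk through `complete` (your LLM call) and rejoin the replies."""
    cleaned = [
        complete(PROMPT_TEMPLATE.format(chunk=chunk)).strip()
        for chunk in chunk_paragraphs(text, max_chars)
    ]
    return "\n\n".join(cleaned)
```

Splitting on paragraph boundaries keeps each request small and makes it easier to compare the rewritten text against the original, which is worth doing since an LLM may change more than just the typos.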
I'm not sure if you've tried it, but LlamaParse will probably do a decent job of cleaning it up (hopefully).
@Logan M thanks for the tip, I’ll do some research on LlamaParse. Thank you!