Hey, how are you? I am working on a side-project chatbot to interact with one of my ancestor's memoirs. The original was a PDF, and the system that digitized the text introduced errors. I was curious whether you had any thoughts on ways to clean up this text before I process the data, so that it is done in a sane way.