The community members are discussing a use case where they need to read questions from a PDF question paper and grade them using an LLM (likely a pydantic program). The main challenges identified are:
1. Getting the text out of the PDF correctly, which could mean using a normal PDF loader when the file has a text layer, or OCR when the PDF is a scan.
2. Parsing the PDF text to extract "Question" objects, as a normal PDF loader may just return the raw text.
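The first challenge can be sketched roughly as follows. This is only a sketch: it assumes the third-party `pypdf` package for the loader, and `needs_ocr` with its `min_chars_per_page` threshold is a hypothetical heuristic, not something proposed in the thread. The idea is to try the text layer first and only reach for OCR when most pages come back nearly empty.

```python
from typing import List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the text layer of each page, using pypdf (third-party)."""
    # Imported lazily so the rest of the sketch works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Image-only pages typically yield an empty string.
    return [page.extract_text() or "" for page in reader.pages]


def needs_ocr(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Hypothetical heuristic: treat the PDF as scanned if more than
    half of its pages contain almost no extractable text."""
    if not page_texts:
        return True
    sparse = sum(1 for t in page_texts if len(t.strip()) < min_chars_per_page)
    return sparse > len(page_texts) / 2
```

If `needs_ocr(extract_page_texts(path))` is true, the file would go to an OCR tool of your choice instead of the plain loader.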
The community members suggest that the PDF should follow a reasonably consistent format, since that makes the parsing easier. They also discuss the possibility of using a package like camelot to extract tables and images from the PDF.
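To illustrate why consistent formatting helps, here is a minimal parsing sketch using only the standard library. The `Question` class, the assumed numbering style ("1.", "Q1." or "Q1)" at the start of a line) and the assumed marks notation ("(5 marks)" or "[1 mark]") are all illustrative assumptions; a real paper may need different patterns.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    number: int
    text: str
    marks: Optional[int] = None


# Assumed layout: a question starts a line with "1.", "Q1." or "Q1)".
_QUESTION_START = re.compile(r"^\s*Q?(\d+)[.)]\s+", re.MULTILINE)
# Assumed marks notation, e.g. "(5 marks)" or "[1 mark]".
_MARKS = re.compile(r"[\(\[](\d+)\s*marks?[\)\]]", re.IGNORECASE)


def parse_questions(raw_text: str) -> List[Question]:
    """Split raw PDF text into Question objects at each question marker."""
    matches = list(_QUESTION_START.finditer(raw_text))
    questions = []
    for i, m in enumerate(matches):
        # A question body runs until the next marker (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_text)
        body = raw_text[m.end():end]
        marks_match = _MARKS.search(body)
        marks = int(marks_match.group(1)) if marks_match else None
        # Drop the marks annotation and collapse wrapped lines into one.
        text = " ".join(_MARKS.sub("", body).split())
        questions.append(Question(int(m.group(1)), text, marks))
    return questions
```

One known weakness of this approach: any line inside a question body that happens to start with a number and a period would be treated as a new question, which is exactly the kind of ambiguity a consistently formatted paper avoids.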
I need some guidance/help. My use case is that I am given a question paper, and for each question paper there's a corresponding marking scheme. I need to read the questions from the question paper PDF. The LLM shouldn't create its own questions. Same for the marking scheme. I feel it's a good use case for OpenAIPydanticProgram. What do you think?
Yeah, that sounds about right. There are two parts here: getting the text off the PDF correctly, and then grading it with an LLM (likely a pydantic program).
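A rough sketch of how the grading half could be structured so that the LLM only grades and never invents questions. Everything here is hypothetical: `MarkingEntry`, `Grade`, `grade_paper`, and the `grader` callable are illustrative names, plain dataclasses stand in for pydantic models, and `exact_match_grader` is a trivial stand-in for the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MarkingEntry:
    question_number: int
    expected_answer: str
    max_marks: int


@dataclass
class Grade:
    question_number: int
    awarded: int
    max_marks: int
    feedback: str


# (question_text, student_answer, marking_entry) -> Grade
Grader = Callable[[str, str, MarkingEntry], Grade]


def grade_paper(
    questions: Dict[int, str],
    scheme: List[MarkingEntry],
    answers: Dict[int, str],
    grader: Grader,
) -> List[Grade]:
    """Grade only questions that were read from the paper and its scheme."""
    grades = []
    for entry in scheme:
        n = entry.question_number
        if n not in questions:
            raise KeyError(f"marking scheme refers to unknown question {n}")
        grade = grader(questions[n], answers.get(n, ""), entry)
        # Never trust the grader to respect the mark range; clamp it.
        grade.awarded = max(0, min(grade.awarded, entry.max_marks))
        grades.append(grade)
    return grades


def exact_match_grader(question: str, answer: str, entry: MarkingEntry) -> Grade:
    """Stand-in for the LLM: full marks only on a case-insensitive exact match."""
    correct = answer.strip().lower() == entry.expected_answer.strip().lower()
    awarded = entry.max_marks if correct else 0
    return Grade(entry.question_number, awarded, entry.max_marks,
                 "matches scheme" if correct else "does not match scheme")
```

In practice the `grader` argument would wrap the LLM call and return a structured result; keeping the questions and marking scheme as inputs read from the PDFs is what stops the model from making up its own.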