The community members are discussing ways to generate metadata (e.g. summary, title, QA pairs) for an API without negatively impacting the API response time. Some suggestions include using polling systems, ingestion pipelines, and processing images/diagrams. However, there are concerns about the complexity of amalgamating the nodes returned by the ingestion pipelines and the time required for extensive extraction. The community members are trying to find a balance between generating good metadata, maintaining speed, and keeping consistency between the API and frontend.
Hey guys, just curious: what's your best way to generate metadata (e.g. summary, title, QA pairs) from an API in Python without hurting the API response time?
E.g. I know there are polling systems, and I'm aware of ingestion pipelines, but if an ingestion pipeline returns nodes, what's the best way to amalgamate those nodes back into one document and generate metadata (ideally using AI for some of it), keeping consistency between the API and frontend while maintaining speed and still producing good metadata?
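A common pattern here is to respond immediately and push the slow extraction into a background task, letting the frontend poll for the result. Below is a minimal sketch assuming FastAPI and the OpenAI Python client; `run_pipeline` and `METADATA_STORE` are hypothetical stand-ins for your own ingestion pipeline and persistence layer, and the nodes are assumed to expose `.text` and `.ref_doc_id`.

```python
# Minimal sketch: respond fast, generate metadata in the background.
# Assumes FastAPI + the OpenAI Python client; run_pipeline() and
# METADATA_STORE are hypothetical stand-ins for your pipeline and DB.
from collections import defaultdict

from fastapi import BackgroundTasks, FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()
METADATA_STORE: dict[str, dict] = {}  # swap for your real persistence layer


def amalgamate(nodes) -> dict[str, str]:
    """Join node text back into one document per ref_doc_id."""
    docs = defaultdict(list)
    for node in nodes:  # nodes assumed to expose .ref_doc_id and .text
        docs[node.ref_doc_id].append(node.text)
    return {doc_id: "\n\n".join(chunks) for doc_id, chunks in docs.items()}


def generate_metadata(doc_id: str) -> None:
    """Slow path: runs after the API has already responded."""
    nodes = run_pipeline(doc_id)  # hypothetical: your ingestion pipeline
    full_text = amalgamate(nodes)[doc_id]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Return a title, a 2-3 sentence summary, and 3 QA pairs "
                "for this document:\n\n" + full_text[:20000]
            ),
        }],
    )
    METADATA_STORE[doc_id] = {
        "status": "done",
        "metadata": resp.choices[0].message.content,
    }


@app.post("/documents/{doc_id}")
async def ingest(doc_id: str, background_tasks: BackgroundTasks):
    # Fast response: nothing slow happens before returning.
    METADATA_STORE[doc_id] = {"status": "pending"}
    background_tasks.add_task(generate_metadata, doc_id)
    return {"doc_id": doc_id, "metadata": "pending"}


@app.get("/documents/{doc_id}/metadata")
async def poll_metadata(doc_id: str):
    # Frontend polls this until status flips to "done".
    return METADATA_STORE.get(doc_id, {"status": "unknown"})
```

Grouping by `ref_doc_id` keeps the amalgamated document and its metadata tied to the same id the frontend already uses, which helps with the API/frontend consistency concern.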
I created a custom extractor for each using GPT-4o, because I wanted to process images of each page of a PDF too, just in case there are schematics, diagrams, maps, etc.
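For the per-page image side, a rendered page can be sent to GPT-4o as a base64 data URI via the OpenAI chat completions API. A rough sketch, assuming you already have each page as PNG bytes (e.g. rendered with pdf2image or PyMuPDF); the prompt is just an example.

```python
# Sketch of describing a rendered PDF page image with GPT-4o so that
# schematics, diagrams, and maps aren't lost. Assumes the page is already
# available as PNG bytes.
import base64

from openai import OpenAI

client = OpenAI()


def describe_page_image(page_png: bytes) -> str:
    """Ask GPT-4o for a text description of one rendered PDF page."""
    b64 = base64.b64encode(page_png).decode("utf-8")
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe any schematics, diagrams, maps, or tables "
                         "on this page. If it is plain text only, reply 'text only'."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```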
But then deleting the ref docs becomes a whole other thing, and in general that much extraction can take a while, but they want speed, so do I just ignore the images for now?
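One way to square the speed requirement with image handling is a two-pass approach: a fast text-only pass up front, with the slower per-page image pass queued separately and gated behind a flag. A sketch only; `text_only_metadata` and `enqueue_image_pass` are hypothetical placeholders for the fast extraction above and whatever queue or background worker you use.

```python
# Sketch of a two-pass approach: fast text-only metadata first, the slower
# per-page image descriptions queued as an optional second pass.
def ingest_document(doc_id: str, page_images: list[bytes],
                    process_images: bool = False) -> dict:
    metadata = text_only_metadata(doc_id)        # fast pass: title/summary/QA from text
    if process_images:
        enqueue_image_pass(doc_id, page_images)  # slow pass: runs later via a queue/worker
    return {
        "doc_id": doc_id,
        "metadata": metadata,
        "image_pass": "queued" if process_images else "skipped",
    }
```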