The community members are discussing ways to generate metadata (e.g. summary, title, QA pairs) for an API without negatively impacting the API response time. Some suggestions include using polling systems, ingestion pipelines, and processing images/diagrams. However, there are concerns about the complexity of amalgamating the nodes returned by the ingestion pipelines and the time required for extensive extraction. The community members are trying to find a balance between generating good metadata, maintaining speed, and keeping consistency between the API and frontend.
Hey guys, just curious: what's your best way to generate metadata (e.g. summary, title, QA pairs) from an API in Python without hurting the API response time?
E.g. I know there are polling systems, and I'm aware of ingestion pipelines, but if an ingestion pipeline returns nodes, what's the best way to amalgamate those nodes back into one document and generate metadata (ideally using AI for some of it), keeping consistency between the API and frontend while maintaining speed and still producing good metadata?
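A common pattern here is to respond immediately and push the slow extraction into a background task, letting the frontend poll for the result. Below is a minimal sketch assuming FastAPI and the OpenAI Python client; `run_pipeline` and `METADATA_STORE` are hypothetical stand-ins for your own ingestion pipeline and persistence layer, and the nodes are assumed to expose `.text` and `.ref_doc_id`.

```python
# Minimal sketch: respond fast, generate metadata in the background.
# Assumes FastAPI + the OpenAI Python client; run_pipeline() and
# METADATA_STORE are hypothetical stand-ins for your pipeline and DB.
from collections import defaultdict

from fastapi import BackgroundTasks, FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()
METADATA_STORE: dict[str, dict] = {}  # swap for your real persistence layer


def amalgamate(nodes) -> dict[str, str]:
    """Join node text back into one document per ref_doc_id."""
    docs = defaultdict(list)
    for node in nodes:  # nodes assumed to expose .ref_doc_id and .text
        docs[node.ref_doc_id].append(node.text)
    return {doc_id: "\n\n".join(chunks) for doc_id, chunks in docs.items()}


def generate_metadata(doc_id: str) -> None:
    """Slow path: runs after the API has already responded."""
    nodes = run_pipeline(doc_id)  # hypothetical: your ingestion pipeline
    full_text = amalgamate(nodes)[doc_id]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Return a title, a 2-3 sentence summary, and 3 QA pairs "
                "for this document:\n\n" + full_text[:20000]
            ),
        }],
    )
    METADATA_STORE[doc_id] = {
        "status": "done",
        "metadata": resp.choices[0].message.content,
    }


@app.post("/documents/{doc_id}")
async def ingest(doc_id: str, background_tasks: BackgroundTasks):
    # Fast response: nothing slow happens before returning.
    METADATA_STORE[doc_id] = {"status": "pending"}
    background_tasks.add_task(generate_metadata, doc_id)
    return {"doc_id": doc_id, "metadata": "pending"}


@app.get("/documents/{doc_id}/metadata")
async def poll_metadata(doc_id: str):
    # Frontend polls this until status flips to "done".
    return METADATA_STORE.get(doc_id, {"status": "unknown"})
```

Grouping by `ref_doc_id` keeps the amalgamated document and its metadata tied to the same id the frontend already uses, which helps with the API/frontend consistency concern.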
I created a custom extractor for each using GPT-4o, because I wanted to process images of each page of a PDF too, just in case there are schematics, diagrams, maps, etc.
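For the per-page image side, a rendered page can be sent to GPT-4o as a base64 data URI via the OpenAI chat completions API. A rough sketch, assuming you already have each page as PNG bytes (e.g. rendered with pdf2image or PyMuPDF); the prompt is just an example.

```python
# Sketch of describing a rendered PDF page image with GPT-4o so that
# schematics, diagrams, and maps aren't lost. Assumes the page is already
# available as PNG bytes.
import base64

from openai import OpenAI

client = OpenAI()


def describe_page_image(page_png: bytes) -> str:
    """Ask GPT-4o for a text description of one rendered PDF page."""
    b64 = base64.b64encode(page_png).decode("utf-8")
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe any schematics, diagrams, maps, or tables "
                         "on this page. If it is plain text only, reply 'text only'."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```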
But then deleting the ref docs becomes a whole other thing, and in general that much extraction can take a while, but they want speed, so do I just ignore the images for now?
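One way to square the speed requirement with image handling is a two-pass approach: a fast text-only pass up front, with the slower per-page image pass queued separately and gated behind a flag. A sketch only; `text_only_metadata` and `enqueue_image_pass` are hypothetical placeholders for the fast extraction above and whatever queue or background worker you use.

```python
# Sketch of a two-pass approach: fast text-only metadata first, the slower
# per-page image descriptions queued as an optional second pass.
def ingest_document(doc_id: str, page_images: list[bytes],
                    process_images: bool = False) -> dict:
    metadata = text_only_metadata(doc_id)        # fast pass: title/summary/QA from text
    if process_images:
        enqueue_image_pass(doc_id, page_images)  # slow pass: runs later via a queue/worker
    return {
        "doc_id": doc_id,
        "metadata": metadata,
        "image_pass": "queued" if process_images else "skipped",
    }
```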