Log in
Log into community
Find answers from the community
View all posts
Related posts
Did this answer your question?
π
π
π
Powered by
Hall
Active
Updated 4 weeks ago
0
Follow
Image
Image
Active
0
Follow
t
tarpus
4 weeks ago
Β·
does LI have a solution to managing the size of a page to send the oai image model?
to manage resolution vs. tokens?
L
t
7 comments
Share
Open in Discord
L
Logan M
edited 4 weeks ago
I think for resolution, you can just set low/high/auto for image details
https://github.com/run-llama/llama_index/blob/af9abd06a456a3745d02379f8afc4b6cab3a3f72/llama-index-integrations/multi_modal_llms/llama-index-multi-modal-llms-openai/llama_index/multi_modal_llms/openai/base.py#L60
I havent checked openais exact api to see if they have more controls than that recently
t
tarpus
4 weeks ago
Will take a look. Glad to see you're in the room.
Thank you.
L
Logan M
edited 4 weeks ago
we added a new multimodal node
https://github.com/run-llama/llama_index/pull/16962
improved multimodal support in chat messages
https://github.com/run-llama/llama_index/pull/15969
Next up is updating the llms to work with the new chat messages
Then updating the retrievers to work with the new nodes
Then updating anything else remaining
t
tarpus
4 weeks ago
my pdf's are all scanned physial documents. so the pages are basically like photos of document pages.
t
tarpus
4 weeks ago
they are a challenge to work with.
L
Logan M
4 weeks ago
Typically the best approach we've seen is using llama parse (or something else) to ocr the page, and sending both the text and image to the llm
We have examples doing that π
L
Logan M
edited 4 weeks ago
https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb
Add a reply
Sign up and join the conversation on Discord
Join on Discord