
Updated 2 months ago

do i get it right that we can now upload

Do I understand correctly that we can now upload entire books with LLaVA without worrying about chunking? Storage-wise, are images converted into vectors of similar dimensions, or will they require more space?
4 comments
Are you talking about the multimodal LLM LLaVA (https://llava-vl.github.io/)?

We are still working on full support in the framework. For now, we have some example notebooks on how to leverage it for different use cases (e.g. https://twitter.com/jerryjliu0/status/1717205234269983030).
Woah this is wild!
It's a matter of time, probably weeks or days, before we will be 'reading' books just by recording a video of ourselves turning the pages. Really, all it takes is figuring out the right time interval and region of interest (ROI) for the captured images, and then feeding the images to GPT-4V or LLaVA. But LLaVA is not there yet. It can understand the images, but it does not always want to extract the text. They put in some strict guardrails that instruct LLaVA not to do anything if it 'sees' what looks like a book. You need to be very creative with your prompt and with how you capture the image.
Attachments
Screenshot_2023-10-26_at_8.02.00_AM.png
Screenshot_2023-10-26_at_8.02.12_AM.png
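A rough sketch of the capture step mentioned in the comment above (choosing a time interval and an ROI before sending images to a model). Everything here is an assumption for illustration: the function names are hypothetical, frames are stood in for by plain nested lists rather than real video frames, and the call to GPT-4V or LLaVA is deliberately left out since the thread does not describe one.

```python
from typing import List, Sequence, Tuple

# Stand-in for a real image: a grayscale frame as rows of pixel values.
Frame = List[List[int]]


def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> List[int]:
    """Return indices of frames spaced roughly `interval_s` seconds apart."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # Number of frames to skip between samples; never less than one.
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


def crop_roi(frame: Frame, roi: Tuple[int, int, int, int]) -> Frame:
    """Crop a frame to the ROI given as (top, left, height, width)."""
    top, left, height, width = roi
    return [row[left:left + width] for row in frame[top:top + height]]


def pages_from_video(
    frames: Sequence[Frame],
    fps: float,
    interval_s: float,
    roi: Tuple[int, int, int, int],
) -> List[Frame]:
    """Sample frames at a fixed interval and crop each one to the ROI."""
    indices = sample_frame_indices(len(frames), fps, interval_s)
    return [crop_roi(frames[i], roi) for i in indices]
```

In practice the interval would need tuning to how fast the pages are turned, and the cropped images would then be passed to whichever multimodal model is being used.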
Absolutely! The future is quite exciting. I expect the open-source models to catch up fairly quickly.