Do I get it right that we can now upload all books with LLaVA, without worrying about chunking? Storage-wise, are images converted into vectors of similar dimensions to text embeddings, or will they require more space?
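For the storage question: in a typical multimodal setup (CLIP-style encoders, which LLaVA builds on), an image is encoded into a single fixed-size vector, so per-vector storage is the same as for a text embedding of the same dimensionality. A rough back-of-the-envelope sketch, assuming a 768-dim float32 embedding (the actual dimension is model-dependent):

```python
import numpy as np

# Assumed figures: a CLIP-style encoder emits one fixed-size vector per
# image, so an image embedding costs the same as a text embedding of
# the same dimensionality.
DIM = 768            # assumed embedding dimension (model-dependent)
BYTES_PER_FLOAT = 4  # float32

def embedding_storage_bytes(n_vectors: int, dim: int = DIM) -> int:
    """Raw storage for n_vectors embeddings, ignoring index overhead."""
    return n_vectors * dim * BYTES_PER_FLOAT

# A 300-page book embedded one vector per page:
pages = 300
print(embedding_storage_bytes(pages))        # 921600 bytes
print(embedding_storage_bytes(pages) / 1e6)  # ~0.92 MB
```

So the raw vectors are tiny; what grows is anything extra you keep alongside them (the page images themselves, thumbnails, or index overhead).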
It's a matter of time, probably weeks or days, before we'll be 'reading' books just by recording a video of ourselves turning the pages. Really, all it takes is figuring out the right time interval and ROI for the captured frames, and then feeding the images to GPT-4V or LLaVA. But LLaVA is not there yet. It can understand the images, but it does not always want to extract the text. They put in some strict guardrails instructing LLaVA not to do anything if it 'sees' what looks like a book. You need to be very creative in your prompt and in how you capture the image.
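The capture loop can be sketched roughly like this. A sketch only, under stated assumptions: the interval and ROI values are placeholders (tuning them is exactly the open problem above), and the vision-model call is stubbed out as a comment:

```python
import numpy as np

# Placeholder values -- the "right" interval and ROI are what has to be
# tuned per recording setup.
CAPTURE_INTERVAL_S = 2.0        # assumed seconds between page turns
ROI = (100, 980, 200, 1700)     # assumed (y0, y1, x0, x1) page region

def sample_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Timestamps at which to grab a frame from the video."""
    n = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(n)]

def crop_roi(frame: np.ndarray, roi: tuple[int, int, int, int]) -> np.ndarray:
    """Cut the page region out of a full video frame."""
    y0, y1, x0, x1 = roi
    return frame[y0:y1, x0:x1]

# Each cropped frame would then go to GPT-4V/LLaVA with an OCR-style
# prompt, e.g. "Transcribe all text visible on this page."
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # stand-in for a video frame
page = crop_roi(frame, ROI)
print(sample_timestamps(10.0, CAPTURE_INTERVAL_S))  # [0.0, 2.0, 4.0, ...]
print(page.shape)                                   # (880, 1500, 3)
```

In practice you'd pull the frames with something like OpenCV's `VideoCapture` at each timestamp; the cropping and prompting is where the creativity against the guardrails comes in.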