Updated last year

Image captioning

Does llama-index have any multimodal implementations that can do image to text?
Yeah, `SimpleDirectoryReader` will do some of that under the hood for you.

Or you can use the image loader directly.

The actual loader code is here:

https://github.com/jerryjliu/llama_index/blob/main/llama_index/readers/file/image_caption_reader.py
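
A rough sketch of calling the loader directly, in case it helps. The import path is taken from the file linked above and may differ in newer releases. `ImageCaptionReader()` with default arguments, `load_data(file=...)`, and the `.text` attribute on the returned documents are assumptions about that version of the library. The helper names (`find_images`, `caption_images`, `make_llama_index_loader`) and the `photos` folder are made up for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Assumption: these are the image types you care about; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(paths: Iterable[Path]) -> List[Path]:
    """Keep only paths whose extension looks like an image, in sorted order."""
    return sorted(p for p in paths if p.suffix.lower() in IMAGE_SUFFIXES)


def caption_images(paths: Iterable[Path], load_fn: Callable) -> Dict[str, str]:
    """Map each image file name to the text of the documents load_fn returns.

    load_fn takes a Path and returns a list of objects with a `.text` attribute.
    """
    return {
        p.name: " ".join(doc.text for doc in load_fn(p))
        for p in find_images(paths)
    }


def make_llama_index_loader() -> Callable:
    """Build a load_fn backed by the image caption reader (hypothetical helper).

    The import is done lazily so the helpers above work without llama_index.
    Assumption: module path and call signature match the linked source file.
    """
    from llama_index.readers.file.image_caption_reader import ImageCaptionReader

    reader = ImageCaptionReader()
    return lambda path: reader.load_data(file=path)


# Example usage (downloads the captioning model on first run):
# captions = caption_images(Path("photos").iterdir(), make_llama_index_loader())
# for name, caption in captions.items():
#     print(name, caption)
```

Passing the loader in as a function keeps the file filtering separate from the model call, so you can swap in a different captioning model later without touching the rest.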
Thanks! Exactly what I was looking for. Been using the image output thing but that's the exact model I wanted to use!