Updated 2 months ago

What multimodal capabilities does LlamaIndex have?
7 comments
We just released some today! You can ingest image Documents as well as text Documents.
We will expand this abstraction once more details of GPT-4's hybrid image/text API are released.
@jerryjliu0 Thanks for sharing. Is this done with OCR behind the scenes?
Yep! Currently it is.
@jerryjliu0 What are you using for OCR?
It’s either pytesseract or the DONUT model
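For anyone wondering what "image Documents via OCR" might look like in practice, here is a minimal sketch. It is not LlamaIndex's actual loader: `TextDocument`, `image_to_document`, and `pytesseract_ocr` are hypothetical names made up for illustration. The only real library call assumed is `pytesseract.image_to_string` on a Pillow image, which needs the Tesseract binary installed. The OCR step is passed in as a function so a different backend (for example a Donut model) could be swapped in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TextDocument:
    """Hypothetical stand-in for a text document; not LlamaIndex's Document class."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def pytesseract_ocr(image_path: str) -> str:
    """Run Tesseract OCR on an image file and return the recognized text."""
    # Imported lazily so the rest of the sketch works without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def image_to_document(
    image_path: str,
    ocr: Callable[[str], str] = pytesseract_ocr,
) -> TextDocument:
    """Turn an image into a text document by running the given OCR function."""
    text = ocr(image_path).strip()
    return TextDocument(text=text, metadata={"source": image_path, "kind": "image"})
```

Usage would be something like `doc = image_to_document("scan.png")`, after which `doc.text` can be indexed like any other text. The file name here is just a placeholder.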