
Updated 6 months ago

jerryjliu98 9313 What multimodal


What multimodal capabilities does LlamaIndex have?


We just released some today! You can ingest image Documents as well as text Documents.

We will expand this abstraction once more details of GPT-4's hybrid image/text API are released.

@jerryjliu0 Thanks for sharing. Is this done with OCR behind the scenes?

Yep! Currently it is.

@jerryjliu0 What are you using for OCR?

It's either pytesseract or the Donut model.
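For anyone wondering what "image Documents via OCR" could look like in practice, here is a rough sketch. It is not LlamaIndex's actual loader code: `SimpleDocument`, `load_documents`, and `pytesseract_ocr` are hypothetical names made up for illustration, and the file-suffix lists are assumptions. The only real library call is `pytesseract.image_to_string`, which needs Pillow and the Tesseract binary installed.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Assumed suffix lists, for illustration only.
TEXT_SUFFIXES = {".txt", ".md"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a text document (not LlamaIndex's Document)."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def pytesseract_ocr(path: Path) -> str:
    """Extract text from an image file with pytesseract."""
    # Imported lazily so text-only use does not require the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(image)


def load_documents(
    paths: Iterable[Path],
    ocr: Callable[[Path], str] = pytesseract_ocr,
) -> List[SimpleDocument]:
    """Turn text files and images into text documents; images go through OCR."""
    docs: List[SimpleDocument] = []
    for path in map(Path, paths):
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            # The image is reduced to whatever text the OCR step recovers.
            text = ocr(path).strip()
            docs.append(SimpleDocument(text, {"source": str(path), "kind": "image"}))
        elif suffix in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8")
            docs.append(SimpleDocument(text, {"source": str(path), "kind": "text"}))
        # Any other file type is skipped in this sketch.
    return docs
```

Passing `ocr` as a parameter means a different backend (for example a Donut-based function) could be swapped in without touching the loading loop. After this step everything downstream only sees text.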
