
What's the latest solution to embedding text and pictures together in a Word or PDF and displaying the picture as needed in a reply?

At a glance

The community members are discussing the latest solution for embedding text and pictures together in a Word or PDF document and displaying the pictures as needed in a reply. They mention that the LlamaIndex library may be able to handle this by creating ImageDocument or ImageNode objects that point to the image files, though the multi-modal capabilities are still a work in progress. One community member suggests a two-step process: 1) convert the pictures in the documents to URLs with text references, and 2) add those text references to the original text file for chunking and embedding. The community members note that this currently requires manual processing, but the process is being improved.

I am looking for some sample code or a solution for embedding a Word/PDF directly, including the pictures in the document. When we use RAG to get an answer from the LLM, the text content plus the referenced picture should be shown in the reply message. We might need LlamaIndex to store those pictures in a local working folder with some reference linkage and teach the LLM to call those pictures and include them in its reply text. It might work as a function call, etc.
The requirement behind this is: we have lots of training documents with screen snapshots. Rather than spelling out the instructions step by step, attaching a screenshot would be very helpful. Can this be handled by the latest GPT-4V, or do we still need to wait for a new solution?
@Teemu @Logan M
It sounds like it can be handled by GPT-4V? You can create ImageDocument or ImageNode objects that point to the image_path where you've saved the image. These nodes can also optionally include text.

Our multi-modal stuff is still a bit in progress, but there's lots of info here:
https://docs.llamaindex.ai/en/stable/use_cases/multimodal.html
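
For example, a minimal sketch of that idea (the import path depends on your LlamaIndex version, and the file paths and text are placeholders, not from the docs link above):

```python
# A minimal sketch of the ImageDocument/ImageNode approach mentioned above.
# The import path assumes a pre-0.10 LlamaIndex; newer releases expose the
# same classes under llama_index.core.schema.
from llama_index.schema import ImageDocument, ImageNode

# Point a document at a screenshot saved in a local working folder, and
# optionally attach the instruction text that the screenshot illustrates.
screenshot_doc = ImageDocument(
    image_path="./images/step_3_login.png",
    text="Step 3: click the Login button in the top-right corner.",
)

# The node-level equivalent, if you are constructing nodes yourself.
screenshot_node = ImageNode(
    image_path="./images/step_3_login.png",
    text="Step 3: click the Login button in the top-right corner.",
)
```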
Thanks for the reply. My thought is that there should be a two-step embedding. Step 1: convert the pictures in the documents into URLs with text references. Step 2: add those text references into the original text file, ready for chunking and normal text embedding.
Yea, we are working on making the process better. For now it requires manual processing to do that 🙂
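
A rough sketch of what that manual two-step processing could look like (this is not a LlamaIndex API; the extraction of the images from the Word/PDF, the output folder, and the marker format are all assumptions):

```python
# Rough sketch: save extracted pictures locally, then append text references
# to the original text so they get chunked and embedded with the instructions.
from pathlib import Path

IMAGE_DIR = Path("./extracted_images")
IMAGE_DIR.mkdir(exist_ok=True)

def add_image_references(text: str, images: list[tuple[str, bytes]]) -> str:
    """Step 1: save each picture extracted from the document and build a text
    reference (a local path or URL) for it. Step 2: append those references to
    the original text, ready for chunking and normal text embedding."""
    references = []
    for name, image_bytes in images:
        image_path = IMAGE_DIR / f"{name}.png"
        image_path.write_bytes(image_bytes)
        # When a chunk containing this marker is retrieved, the application
        # can look up the path and attach the picture to the LLM's reply.
        references.append(f"[See screenshot: {image_path}]")
    return text + "\n" + "\n".join(references)
```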