What's the latest solution to embedding

At a glance

Community members discuss how to embed the text and pictures of a Word or PDF document together, so that the relevant pictures can be displayed in a reply. LlamaIndex may be able to handle this by creating ImageDocument or ImageNode objects that point to the image files, although its multi-modal capabilities are still a work in progress. One community member suggests a two-step process: 1) convert the pictures in the documents to URLs with text references, and 2) add those text references to the original text file for chunking and embedding. This currently requires manual processing, but work on improving the process is under way.

autratec

What's the latest solution for embedding text and pictures together from a Word or PDF document and displaying the pictures as needed in the reply?

7 comments

Teemu

Did you check this out yet? https://twitter.com/llama_index/status/1731843485115064531

autratec

I am looking for some sample code or a solution for embedding the Word/PDF directly, including the pictures in the document. When we use RAG to get an answer from the LLM, the text content plus the referenced picture should be shown in the reply message. We might need LlamaIndex to store those pictures in a local working folder with some reference linkage, and teach the LLM to call up those pictures and include them in its reply text. It might work as a function call, etc.
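One way to picture the "reference linkage" idea is a small registry that maps image IDs to files in a local folder, plus a post-processing step that looks for ID markers in the LLM reply. This is only a sketch under assumptions: the `[IMAGE:<id>]` marker format, the `ImageRegistry` class, and the folder and file names are all hypothetical and are not part of LlamaIndex or any other library.

```python
import re
from pathlib import Path

# Hypothetical marker the LLM would be prompted to emit, e.g. "[IMAGE:login_screen]".
MARKER = re.compile(r"\[IMAGE:([A-Za-z0-9_-]+)\]")


class ImageRegistry:
    """Maps image IDs to files in a local working folder (illustrative only)."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self._paths = {}

    def register(self, image_id, filename):
        # Record where the image for this ID lives; the file itself is not opened.
        self._paths[image_id] = self.folder / filename

    def resolve_reply(self, reply):
        """Return the reply without markers, plus the paths of the referenced images."""
        ids = MARKER.findall(reply)
        # Unknown IDs are silently skipped rather than raising an error.
        paths = [self._paths[i] for i in ids if i in self._paths]
        text = MARKER.sub("", reply)
        # Collapse the double space left behind where a marker was removed.
        text = re.sub(r"[ \t]{2,}", " ", text).strip()
        return text, paths


registry = ImageRegistry("working_images")
registry.register("login_screen", "login.png")
text, images = registry.resolve_reply(
    "Open the login page [IMAGE:login_screen] and sign in."
)
```

The chat front end would then show `text` and attach the files listed in `images`. Getting the LLM to emit the markers reliably (through prompting or a function call) is a separate problem that this sketch does not address.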

autratec

The requirement behind this is: we have lots of training documents with screenshots. Rather than only showing the instructions step by step, attaching a screenshot would be very helpful. Can this be handled by the latest GPT-4V, or do we still need to wait for a new solution?

autratec

@Teemu @Logan M

Logan M

It sounds like it can be handled by gpt-4v? You can create ImageDocument or ImageNode objects that point to the image_path where you've saved the image. These nodes can also optionally include text.

Our multi-modal stuff is still a bit in-progress, but lots of info here
https://docs.llamaindex.ai/en/stable/use_cases/multimodal.html

autratec

Thanks for the reply. My thought is that there should be a two-step embedding. Step 1: convert the pictures in the documents into picture URLs with text references. Step 2: add those text references to the original text file, ready for chunking and normal text embedding.
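The two steps above can be sketched roughly as follows. This assumes the document has already been converted to text in which each picture appears as a Markdown-style image tag. The `replace_images_with_references` and `chunk_text` helpers, the `[IMAGE:n ...]` reference format, and the example URL are hypothetical, and the resulting chunks would then go through whatever text embedding pipeline is already in use.

```python
import re

# Assumed input convention: pictures appear as Markdown image tags, ![alt text](url).
IMAGE_TAG = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def replace_images_with_references(text):
    """Step 1: swap each image tag for a text reference and collect the URLs."""
    references = {}

    def _swap(match):
        ref_id = f"IMAGE:{len(references) + 1}"
        references[ref_id] = match.group(2)
        # Keep the alt text so the reference carries some meaning for embedding.
        return f"[{ref_id} {match.group(1)}]"

    return IMAGE_TAG.sub(_swap, text), references


def chunk_text(text, max_chars=200):
    """Step 2: split on blank lines, then pack paragraphs into size-limited chunks.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for paragraph in filter(None, (p.strip() for p in text.split("\n\n"))):
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


source = (
    "Step 1: open the settings page.\n\n"
    "![Settings screenshot](https://example.com/img/settings.png)\n\n"
    "Step 2: click Save."
)
text_with_refs, refs = replace_images_with_references(source)
chunks = chunk_text(text_with_refs, max_chars=80)
```

Because the reference stays inline with the surrounding instructions, a retrieved chunk carries the ID of its picture, and `refs` can be used to look up the URL when the reply is rendered.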

Logan M

Yeah, we are working on making the process better. For now it requires manual processing to do that 🙂
