Updated 11 months ago

What's the latest solution to embedding

What's the latest solution for embedding text and pictures together from a Word or PDF document, and displaying the pictures as needed in the reply?
7 comments
I am looking for sample code or a solution for embedding a Word/PDF document directly, including the pictures in the document. When we use RAG to get an answer from the LLM, the reply message should show the text content plus the referenced picture. We might need LlamaIndex to store those pictures in a local working folder with some reference linkage, and teach the LLM to call up those pictures and include them in its reply text. It might work as a function call, etc.
The requirement behind this: we have lots of training documents with screenshots. Rather than only showing the instructions step by step, attaching a screenshot would be very helpful. Can this be handled by the latest GPT-4V, or do we still need to wait for a new solution?
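To make the "reference linkage" idea concrete, here is a minimal sketch of the reply-time half, using only the Python standard library. It assumes the indexed text chunks carry tags such as `[IMG:login_screen]`, that the LLM is prompted to copy those tags into its answer, and that a registry maps each ID to a saved file. The tag format, the `split_reply` helper, and the registry are all hypothetical and not part of LlamaIndex.

```python
import re
from pathlib import Path

# Hypothetical tag format, e.g. "[IMG:login_screen]"; not a LlamaIndex convention.
IMAGE_TAG = re.compile(r"\[IMG:([A-Za-z0-9_-]+)\]")


def split_reply(reply: str, registry: dict[str, Path]) -> list[tuple[str, str]]:
    """Split an LLM reply into ("text", ...) and ("image", path) parts.

    Tags whose ID is not in the registry stay in the text, so a made-up
    reference never resolves to a file that does not exist.
    """
    parts = []
    pos = 0
    for match in IMAGE_TAG.finditer(reply):
        image_id = match.group(1)
        if image_id not in registry:
            continue  # unknown ID: leave the tag inside the surrounding text
        text = reply[pos:match.start()].strip()
        if text:
            parts.append(("text", text))
        parts.append(("image", registry[image_id].as_posix()))
        pos = match.end()
    tail = reply[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


# Assumed registry: image ID -> file saved in a local working folder.
registry = {"login_screen": Path("images/login_screen.png")}
reply = "Open the app and sign in. [IMG:login_screen] Then click Settings. [IMG:missing]"
parts = split_reply(reply, registry)
for kind, value in parts:
    print(kind, value)
```

The chat front end can then render each "image" part from the local folder between the text parts. Whether the model reliably copies the tags depends on the prompt, so this is a starting point rather than a tested pipeline.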
@Teemu @Logan M
It sounds like it can be handled by gpt-4v? You can create ImageDocument or ImageNode objects that point to the image_path where you've saved the image. These nodes can also optionally include text.

Our multi-modal stuff is still a bit in-progress, but lots of info here
https://docs.llamaindex.ai/en/stable/use_cases/multimodal.html
Thanks for the reply. My thought is that there should be a two-step embedding. Step 1: convert the pictures in the documents into picture URLs with text references. Step 2: add those text references to the original text file, which is then ready for chunking and normal text embedding.
Yeah, we are working on making the process better. For now it requires manual processing to do that 🙂
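The two steps could be sketched roughly as follows, in plain Python. This assumes the pictures have already been extracted from the Word/PDF file and saved, so the document arrives as an ordered list of text and image blocks. The `Block` class, the `[IMG:id]` tag format, and both helper functions are hypothetical names for illustration, not LlamaIndex APIs.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str           # "text" or "image"
    content: str        # paragraph text, or the path/URL of the saved image
    image_id: str = ""  # short ID used in the reference tag (images only)
    caption: str = ""   # text description of the image (images only)


def inline_image_refs(blocks):
    """Step 1: replace each image with a one-line text reference."""
    lines, registry = [], {}
    for block in blocks:
        if block.kind == "image":
            registry[block.image_id] = block.content
            lines.append(f"[IMG:{block.image_id}] {block.caption}")
        else:
            lines.append(block.content)
    return "\n".join(lines), registry


def chunk_lines(text, max_chars=80):
    """Step 2: greedy chunking on line boundaries, so a tag is never split."""
    chunks, current = [], ""
    for line in text.split("\n"):
        candidate = f"{current}\n{line}" if current else line
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


blocks = [
    Block("text", "Step 1: Open the Settings page."),
    Block("image", "images/settings.png", image_id="settings",
          caption="Settings page screenshot"),
    Block("text", "Step 2: Click Save."),
]
text, registry = inline_image_refs(blocks)
chunks = chunk_lines(text, max_chars=80)
```

The resulting chunks are ordinary text and can go through a normal text embedding pipeline, while `registry` keeps the link from each tag back to the saved picture. Chunking on line boundaries is a deliberate choice here: it keeps each reference tag whole, and it tends to keep the tag in the same chunk as the step it illustrates.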