Imagez

Plain Text
import os

import openai
from dotenv import load_dotenv
from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.query_engine import SimpleMultiModalQueryEngine
from llama_index.vector_stores import QdrantVectorStore
from qdrant_client import QdrantClient

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
openai.api_key = OPENAI_API_KEY

# local Qdrant instance (REST API, default port 6333)
client = QdrantClient(url="http://localhost")

# GPT-4V as the multi-modal LLM
openai_mm_llm = OpenAIMultiModal(
    model="gpt-4-vision-preview",
    api_key=os.getenv("OPENAI_API_KEY"),
    max_new_tokens=1500,
)
# separate Qdrant collections for text and image embeddings
vector_store = QdrantVectorStore(
    "global_text_store",
    client=client,
)
image_store = QdrantVectorStore(
    "global_image_store",
    client=client,
)

# rebuild the multi-modal index on top of the existing collections
index = MultiModalVectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    image_vector_store=image_store,
    use_async=False,
    show_progress=True,
)
# retrieval works: both text and image nodes come back
retriever = index.as_retriever()
image_nodes = retriever.retrieve("Find images in the knowledgebase.")
print("Image Nodes: ", image_nodes)

# note: SimpleMultiModalQueryEngine takes the LLM via the
# `multi_modal_llm` keyword; an unrecognized keyword such as
# `openai_mm_llm` is silently swallowed by **kwargs
query_engine = SimpleMultiModalQueryEngine(
    retriever=retriever,
    multi_modal_llm=openai_mm_llm,
)

response_1 = query_engine.query(
    "Describe the images in your knowledgebase as if you were a blind person.",
)
print("Response: ", response_1)


This works for the retriever: it retrieves both text and image nodes. It does NOT work for the query, though.
13 comments
If you check response.source_nodes, you can see the nodes it used to make the response.
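
For example, a quick way to see whether any ImageNodes actually made it into the response (a minimal sketch, reusing the `query_engine` from the snippet above):

Plain Text
from llama_index.schema import ImageNode

response = query_engine.query("Describe the images in your knowledgebase.")
for node_with_score in response.source_nodes:
    node = node_with_score.node
    kind = "image" if isinstance(node, ImageNode) else "text"
    print(kind, node_with_score.score, node.node_id)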

You might have to prompt engineer a bit, tbh, to get it to pay attention to the image properly.
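
One way to do that prompt engineering is to pass a custom QA template; a sketch, assuming the llama_index 0.9.x `text_qa_template` parameter and the `{context_str}`/`{query_str}` variables it expects:

Plain Text
from llama_index.prompts import PromptTemplate

qa_tmpl = PromptTemplate(
    "The images retrieved from the knowledgebase are attached.\n"
    "Context from the text store:\n{context_str}\n"
    "Query: {query_str}\n"
    "Answer by describing the attached images in detail.\n"
)
query_engine = SimpleMultiModalQueryEngine(
    retriever=index.as_retriever(),
    multi_modal_llm=openai_mm_llm,
    text_qa_template=qa_tmpl,
)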
So I have 10 images, and the only thing the text store says is what I sent you before,
"This is a test text node", something like that.
The images are all unique, but even if I tell it to just summarize the images, it still won't help me.
Are the images in the response.source_nodes?
Plain Text
Response:  I'm sorry, but I cannot provide a description for the image as there seems to be a misunderstanding. There is no image attached to your query for me to describe. If you have an image you would like me to describe, please provide it, and I will do my best to give you a detailed description.
Response Source Nodes: [
    NodeWithScore(
        node=TextNode(id_='a6f532dc-6d82-4ab3-9622-36467d7870cc',
                      text='This is a test text', ...),
        score=0.75210965),
    NodeWithScore(
        node=ImageNode(id_='ada19de9-730c-47f0-89bf-fa4158e31f8c',
                       metadata={'user_id': '1234567890'},
                       image='/9j/4AAQSkZJRgABAQAAAQABAAD/...'),
        ...)]
But if my query is literally "Describe the images in your knowledgebase as if you were a blind person", isn't that... enough?
I have no idea. I know when I tried with your sample repo before, it was complaining about the image not being related to the text or query lol
Hm. But I'm just asking it to utilize the images.
If it won't even use them, then what's the point of all this? How do I force it to use them? Do I need to make my own response synthesizer?
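
One way to force the issue, without writing a custom response synthesizer, is to skip the query engine and hand the retrieved images straight to the multi-modal LLM. A rough sketch, assuming llama_index 0.9.x and the `retriever`/`openai_mm_llm` objects from the snippet above:

Plain Text
from llama_index.schema import ImageDocument, ImageNode

# keep only the image nodes from the retrieval results
results = retriever.retrieve("Find images in the knowledgebase.")
image_docs = [
    ImageDocument(image=r.node.image)  # base64 payload stored on the node
    for r in results
    if isinstance(r.node, ImageNode)
]

# call GPT-4V directly with the images attached
answer = openai_mm_llm.complete(
    prompt="Describe these images as if you were explaining them to a blind person.",
    image_documents=image_docs,
)
print(answer.text)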
When I give ChatGPT an image and I say, "Please describe what's in the image", it handles it fine. So I guess I need to AI-generate the metadata tags using the image document.
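
Captioning each image with the model and storing the caption as searchable text could look roughly like this; `images/photo_01.jpg` is a hypothetical path, and `insert_nodes` is assumed to be inherited from the base vector index:

Plain Text
from llama_index.schema import ImageDocument, TextNode

img_path = "images/photo_01.jpg"  # hypothetical local image

# ask GPT-4V to caption the image
caption = openai_mm_llm.complete(
    prompt="Write a short, literal caption for this image.",
    image_documents=[ImageDocument(image_path=img_path)],
).text

# store the caption as a text node that points back at the image
caption_node = TextNode(text=caption, metadata={"image_path": img_path})
index.insert_nodes([caption_node])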