Updated 5 days ago

Detailed Image Description Using Gemini Language Model

Plain Text
# Use the Reddit image URL
image_urls = [
    "https://i.redd.it/7gyrosc13tme1.jpeg"
]

# Load the image from URL
image_documents = load_image_urls(image_urls)

# Initialize Gemini
llm = Gemini(model="models/gemini-2.0-flash")

# Ask the model to describe the image
response = llm.complete(
    prompt="Describe this image in detail. What does it show?",
    image_documents=image_documents
)
6 comments
Plain Text
# Import required libraries  
import google.generativeai as genai  
from PIL import Image  
import requests  
from io import BytesIO  

# Configure the API with your key  
genai.configure(api_key="YOUR_API_KEY")  

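# Load the image from a URL using PIL
image_url = "https://i.redd.it/7gyrosc13tme1.jpeg"
image_response = requests.get(image_url, timeout=30)
image_response.raise_for_status()  # Fail early on a bad download
image = Image.open(BytesIO(image_response.content))

# Initialize the model (gemini-pro-vision has been deprecated)
model = genai.GenerativeModel("gemini-2.0-flash")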

# Generate content with the image  
response = model.generate_content(  
    contents=[  
        "Describe this image in detail. What does it show?",  
        image  
    ]  
)  

# Print the response  
print(response.text)  
I prefer to use LlamaIndex.
This would work if llm.complete() didn't validate prompt:
Plain Text
image_urls = [
    "https://i.redd.it/7gyrosc13tme1.jpeg"
]

# Get the image using requests and load with PIL
image_response = requests.get(image_urls[0])
image = Image.open(BytesIO(image_response.content))

# Initialize the multi-modal LLM (Gemini)
llm = Gemini(model="models/gemini-2.0-flash")

# Ask the model to describe the image
response = llm.complete(
    prompt=["Describe this image in detail. What does it show?",
    image]
)

print(response)

But pydantic throws an error when validating prompt.
LlamaIndex structures image handling a bit differently. Try following this example to read images: https://docs.llamaindex.ai/en/stable/examples/multi_modal/openai_multi_modal/#ask-the-model-to-describe-what-it-sees
Quite a frustrating implementation.
Passing in image_documents is the older syntax. It's specific to the GeminiMultiModal class.

We are moving towards just using the base LLM class with chat messages that contain content blocks.

Personally I think it's much nicer