Detailed Image Description Using Gemini Language Model

Question

from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
from llama_index.llms.gemini import Gemini

# Use the Reddit image URL
image_urls = ["https://i.redd.it/7gyrosc13tme1.jpeg"]
# Load the image from the URL
image_documents = load_image_urls(image_urls)
# Initialize Gemini
llm = Gemini(model="models/gemini-2.0-flash")
# Ask the model to describe the image
response = llm.complete(
    prompt="Describe this image in detail. What does it show?",
    image_documents=image_documents,
)

Yamada Fujio · Answer

# Import required libraries
from io import BytesIO

import google.generativeai as genai
import requests
from PIL import Image

# Configure the API with your key
genai.configure(api_key="YOUR_API_KEY")

# Load the image from the URL using PIL
image_url = "https://i.redd.it/7gyrosc13tme1.jpeg"
http_response = requests.get(image_url, timeout=30)
http_response.raise_for_status()
image = Image.open(BytesIO(http_response.content))

# Initialize the model (gemini-pro-vision has been deprecated; gemini-2.0-flash also accepts images)
model = genai.GenerativeModel("gemini-2.0-flash")

# Generate content with the image
response = model.generate_content(
    contents=["Describe this image in detail. What does it show?", image]
)

# Print the response
print(response.text)

Fremko · Answer

I prefer to use LlamaIndex.

Fremko · Answer

This would work if llm.complete() didn't validate prompt:

from io import BytesIO
import requests
from PIL import Image
from llama_index.llms.gemini import Gemini
image_urls = ["https://i.redd.it/7gyrosc13tme1.jpeg"]
# Get the image using requests and load it with PIL
image_response = requests.get(image_urls[0])
image = Image.open(BytesIO(image_response.content))
# Initialize the multi-modal LLM (Gemini)
llm = Gemini(model="models/gemini-2.0-flash")
# Ask the model to describe the image
response = llm.complete(prompt=["Describe this image in detail. What does it show?", image])
print(response)

But Pydantic throws an error when validating prompt.

WhiteFang_Jr · Answer

LlamaIndex structures image handling a bit differently. Try following this example to read images: https://docs.llamaindex.ai/en/stable/examples/multi_modal/openai_multi_modal/#ask-the-model-to-describe-what-it-sees

Fremko · Answer

Quite a frustrating implementation.

Logan M · Answer

Passing in image_documents is older syntax; it's specific to the GeminiMultiModal class.

We are moving towards using just the base LLM class, with chat messages that contain content blocks.

Personally, I think it's much nicer.
