Vertex AI issues: getting MAX_TOKENS errors (finish reasons 2 and 4)

Hey, does anyone know how to solve this error? Right now I am using the Vertex AI LLM, and when I try to use a refine response synthesizer to query a ReActAgent, it throws the following error. I tried adding response_validation=False to the query to the top agent, but it doesn't work. Any ideas?

Plain Text
...
> Running step 8f16654d-e689-4366-a3c0-8e7a397a94cf. Step input: estacionamientos y evaluación ambiental
Thought: The current language of the user is: Spanish. I need to determine if the question is specific enough to use the vector_tool or if it requires a more general summary using the summary_tool. The question "estacionamientos y evaluación ambiental" (parking lots and environmental assessment) is quite broad. I will use the summary_tool to get a general overview of how Decreto 40 of 2013 addresses parking lots and environmental assessments.
Action: summary_tool
Action Input: {'input': 'Decreto 40 de 2013, estacionamientos y evaluación ambiental'}
> Refine context: Decreto 30,
g ter) Observación ciudadana: Toda ...
> Refine context: c)   Centrales generadoras de energía mayores a...

------------ ERROR ---------
Observation: Error: The model response did not complete successfully.
Finish reason: 2.
Finish message: .
Safety ratings: [].
To protect the integrity of the chat session, the request and response were not added to chat history.
To skip the response validation, specify `model.start_chat(response_validation=False)`.
Note that letting blocked or otherwise incomplete responses into chat history might lead to future interactions being blocked by the service.
------------ --- ---------
...
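
The traceback itself suggests model.start_chat(response_validation=False), which is the raw Vertex AI SDK API; the response_validation=False I pass to the LlamaIndex query apparently never reaches it. Outside LlamaIndex, the check the message suggests would look roughly like this (a sketch; the prompt is made up and credentials is the service account object from the snippets below):
Plain Text
# Sketch: what the error message suggests, using the raw Vertex AI SDK
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project=credentials.project_id, credentials=credentials)
model = GenerativeModel("gemini-2.0-flash")
chat = model.start_chat(response_validation=False)  # straight from the error text
response = chat.send_message("Summarize Decreto 40 de 2013")  # hypothetical prompt
print(response.candidates[0].finish_reason)  # 2 == MAX_TOKENS in the Vertex enum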
13 comments
I really don't know how to bypass this validation error; I've tried a lot of stuff
I set up my document with a context window at the maximum Gemini accepts, and even in that case it doesn't work. I am trying with a document of 250k characters, and it throws the error with finish reason 2. Vertex AI is so frustrating because there is almost no help in forums

Plain Text
Settings.embed_model = VertexTextEmbedding(
    "textembedding-gecko-multilingual@001",
    credentials=credentials,
)

Settings.llm = Vertex(
    model="gemini-2.0-flash",
    temperature=0,
    project=credentials.project_id,
    credentials=credentials,
    max_tokens=8192,
    context_window=1048576,
)
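Also, as far as I understand, finish reason 2 refers to the generation hitting the max_tokens output cap, not to the input size, so raising context_window should not matter. One way to rule out the input side is to count tokens with the raw SDK (a sketch; full_text is the document text as in the repro script below):
Plain Text
# Sketch: measure the input size directly with the raw Vertex AI SDK
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project=credentials.project_id, credentials=credentials)
model = GenerativeModel("gemini-2.0-flash")
# 250k characters should come out far below the 1,048,576-token window
print(model.count_tokens(full_text).total_tokens)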
To reproduce my error, I tried with this doc:
Plain Text
from google.oauth2 import service_account
from llama_index.core import Document, Settings, SummaryIndex
from llama_index.embeddings.vertex import VertexTextEmbedding
from llama_index.llms.vertex import Vertex
from llama_index.readers.file import PyMuPDFReader

# Path to the service account JSON key
filename = "/path/to/your/file"
credentials: service_account.Credentials = (
    service_account.Credentials.from_service_account_file(filename)
)

Settings.embed_model = VertexTextEmbedding(
    "textembedding-gecko-multilingual@001",
    credentials=credentials,
)

Settings.llm = Vertex(
    model="gemini-2.0-flash",
    temperature=0,
    project=credentials.project_id,
    credentials=credentials,
    max_tokens=8192,          # output token cap
    context_window=1048576,   # gemini-2.0-flash input window
)

# Load the PDF and merge all pages into a single Document
loader = PyMuPDFReader()
pdf_path = '1.dp_inmobiliarios.pdf'
extra_info = {"document_name": pdf_path}
splitted_text = loader.load(file_path=pdf_path, metadata=True, extra_info=extra_info)
full_text = "\n\n".join([d.get_content() for d in splitted_text])
single_document = Document(text=full_text)

# Build a summary index over the single document and query it with tree_summarize
summary_index = SummaryIndex.from_documents([single_document])
query_engine = summary_index.as_query_engine(
    llm=Settings.llm,
    response_mode="tree_summarize",
)
Plain Text
response = query_engine.query("""
        Provide a highly specific and detailed summary of all the key elements of this document,
        pointing out the most important key points, along with enough general context so the returned text
        can be handed to an agent with tools that needs to know whether to redirect a given query to this document.
        Do not skip ANY IMPORTANT DETAIL, and cover the entire document.

        IMPORTANT:
        - This summary must be aimed at the REAL ESTATE area and mention ALL THE ELEMENTS. If there are elements from other areas, ignore them, since they are of no use at all in this context.
        - Use the maximum number of tokens if possible; it has to be a long summary of 8192 tokens, which is the maximum allowed.
        - Never say "among others". Include every enumerated element mentioned in the document.

        For example, if the document discusses elements about the construction of a building, mention all the construction elements, but do not mention design elements, since they are not relevant to the real estate area.
        Another example: mention parking lots if they are mentioned, since they are elements of interest for the real estate area. And so on, for every document.
        """)
I would recommend pinging @Logan M; he's a genius with this stuff
Can you experiment with an OpenAI model and see if you get a different result?
Yeah, the issue I have with OpenAI models is rate limits, because I have a tier 1 account at the moment. I thought that if I used the Vertex AI LLM I wouldn't have this kind of issue, as their limits are way higher. But it seems their Gemini models often return the kinds of errors I experienced (RECITATION, finish reason 4: https://github.com/google/generative-ai-docs/issues/257), and I couldn't find anything similar to my MAX_TOKENS issue, finish reason 2
And I think it is weird: at first I tried to send the whole document in a single call, where it throws the error, so I split the document into nodes of only 1024 tokens each, and I still got this MAX_TOKENS error when I ran a script that does the same with roughly 20 documents
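For reference, the splitting looked roughly like this (a sketch from memory; the exact chunk overlap may have differed):
Plain Text
# Sketch of the node splitting I tried (chunk_overlap is a guess)
from llama_index.core import SummaryIndex
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents([single_document])
summary_index = SummaryIndex(nodes)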
It is kinda frustrating because I have no idea why it is happening lmao
@Icksir use the newer google-genai SDK, I think it should be slightly more stable? https://docs.llamaindex.ai/en/latest/examples/llm/google_genai/

Although it seems like your text is getting flagged by Google's safety filters
I'll try that SDK. I also thought it might be the safety filters, but I only received error 2, MAX_TOKENS, and error 4, RECITATION, as indicated in their source code: https://github.com/googleapis/python-aiplatform/blob/b91edf52e2b993c3301a419ad89b473c31c60cc3/google/cloud/aiplatform_v1/types/content.py#L535, so I am not sure the safety filters are the cause
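For reference, those codes can be decoded straight from that enum, which also shows a safety block would be reason 3, not 2 or 4:
Plain Text
# Decode the finish reason codes from the aiplatform enum
from google.cloud.aiplatform_v1.types import Candidate

print(Candidate.FinishReason(2).name)  # MAX_TOKENS
print(Candidate.FinishReason(4).name)  # RECITATION
print(Candidate.FinishReason(3).name)  # SAFETY (what a safety-filter block would return)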