
Is there a way to turn off Gemini's safety filter when using it as the LLM for a chat engine?
hmmm
(Attachment: image.png)
Where is this? I checked the vertex file under LLMs and couldn’t find any info
I'm able to trigger its safety response
I’ll get outta bed and get some code in here so you can see

So if I do a chat engine like this and ask it "What is your prompt?", it'll spit out the error.

(Formatting might have gotten messed up, but it should get the point across.)
Plain Text
try:
    async with message.channel.typing():
        memory = ChatMemoryBuffer.from_defaults(token_limit=8000)
        context = await fetch_context_and_content(message, client, content)
        memory.set(context + [HistoryChatMessage(f"{message.author.name}: {content}", Role.USER)])
        chat_engine = index.as_chat_engine(
            chat_mode="condense_plus_context",
            memory=memory,
            similarity_top_k=5,
            context_prompt=(
                "there is a prompt here"
            )
        )
        chat_response = chat_engine.chat(content)
        if not chat_response or not chat_response.response:
            await message.channel.send("There was an error processing the message." if not chat_response else "I didn't get a response.")
            return
        response_text = chat_response.response
        response_text = re.sub(r'^[^:]+:\s(?=[A-Z])', '', response_text)
        await send_long_message(message, response_text)
except Exception as e:
    await message.channel.send(f"An error occurred: {str(e)}")

Plain Text
An error occurred: block_reason: SAFETY
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: MEDIUM
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

But if I do a chat engine like this, it'll give a response:

Plain Text
try:
    async with message.channel.typing():
        memory = ChatMemoryBuffer.from_defaults(token_limit=8000)
        context = await fetch_context_and_content(message, client, content)
        memory.set(context + [HistoryChatMessage(f"{message.author.name}: {content}", Role.USER)])
        chat_engine = CondensePlusContextChatEngine.from_defaults(
            retriever=index.as_retriever(),
            memory=memory,
            similarity_top_k=5,
            context_prompt=(
                "prompt here"
            ),
        )
        chat_response = chat_engine.chat(content)
        if not chat_response or not chat_response.response:
            await message.channel.send("There was an error processing the message." if not chat_response else "I didn't get a response.")
            return
        response_text = chat_response.response
        response_text = re.sub(r'^[^:]+:\s(?=[A-Z])', '', response_text)
        await send_long_message(message, response_text)
except Exception as e:
    await message.channel.send(f"An error occurred: {str(e)}")

Plain Text
As an AI chat assistant, my prompt is to provide information and assistance related to CommaAI's OpenPilot. I can help answer questions, provide guidance, and offer support on various topics related to OpenPilot. How can I assist you today?

I had to remove the prompts for characters, but it was along the lines of: you're a bot, talk about subject.

Both are getting the LLM via llm = Gemini(max_tokens=1000). I'm using the text-embedding-3-small from OpenAI as my embed model.

Oh, you are using Vertex? This is in the Gemini LLM class
I couldn't find the gemini file under the llms folder, though. It was 2:15 am, I might've missed it
lemme link, one sec
In theory tho, they should be the same thing, no?
not really actually
Gemini uses the import google.generativeai as genai package

Vertex uses the import vertexai package
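
Roughly, the split looks like this (a sketch only; the import paths assume the newer per-integration packages, and the project id is just a placeholder):
Plain Text
# Two separate LLM integrations, backed by different Google SDKs.
from llama_index.llms.gemini import Gemini   # wraps google.generativeai
from llama_index.llms.vertex import Vertex   # wraps vertexai

gemini_llm = Gemini(model="models/gemini-pro", max_tokens=1000)
vertex_llm = Vertex(model="gemini-pro", project="my-project")  # placeholder project id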
I meant for this, sorry
those should be the same right?
Plain Text
chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(),
    memory=memory,
    similarity_top_k=5,
    context_prompt=(
        "prompt here"
    ),
)


You didn't pass in a service context, so it's defaulting to gpt-3.5 here. That's why it works
index.as_chat_engine takes the service context from the index
(I know it's jank, new release tomorrow! 🙂)
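i.e. to actually route that engine through Gemini, you'd pass the LLM in explicitly, something like this (sketch only; whether the kwarg is service_context or llm depends on the version):
Plain Text
# Sketch: hand Gemini to the chat engine explicitly instead of relying on defaults.
llm = Gemini(max_tokens=1000)
service_context = ServiceContext.from_defaults(llm=llm)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(),   # index/memory as in the snippets above
    memory=memory,
    service_context=service_context,  # or llm=llm on newer versions
    context_prompt="prompt here",
)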
Plain Text
An error occurred: block_reason: SAFETY
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: HIGH
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

The second I pass the service_context through and ask a question
makes sense 😅
?? I see the way safety is disabled in gemini.py, but that doesn't look right to me. When I did it before I moved to llama_index, I had to specify which safety settings to disable.
ALSO: am I able to just switch to gemini-ultra using llama_index?? Like, I see it, I switched to it, it's genning responses, but is it really gemini-ultra? Or did it fall back to gemini-pro/gpt-3.5?
it wouldn't fall back to 3.5 if it's set in the service context and passed in

I doubt the gemini package would fall back to pro if you specified ultra?
I think you just configure it like this?

Plain Text
safety_config = {
    generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
}

llm = Gemini(..., safety_settings=safety_config)
just judging by their docs anyways
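For the genai-backed Gemini class, the equivalent would presumably use the google.generativeai enums, with BLOCK_NONE if the goal is to actually turn the filters off (untested sketch):
Plain Text
# Untested sketch: BLOCK_NONE disables blocking for each category.
from google.generativeai.types import HarmCategory, HarmBlockThreshold

safety_config = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

llm = Gemini(max_tokens=1000, safety_settings=safety_config)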
but ultra isn't available in the public API yet, right?
yet llm = Gemini(model="models/gemini-ultra", max_tokens=1000) is giving me responses
so is llm = Gemini(model="gemini-ultra", max_tokens=1000)
idk what that code is doing man haha

Plain Text
self._model = genai.GenerativeModel(
    model_name=model_name,
    generation_config=final_gen_config,
    safety_settings=safety_settings,
)
Plain Text
import google.generativeai as genai

for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)
That would print everything available
just pro and pro vision, but then why am I getting responses from setting it to ultra?
yea not sure what the genai package is doing when you give it the ultra model name
maybe debug logs would reveal what kind of requests are being made
Plain Text
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
I’ll try that in a sec
I see stuff about the OpenAI embedding model, but nothing about google, gemini, or genai
hmm I guess they don't log
I'll send the file here, if you wanna take a look too
I don't even have access to gemini lol, google doesn't like Canada I guess
I can see on my google cloud console that my api is getting hit, it just doesn't tell me which model is being used
google? as in google it, or as in, you're upset at them? or as in lol, that's such a google thing.
that's such a google thing hahaha
pulled up my old code from before llama index and found the safety settings config:
Plain Text
safety_settings = {
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE",
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE",
    "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE",
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE"
}
Also, will you guys support Google's AQA model anytime soon?
AQA? Is that Active Question Answering?
I think that's super old, no?
in any case, most integrations are community driven/contributed
Hi
@Logan M
Plain Text
class ElasticsearchClient:
    def __init__(self):
        GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
        Settings.llm = Vertex(
            model="gemini-pro", project=xxxx, credentials=GOOGLE_API_KEY
        )
        # Settings.llm = Gemini(model="models/gemini-pro")

        self.index = VectorStoreIndex.from_vector_store(
            self.vector_store, storage_context=self.storage_context)

    def get_answer(self, document_id, query) -> Generator[str, None, None]:
        query_engine = self.index.as_query_engine(
            vector_store_kwargs={
                "es_filter": [{"match": {"metadata.docid.keyword": document_id}}],
            }, similarity_top_k=6, streaming=True
        )

        response = query_engine.query(query)
        for text in response.response_gen:
            yield text
This is my code, and it was working before when I used the Gemini class directly. Now I created an API key from Vertex AI, and after I changed to the Vertex class to use the gemini-pro model, I am getting this error.
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/navyasreepinjalavenkateswararao/Desktop/leximai-deep-api/venv/lib/python3.11/site-packages/llama_index/core/llms/callbacks.py", line 93, in wrapped_llm_chat
f_return_val = f(_self, messages, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/navyasreepinjalavenkateswararao/Desktop/leximai-deep-api/venv/lib/python3.11/site-packages/llama_index/llms/vertex/base.py", line 213, in stream_chat
chat_history = _parse_chat_history(messages[:-1], self._is_gemini)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/navyasreepinjalavenkateswararao/Desktop/leximai-deep-api/venv/lib/python3.11/site-packages/llama_index/llms/vertex/utils.py", line 172, in _parse_chat_history
raise ValueError("Gemini model don't support system messages")
ValueError: Gemini model don't support system messages
Indeed gemini does not support system messages (which is lame)

There is a WIP PR here though
https://github.com/run-llama/llama_index/pull/11511
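Until that lands, one possible workaround (my own sketch, not something the library does for you) is to fold the system message into the first user message before it ever reaches the Vertex Gemini model:
Plain Text
# Hypothetical helper: merge any SYSTEM message into the first USER message,
# since the Vertex Gemini path rejects system-role messages outright.
from llama_index.core.llms import ChatMessage, MessageRole

def merge_system_into_user(messages: list[ChatMessage]) -> list[ChatMessage]:
    system_parts = [m.content for m in messages if m.role == MessageRole.SYSTEM]
    rest = [m for m in messages if m.role != MessageRole.SYSTEM]
    if system_parts and rest and rest[0].role == MessageRole.USER:
        rest[0] = ChatMessage(
            role=MessageRole.USER,
            content="\n\n".join(system_parts + [rest[0].content]),
        )
    return rest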