Hi everyone, I am playing around with the Claude model using Bedrock in LlamaIndex, like this: Bedrock(model="anthropic.claude-v2"). When I check the generated prompts using token_counter.llm_token_counts, I notice there are two duplicate prompt events. Does anyone have any idea why we send two identical back-to-back prompt calls to the LLM?
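(Roughly how I am checking, for reference — the prompt and prompt_token_count fields below assume LlamaIndex's TokenCountingEvent shape:)
Plain Text
# Sketch: print each recorded LLM event to eyeball duplicates.
for event in token_counter.llm_token_counts:
    print(event.prompt_token_count, event.prompt[:80])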
Are you sure they are duplicates?
sounds like a bug in the token counter maybe?
yes, the exact same
is it a bug in the token counter, or in the way we generate the prompt? Because I also track prompts, and it seems like we send two calls
What does your setup/query look like?
If it is actually duplicating calls, I should be able to replicate using your setup pretty easily
Plain Text
import tiktoken
from llama_index import ServiceContext, set_global_service_context
from llama_index.callbacks import TokenCountingHandler, CallbackManager
from llama_index.llms import Bedrock


class QA():
    def __init__(self):
        self.CHUNK_SIZE = 256
        self.MODEL_NAME = "gpt-3.5-turbo"
        self.NUM_OF_PRODUCT_REQUIRED_PAGES = 5
        self.SIMILARITY_TOP_K = 2

    def get_model(self):
        token_counter = TokenCountingHandler(
            tokenizer=tiktoken.encoding_for_model(self.MODEL_NAME).encode
        )
        callback_manager = CallbackManager([token_counter])
        llm_predictor = Bedrock(model="anthropic.claude-v2", profile_name='ABC')

        # llm_predictor = OpenAI(temperature=0, model=self.MODEL_NAME)
        # service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size=self.CHUNK_SIZE)
        service_context = ServiceContext.from_defaults(llm=llm_predictor, chunk_size=self.CHUNK_SIZE, callback_manager=callback_manager)
        set_global_service_context(service_context)
        return service_context, token_counter
and then you just used that in a vector index query engine I suppose?
@Logan M yes that is correct
is it a bug you think?
I track calls to LLMs, there are two exact same prompts
How did you setup the query engine? Just index.as_query_engine()?

I'd love to reproduce this but so far, I cannot

Plain Text
from llama_index.callbacks import TokenCountingHandler, CallbackManager
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./docs/examples/data/paul_graham").load_data()

service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([TokenCountingHandler()]), chunk_size=256
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine(similarity_top_k=2)

response = query_engine.query("What is the best way to raise money for a startup?")
This only calls the LLM once
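(For completeness, the way I verified the call count: the same setup, just keeping a reference to the handler so the recorded events can be counted:)
Plain Text
from llama_index.callbacks import TokenCountingHandler, CallbackManager
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# Identical repro, but the handler is named so its events can be inspected.
token_counter = TokenCountingHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter]), chunk_size=256
)

documents = SimpleDirectoryReader("./docs/examples/data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("What is the best way to raise money for a startup?")

print(len(token_counter.llm_token_counts))  # -> 1 in my run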
@Logan M tried again with this code:
Plain Text
import tiktoken
from llama_index import Document, ServiceContext, VectorStoreIndex, set_global_service_context
from llama_index.callbacks import TokenCountingHandler, CallbackManager
from llama_index.llms import Bedrock

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])
llm_predictor = Bedrock(model="anthropic.claude-v2")

service_context = ServiceContext.from_defaults(llm=llm_predictor, chunk_size=256, callback_manager=callback_manager)
set_global_service_context(service_context)

pages_text = ['Investors write checks when the idea they hear is compelling, when they are persuaded that the team of founders can realize its vision, and that the opportunity described is real and sufficiently large. When founders are ready to tell this story, they can raise money. And usually when you can raise money, you should.']
documents = [Document(text=t) for t in pages_text]

custom_llm_index = VectorStoreIndex.from_documents(documents, service_context=service_context)
custom_llm_query_engine = custom_llm_index.as_query_engine(similarity_top_k=2)
question = "how to raise money for a startup"
response = custom_llm_query_engine.query(question)
token_counter.llm_token_counts
I see two calls to the LLM
this is my output: a list of two identical events
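(Comparing the two entries directly — again assuming the TokenCountingEvent fields — they are character-for-character the same:)
Plain Text
# Sketch: confirm the two recorded events are true duplicates.
first, second = token_counter.llm_token_counts
print(first.prompt == second.prompt)                        # True
print(first.prompt_token_count, second.prompt_token_count)  # same count twice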
wow this one is a doozy

Stepped through with a debugger, added a ton of print statements

I can confirm that the API is only called once, but the actual event is getting logged twice (for reasons that are very complex)
I put a print statement just before the API call in the Bedrock LLM class; it only gets hit once
I am using Arize AI and it shows two calls
yes because the callback is getting called twice
(hence the two token counting events)
If you debug further into the actual code, or enable debug logs for the bedrock client (if it has any), you'd see only one network request being made
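(For example, since the Bedrock LLM goes through boto3/botocore under the hood, a sketch like this would surface every request it actually sends over the wire:)
Plain Text
import logging
import boto3

# Log every HTTP request botocore makes; one POST to the Bedrock
# endpoint per query means one real LLM call.
boto3.set_stream_logger("botocore", logging.DEBUG)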
I wrote our entire callback system, I can state this with 100% confidence
the callback is just being called twice, for a pretty complex reason that I guess I need to fix at some point
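(If the duplicate entries get in your way meanwhile, a rough workaround sketch — assuming each TokenCountingEvent carries the event_id of the callback event that produced it, so the duplicated callback yields two entries with the same id — is to dedupe on that id:)
Plain Text
# Hypothetical workaround: collapse duplicate entries that share an
# event_id before reporting token usage.
seen = set()
unique_events = []
for event in token_counter.llm_token_counts:
    if event.event_id not in seen:
        seen.add(event.event_id)
        unique_events.append(event)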