
Monitoring Runtime Context and Memory Usage

At a glance
The community member is asking how to check the current available context size, memory used, and max tokens used at runtime, so they can reset the variables and chat engine before reaching the limit and encountering an error. The comments suggest using the llama_index library to check the total available context length for the OpenAI model, and adding an instrumentation module to track the remaining tokens.
@Logan M can I find out the currently available context size, the memory used, and the max tokens used at runtime, so that when usage approaches the limit I can reset the variables and the chat engine before it hits the limit and breaks with an error?
1 comment
You can check the total available context length for the OpenAI model here: https://github.com/run-llama/llama_index/blob/aa1f5776787b8b435f89d2c261fd7ca8002c1f19/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py#L39
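
As a minimal sketch (assuming the OpenAI LLM integration is installed; the model name below is only an example), you can read that limit either from the helper in the linked utils module or from the LLM instance's metadata:

```python
# Sketch: looking up the model's total context window.
# "gpt-4o-mini" is just an example model name; substitute your own.
from llama_index.llms.openai import OpenAI
from llama_index.llms.openai.utils import openai_modelname_to_contextsize

llm = OpenAI(model="gpt-4o-mini")

# Helper from the utils module linked above: model name -> context window size.
print(openai_modelname_to_contextsize("gpt-4o-mini"))

# The same limit is also exposed on the LLM instance itself.
print(llm.metadata.context_window)
```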


For checking how many tokens remain, you can add the instrumentation module: https://docs.llamaindex.ai/en/stable/examples/instrumentation/instrumentation_observability_rundown/

Take the LLM event, extract the token counts, and then update the running token total based on the new values.
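
A rough sketch of that pattern, assuming an OpenAI-style response where `event.response.raw` carries a `usage` payload; the context window value, the threshold, and the `chat_engine` reset at the end are placeholders you would adapt:

```python
# Sketch: an event handler that tracks token usage from LLM chat-end events
# and exposes how many tokens remain before the context window is full.
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events.llm import LLMChatEndEvent


class TokenBudgetHandler(BaseEventHandler):
    # Assumed limit; set this from openai_modelname_to_contextsize(...) for your model.
    context_window: int = 128000
    tokens_used: int = 0

    @classmethod
    def class_name(cls) -> str:
        return "TokenBudgetHandler"

    def handle(self, event, **kwargs) -> None:
        # Only LLM chat-end events carry the completed response.
        if isinstance(event, LLMChatEndEvent) and event.response is not None:
            raw = event.response.raw
            # The shape of `raw` depends on the provider/version; for OpenAI it
            # typically includes a `usage` block with total_tokens.
            usage = raw.get("usage") if isinstance(raw, dict) else getattr(raw, "usage", None)
            if usage is not None:
                total = usage.get("total_tokens") if isinstance(usage, dict) else usage.total_tokens
                if total is not None:
                    self.tokens_used = total

    @property
    def tokens_remaining(self) -> int:
        return self.context_window - self.tokens_used


handler = TokenBudgetHandler()
get_dispatcher().add_event_handler(handler)

# Later, before each chat turn, check the budget and reset when it runs low, e.g.:
# if handler.tokens_remaining < 1000:
#     chat_engine.reset()  # or rebuild the memory / chat engine as needed
```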