Hi! We've been rolling out to production, and I recently put together some quick memory-profiling code to track down resource usage issues we were experiencing. I've identified what appears to be a memory leak related to llama_index/core/instrumentation/dispatcher.py. There is memory growth in other components too, but this is the most significant one right now.
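For reference, the quick profiling code is essentially a tracemalloc snapshot diff taken between stress-test cycles. This is a simplified sketch, not our exact code; the cycle driver and labels are illustrative:

```python
import gc
import tracemalloc

# Keep enough frames so allocations can be traced back to dispatcher.py
tracemalloc.start(25)

_baseline = tracemalloc.take_snapshot()

def report_growth(label: str, limit: int = 10) -> None:
    """Force a GC pass, then print the allocations that grew since the baseline."""
    gc.collect()
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.compare_to(_baseline, "lineno")
    print(f"--- {label} ---")
    for stat in stats[:limit]:
        print(stat)

# Called after every stress-test cycle, e.g.:
# run_stress_cycle()        # hypothetical driver for our load test
# report_growth("cycle 1")
```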
Here's what I'm observing:
- When I run stress tests against my system, memory usage attributed to dispatcher.py increases
- After the stress test completes and Python's garbage collector runs, some memory is freed, but not all of it
- With each test cycle, the baseline memory consumption keeps growing
- Overall, memory grows slowly but steadily until our containers hit their memory limits and are OOM-killed
The pattern is quite clear in our profiling data: memory allocated by some components grows incrementally over time and never returns to the previous baseline, even after GC runs (forcing gc.collect() manually didn't help either). I'd appreciate any insights on:
- Is this a known issue?
- Are there any configuration settings or best practices for preventing memory accumulation?
Some more info:
- We use OpenAI, not local models.
- We send OTEL traces to a self-hosted Arize Phoenix instance (rough setup sketch below).
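For completeness, the tracing is wired up roughly like this; a simplified sketch in which the endpoint URL is a placeholder for our Phoenix deployment:

```python
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export spans to our self-hosted Phoenix collector (URL is a placeholder).
exporter = OTLPSpanExporter(endpoint="http://phoenix.internal:6006/v1/traces")

tracer_provider = TracerProvider()
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))

# Hook LlamaIndex's instrumentation dispatcher up to OpenTelemetry.
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
```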
Thanks for your time and for maintaining such a useful library!