The community member is trying to benchmark the performance of open-source models on a custom dataset using the HuggingFaceLLM class in LlamaIndex. They want to test the models sequentially and are asking how to remove a particular model from the GPU before moving the next model onto it. A community member suggests using `del model` and `torch.cuda.empty_cache()`, but the original poster says they tried that and it didn't work. There is no explicitly marked answer.
Hello, I am trying to benchmark the performance of some open-source models on a custom dataset using the HuggingFaceLLM class in LlamaIndex. I want to test the models sequentially. How do I remove a particular model from the GPU before moving the next model onto it?
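A minimal sketch of the usual unloading pattern follows. The key point is that `torch.cuda.empty_cache()` can only release memory whose tensors no longer have *any* live Python reference, so every object holding the model (the `HuggingFaceLLM` wrapper, any query engine or index built on it) must be deleted first, and `gc.collect()` run before emptying the cache. The `_model` attribute name is an assumption about LlamaIndex internals and may differ across versions; treat this as a sketch, not a definitive API.

```python
import gc
import torch

def unload_llm(llm):
    """Free a model's GPU memory before loading the next benchmark model.

    `llm` is assumed to be a llama-index HuggingFaceLLM wrapper; the
    `_model` attribute holding the underlying transformers model is an
    assumption and may vary by version.
    """
    model = getattr(llm, "_model", llm)
    if hasattr(model, "to"):
        model.to("cpu")              # move the weights off the GPU first
    del model                        # drop this function's own reference
    gc.collect()                     # reclaim unreferenced objects/cycles
    if torch.cuda.is_available():
        torch.cuda.empty_cache()     # return cached blocks to the driver
        torch.cuda.synchronize()

# Usage sketch (names hypothetical): delete *every* reference to the
# model — query engines and indexes also hold one — or the cache
# release will silently do nothing.
#
#   llm = HuggingFaceLLM(model_name="...")
#   ...run benchmark...
#   unload_llm(llm)
#   del llm
```

If `del model; torch.cuda.empty_cache()` appeared not to work, the most common cause is exactly such a lingering reference (for example, a query engine or a global default still pointing at the model), which keeps the CUDA allocations alive.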