
Updated 6 months ago


At a glance

The community member is running a GuidancePydanticProgram with a large language model (mistralai/Mistral-7B-v0.1) on a GPU with 32 GB of VRAM and is hitting an out-of-memory error. Another community member suggests general strategies for reducing memory usage, such as lowering the batch size, truncating the input or output, or switching to a smaller model, but notes that the information provided does not contain specific details on how to optimize this particular program. The second community member also questions whether a GPU with 32 GB of VRAM should be running out of memory for a roughly 16 GB model, and requests a screenshot of the VRAM and RAM usage at the moment the error occurs.

Useful resources
Plain Text
# Imports assumed for the llama_index / outlines versions in use at the time;
# Character is the user's Pydantic output class (not shown in the message).
from llama_index.program import GuidancePydanticProgram
import outlines

program = GuidancePydanticProgram(
    output_cls=Character,
    prompt_template_str=(
        "Give me a character description, using"
        " the movie {{game_name}} as inspiration"
    ),
    guidance_llm=outlines.models.transformers("mistralai/Mistral-7B-v0.1", device="cuda"),
    verbose=True,
)
and I'm getting an out-of-memory error running on a 3090 with 32 GB of VRAM. Any thoughts on how to optimize it?
3 comments
Please stand by @andysingal as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
Hmm, I don't know enough to give you a confident answer yet. The information extracts provided do not contain specific details on how to optimize the GuidancePydanticProgram to avoid out of memory errors on a GPU with 32GB VRAM.

However, it's worth noting that large language models like the one you're using (mistralai/Mistral-7B-v0.1) can consume a significant amount of memory, especially when processing large inputs or outputs.

You might want to consider the following general strategies for optimizing memory usage:

  1. Reduce the batch size: If you're processing multiple inputs at once, try reducing the number of inputs you process at the same time.
  2. Truncate the input or output: If your inputs or outputs are very long, you could try reducing their length. However, be aware that this could affect the quality of the model's output.
  3. Use a smaller model: If memory usage continues to be an issue, you might need to switch to a smaller model, or at least a smaller-memory load of the same one (see the sketch just below this list). This could also affect output quality, but it might be necessary if memory constraints are a significant issue.
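As a concrete illustration of the third point, here is a minimal sketch that keeps the same checkpoint but loads it in half precision, which roughly halves the weight footprint compared with a default float32 load. It assumes the installed outlines version exposes a model_kwargs argument on outlines.models.transformers that is forwarded to Hugging Face's from_pretrained; check the signature of your version before relying on it.
Plain Text
import torch
import outlines

# Sketch only: float16 weights take roughly half the memory of float32
# (about 14 GB instead of about 28 GB for a 7B model, before activations
# and the KV cache). Assumes model_kwargs is forwarded to
# transformers.AutoModelForCausalLM.from_pretrained by this outlines version.
llm = outlines.models.transformers(
    "mistralai/Mistral-7B-v0.1",
    device="cuda",
    model_kwargs={"torch_dtype": torch.float16},
)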
For more specific guidance, you might find it helpful to refer to the LlamaIndex documentation or the LlamaIndex GitHub repository.
----------------------
Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon:
----------------------
Running out of 32 GB for a roughly 16 GB model doesn't seem right. Can you paste a screenshot of your VRAM + RAM usage graph when the error happens, like Nicholas did in their message?
https://discord.com/channels/1059199217496772688/1059200010622873741/1190678757707632731
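If a screenshot is hard to capture at the moment the error fires, a rough alternative (assuming PyTorch is what's doing the allocating) is to print the same numbers from the script around the failing call; the commented-out program call and the game_name value below are placeholders for the actual invocation.
Plain Text
import torch

# Sketch: print how much GPU memory PyTorch has allocated and reserved,
# plus the device total, right before and after the call that raises the OOM.
def log_vram(tag: str) -> None:
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated() / gib
    reserved = torch.cuda.memory_reserved() / gib
    total = torch.cuda.get_device_properties(0).total_memory / gib
    print(f"[{tag}] allocated={allocated:.1f} GiB, reserved={reserved:.1f} GiB, total={total:.1f} GiB")

log_vram("before call")
# result = program(game_name="The Matrix")  # placeholder for the failing call
log_vram("after call")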