


Hi, I'm having an issue with an "OutOfMemoryError: CUDA out of memory" error. I have an RTX 3060 Ti with 8 GB of VRAM but still can't run this model. I also tried Kaggle with a 16 GB GPU and got the same error.

Here is my code for a simple LLM execution pipeline with LlamaIndex:

```python
# Load your data
from llama_index.core import SimpleDirectoryReader, SummaryIndex

documents = SimpleDirectoryReader("/kaggle/input/dataset").load_data()
index = SummaryIndex.from_documents(documents)

from llama_index.core import PromptTemplate

# Transform a string into zephyr-specific input
def completion_to_prompt(completion):
    ...

# Transform a list of chat messages into zephyr-specific input
def messages_to_prompt(messages):
    ...

import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import Settings

Settings.llm = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="auto",
)

# Define the embed model
Settings.embed_model = "local:BAAI/bge-base-en-v1.5"

# Query and print the response
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do after his time at Y Combinator?")
print(response)
```
6 comments
This particular model's weights total around 14+ GB; maybe that's why you are getting the out-of-memory error (see the rough math in the sketch below the attachment).
(Attachment: image.png)
Is there any solution so I can use this model? Changing lines in the code, maybe? And I don't get it: I used a 16 GB GPU on Kaggle but still can't run it.
You can try quantizing this LLM; that way I think you can run it without the CUDA out-of-memory error (see the sketch below).
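A minimal sketch of 4-bit quantization via bitsandbytes, following the pattern from the LlamaIndex HuggingFace examples; the specific `BitsAndBytesConfig` settings here are common defaults, not the only valid choice:

```python
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM

# 4-bit NF4 quantization cuts the weight footprint to roughly a
# quarter of fp16; requires the bitsandbytes package and a CUDA GPU.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)
```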
I would just use Ollama if you want to run locally; it's much more optimized (sketch below).
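A minimal sketch of swapping in Ollama as the LlamaIndex LLM, assuming the Ollama server is running and a model has been pulled locally (the model name here is just an example):

```python
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Requires `pip install llama-index-llms-ollama` and a running Ollama
# server with a model already pulled (e.g. `ollama pull llama2`).
Settings.llm = Ollama(model="llama2", request_timeout=120.0)
```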
How can I do that?
Did you have the same problem?