Mustafa B.
Hi, I'm running into an "OutOfMemoryError: CUDA out of memory" error. I have an RTX 3060 Ti with 8 GB of VRAM but still can't run this model. I also tried Kaggle with a 16 GB GPU and got the same error.

Here is my code for a simple LLM query pipeline built with LlamaIndex:

# Load your data
from llama_index.core import SimpleDirectoryReader, SummaryIndex

documents = SimpleDirectoryReader("/kaggle/input/dataset").load_data()
index = SummaryIndex.from_documents(documents)

from llama_index.core import PromptTemplate

# Transform a string into zephyr-specific input
def completion_to_prompt(completion):
    ...

# Transform a list of chat messages into zephyr-specific input
def messages_to_prompt(messages):
    ...

import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import Settings

Settings.llm = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="auto",
)

# Define the embedding model
Settings.embed_model = "local:BAAI/bge-base-en-v1.5"

# Query and print the response
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do after his time at Y Combinator?")
print(response)
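For context on the error itself: transformers' from_pretrained loads weights in full fp32 precision by default, so a 3B-parameter model needs roughly 12 GB of VRAM for the weights alone, before activations and the KV cache for a 3900-token context window are counted. That is enough to exhaust an 8 GB card and can push a 16 GB card over the limit as well. Below is a minimal sketch of two ways to shrink the footprint via HuggingFaceLLM's model_kwargs pass-through to from_pretrained; the dtype and quantization settings shown are illustrative assumptions, not tuned values.

# Sketch: two lower-memory ways to construct the same LLM.
# The model names mirror the snippet above; settings are assumptions, not tuned values.
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM

# Option 1: load weights in half precision (roughly halves VRAM vs. the fp32 default).
llm_fp16 = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16},  # forwarded to from_pretrained
)

# Option 2: 4-bit quantization via bitsandbytes (pip install bitsandbytes; needs a CUDA GPU).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
llm_4bit = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    device_map="auto",
    model_kwargs={"quantization_config": quant_config},
)

With 4-bit quantization the weights of a 3B model typically fit in well under 4 GB of VRAM. Separately, note that SummaryIndex sends every document chunk through the LLM at query time; if the documents are large, a VectorStoreIndex retrieves only the top-matching chunks and keeps peak memory and LLM calls much lower.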