Mustafa B.
Hi, I'm running into an "OutOfMemoryError: CUDA out of memory" error. I have an RTX 3060 Ti with 8 GB of VRAM but still can't run this model. I also tried Kaggle with a 16 GB GPU and got the same error.

Here is my code for a simple LLM query pipeline built with LlamaIndex:

# Load your data
from llama_index.core import SimpleDirectoryReader, SummaryIndex

documents = SimpleDirectoryReader("/kaggle/input/dataset").load_data()
index = SummaryIndex.from_documents(documents)

from llama_index.core import PromptTemplate

# Transform a string into zephyr-specific input
def completion_to_prompt(completion):
    ...

# Transform a list of chat messages into zephyr-specific input
def messages_to_prompt(messages):
    ...

import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import Settings

Settings.llm = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="auto",
)

# Define the embedding model
Settings.embed_model = "local:BAAI/bge-base-en-v1.5"

# Query and print the response
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do after his time at Y Combinator?")
print(response)
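For context on the error itself: transformers' from_pretrained loads weights in full fp32 precision by default, so a 3B-parameter model needs roughly 12 GB of VRAM for the weights alone, before activations and the KV cache for a 3900-token context window are counted. That is enough to exhaust an 8 GB card and can push a 16 GB card over the limit as well. Below is a minimal sketch of two ways to shrink the footprint via HuggingFaceLLM's model_kwargs pass-through to from_pretrained; the dtype and quantization settings shown are illustrative assumptions, not tuned values.

# Sketch: two lower-memory ways to construct the same LLM.
# The model names mirror the snippet above; settings are assumptions, not tuned values.
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM

# Option 1: load weights in half precision (roughly halves VRAM vs. the fp32 default).
llm_fp16 = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16},  # forwarded to from_pretrained
)

# Option 2: 4-bit quantization via bitsandbytes (pip install bitsandbytes; needs a CUDA GPU).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
llm_4bit = HuggingFaceLLM(
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    context_window=3900,
    max_new_tokens=256,
    device_map="auto",
    model_kwargs={"quantization_config": quant_config},
)

With 4-bit quantization the weights of a 3B model typically fit in well under 4 GB of VRAM. Separately, note that SummaryIndex sends every document chunk through the LLM at query time; if the documents are large, a VectorStoreIndex retrieves only the top-matching chunks and keeps peak memory and LLM calls much lower.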