Hello everyone,
I recently started using llama-index to build up a local knowledge base, and it's been a great fit for my needs. However, I've run into a memory issue that I'm struggling to solve.
I attempted to use the GritLM/GritLM-7B model, which by my estimate should occupy roughly 28GB of memory (7B parameters × 4 bytes each in fp32). Since I'm running on an A40 GPU (48GB), I expected that to be sufficient. Surprisingly, when I run the model, the actual memory consumption exceeds 100GB (this is what I observed after switching to `device="cpu"`).
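For reference, here is how I arrived at the ~28GB figure. It's a rough sketch that just loads the weights in fp32 and sums the parameter sizes (it needs about that much free RAM to run, and nothing in it is specific to llama-index):

```python
import torch
from transformers import AutoModel

# Load GritLM-7B on CPU with full-precision (fp32) weights.
model = AutoModel.from_pretrained("GritLM/GritLM-7B", torch_dtype=torch.float32)

# ~7B parameters * 4 bytes/param ≈ 28GB for the weights alone.
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameter footprint: {total_bytes / 1e9:.1f} GB")
```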
I'm puzzled about where the problem lies. Below is my code snippet:
```python
import tiktoken

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

# Azure OpenAI is only the LLM; the embeddings come from GritLM below.
llm = AzureOpenAI(
    model="gpt-35-turbo",
    deployment_name="xxx",
    api_key="xxx",
    azure_endpoint="https://xxx.openai.azure.com/",
    api_version="2023-07-01-preview",
)

embed_model = HuggingFaceEmbedding(
    model_name="GritLM/GritLM-7B",
    cache_folder="/home/username/model_cache",
    device="cuda",
)

Settings.llm = llm
Settings.embed_model = embed_model
Settings.transformations = [
    SentenceSplitter(chunk_size=4096, paragraph_separator="\n\n")
]
Settings.tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo").encode

# Building the index triggers the embedding calls; this is where the
# memory consumption climbs past 100GB.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```
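For completeness, two things I'm considering but haven't verified: shrinking the embedding batch size (`embed_batch_size` is a standard llama-index embedding setting) and loading the weights in fp16. The `model_kwargs` pass-through in the sketch below is an assumption on my part, and I'm not sure my installed version of `HuggingFaceEmbedding` exposes it:

```python
import torch
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="GritLM/GritLM-7B",
    cache_folder="/home/username/model_cache",
    device="cuda",
    # Smaller batches should lower peak activation memory during indexing
    # (each 4096-token chunk through a 7B model is already expensive).
    embed_batch_size=2,
    # Assumption: forward torch_dtype to the underlying HF model so the
    # weights load in fp16 (~14GB instead of ~28GB). My installed version
    # may not accept this keyword.
    model_kwargs={"torch_dtype": torch.float16},
)
```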
I'm reaching out to see if anyone has experienced similar issues or if there's something I'm overlooking in my setup. Any insights or suggestions on how to address this memory discrepancy would be greatly appreciated.
Thank you in advance for your help!