Ah okay, so you'll need to put the model behind a server for it to stay available; otherwise the LLM gets reloaded for every query, which takes a lot of time.
There are two ways:
1: Your current way: you can use the CustomLLM abstraction (https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html#example-using-a-custom-llm-model-advanced) and, in the `complete` method, run the model as you are doing right now and return the response in the given format.
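For reference, here is a minimal sketch of option 1, adapted from the CustomLLM example in those docs. It uses a Hugging Face `transformers` pipeline with `gpt2` purely as a stand-in for whatever local model you are actually running; swap in your own loading and generation calls.

```python
from typing import Any

from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback
from transformers import pipeline

# Load the model ONCE at import time so it is not reloaded on every query.
# "gpt2" is only a stand-in for your local model.
generator = pipeline("text-generation", model="gpt2")


class MyLocalLLM(CustomLLM):
    context_window: int = 2048
    num_output: int = 256
    model_name: str = "my-local-model"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # Run your model here and wrap its output in CompletionResponse.
        text = generator(prompt, max_new_tokens=self.num_output)[0]["generated_text"]
        return CompletionResponse(text=text)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Simple non-streaming fallback: yield the full completion once.
        yield self.complete(prompt, **kwargs)
```

After that, `Settings.llm = MyLocalLLM()` plugs it into LlamaIndex, and the model is only loaded once when the module is imported.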
2: Deploy your model on a local server using FastAPI: launch the server so the model stays loaded behind a v1/generate endpoint, and use OpenAILike (https://docs.llamaindex.ai/en/stable/examples/llm/localai.html#llamaindex-interaction); a rough FastAPI sketch is at the end of this answer.
- Install the required PyPI package:
```
pip install llama-index-llms-openai-like
```
- These LocalAI defaults are:
```python
LOCALAI_DEFAULTS = {
    "api_key": "localai_fake",
    "api_type": "localai_fake",
    "api_base": "http://localhost:8000/v1/generate",
}
```
- Then point OpenAILike at that server and set it as the default LLM:
```python
from llama_index.core import Settings
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai_like import OpenAILike

MAC_M1_LUNADEMO_CONSERVATIVE_TIMEOUT = 10 * 60  # sec

llm = OpenAILike(
    **LOCALAI_DEFAULTS,
    model="lunademo",
    is_chat_model=True,
    timeout=MAC_M1_LUNADEMO_CONSERVATIVE_TIMEOUT,
)
Settings.llm = llm
```
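Once `Settings.llm` is set, a quick sanity check confirms the server is reachable:

```python
response = llm.complete("Hello, are you there?")
print(response.text)
```

And here is a rough sketch of the FastAPI server side. OpenAILike speaks the OpenAI wire format, so this assumes an OpenAI-style chat-completions route under the same base URL configured above; the route path, request/response fields, and the `gpt2` pipeline are illustrative assumptions, not something prescribed by LlamaIndex.

```python
# server.py -- rough sketch of an OpenAI-compatible endpoint for a local model.
import time

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Loaded once at startup, so every request reuses the same in-memory model.
# "gpt2" is only a stand-in for your actual local model.
generator = pipeline("text-generation", model="gpt2")


class Message(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int = 256
    temperature: float = 0.7


@app.post("/v1/generate/chat/completions")
def chat_completions(req: ChatRequest) -> dict:
    # Flatten the chat history into a single prompt for the local model.
    prompt = "\n".join(f"{m.role}: {m.content}" for m in req.messages)
    text = generator(prompt, max_new_tokens=req.max_tokens)[0]["generated_text"]
    # Minimal OpenAI-style response body so the OpenAILike client can parse it.
    return {
        "id": "chatcmpl-local",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```

Run it with `uvicorn server:app --port 8000` so it matches the `api_base` above.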