Hi guys, I want to use NLSQLTableQueryEngine and SQLTableRetrieverQueryEngine with a local, offline fine-tuned LLM downloaded on my system. Please suggest the exact implementation for this.
Adding the llm to the Settings global object is the only requirement, I think.

Plain Text
# define your LLM (any LlamaIndex LLM object)
llm = ...  # your LLM object here

# Add it to Settings
from llama_index.core import Settings
Settings.llm = llm


This should get you going with your llm.
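To tie this back to the original question: once Settings.llm is set, NLSQLTableQueryEngine picks it up automatically. Below is a minimal, untested sketch, where my_database.db and city_stats are placeholder names; SQLTableRetrieverQueryEngine works the same way but additionally needs an embedding model and an object index, so it is left out here.

Plain Text
from sqlalchemy import create_engine

from llama_index.core import Settings, SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Settings.llm = llm  # your local LLM, set as above

# hypothetical SQLite database and table, for illustration only
engine = create_engine("sqlite:///my_database.db")
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# NLSQLTableQueryEngine uses Settings.llm for text-to-SQL and answer synthesis
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
response = query_engine.query("Which city has the highest population?")
print(response)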
Hi @WhiteFang_Jr: Thanks for the quick response. Can you also help with how to create an llm object when I have a local fine-tuned llm model on my PC? Any example would work.
How do you run it locally?
Is it hosted on a local server that you interact with through an API?
I have downloaded the weights to my local machine, fine-tuned on my custom data, and merged with the original model weights. So basically the model is in a folder on my local machine.
No, I mean how do you interact with it? Do you use a Python server like FastAPI, or something else?
I just load it using transformers, like below:

Plain Text
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# quantization config defined earlier in my script, e.g.:
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    # torch_dtype=torch.bfloat16,
    # load_in_8bit=True,
    # load_in_4bit=True,
    device_map="auto",
    use_cache=True,
    offload_folder="offload",
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

and in place of the model name, I give the directory of the model.
Ah okay. So you'll have to put it behind a server to keep it available; otherwise loading the LLM for every query will take a lot of time.

There are two ways:
1: Your current way: You can use the CustomLLM abstraction https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html#example-using-a-custom-llm-model-advanced
and in the complete method run the model as you are doing right now and return the response in the expected format (a rough sketch follows after the OpenAILike example below).


2: Deploy your model on a local server using FastAPI: just launch the server, keep the model active at the endpoint v1/generate, and use OpenAILike
https://docs.llamaindex.ai/en/stable/examples/llm/localai.html#llamaindex-interaction

  • Install the required PyPI package: pip install llama-index-llms-openai-like
  • The LocalAI defaults are:
Plain Text
    LOCALAI_DEFAULTS={
        "api_key": "localai_fake",
        "api_type": "localai_fake",
        "api_base": f"http://localhost:8000/v1/generate",
    }


Plain Text
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai_like import OpenAILike
from llama_index.core import Settings

MAC_M1_LUNADEMO_CONSERVATIVE_TIMEOUT = 10 * 60  # sec

llm = OpenAILike(
    **LOCALAI_DEFAULTS,
    model="lunademo",
    is_chat_model=True,
    timeout=MAC_M1_LUNADEMO_CONSERVATIVE_TIMEOUT,
)
Settings.llm = llm
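To make option 2 concrete, here is a rough, untested sketch of the FastAPI side. Assumptions: MODEL_DIR is a placeholder for your merged-model folder, the prompt formatting is deliberately naive, and the route is chosen to line up with the api_base above (the OpenAI client that OpenAILike wraps appends /chat/completions to the api_base), so adjust both to your setup.

Plain Text
import time
from typing import Any, Dict, List, Optional

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "/path/to/your/merged-model"  # placeholder: your local model folder

app = FastAPI()

# load once at startup so the weights stay in memory between requests
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16, device_map="auto"
)


class ChatRequest(BaseModel):
    model: str
    messages: List[Dict[str, Any]]
    max_tokens: Optional[int] = 256
    temperature: Optional[float] = 0.1


@app.post("/v1/generate/chat/completions")
def chat_completions(req: ChatRequest) -> Dict[str, Any]:
    # naive prompt: just concatenate the chat messages
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in req.messages)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=req.max_tokens or 256)
    text = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    # minimal OpenAI-style chat.completion payload
    return {
        "id": "chatcmpl-local",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
    }

Run it with something like uvicorn server:app --port 8000 and keep the client snippet above unchanged.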
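And for option 1, a rough, untested sketch of a CustomLLM subclass along the lines of the docs page linked above; MODEL_DIR is again a placeholder, and the weights are loaded once at import time so they are not reloaded per query.

Plain Text
from typing import Any

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from llama_index.core import Settings
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback

MODEL_DIR = "/path/to/your/merged-model"  # placeholder: your local model folder

# load once so every query reuses the same weights
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16, device_map="auto"
)


class LocalFineTunedLLM(CustomLLM):
    context_window: int = 4096
    num_output: int = 256
    model_name: str = "local-finetuned"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=self.num_output)
        text = tokenizer.decode(
            output_ids[0][inputs["input_ids"].shape[1]:],
            skip_special_tokens=True,
        )
        return CompletionResponse(text=text)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # no real streaming here: yield the full completion once
        yield self.complete(prompt, **kwargs)


Settings.llm = LocalFineTunedLLM()

This avoids reloading the weights on every query, but the model then lives inside the same process as your query engine; the FastAPI route keeps it in a separate server.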
Ok, thanks. I will try that and will let you know if I face any issues.