I fine-tuned Llama3.1-8B for the Text2SQL task, and now I have two questions:
1) How can I load the locally saved fine-tuned model in LlamaIndex? 2) How can I use quantization so the model fits on my GPU?
I tried pushing the model to HuggingFace and then loading it with LlamaIndex's HuggingFaceLLM class (the same way I load other LLMs); however, the model does not get placed on the GPU.