If you run a server locally and that server already serves a model behind an endpoint, all you likely need are the endpoint URL and credentials:
```bash
# ollama, for example
OLLAMA_LLM_MAX_TOKENS=20_000
OLLAMA_MODEL=llama3.1
OLLAMA_BASE_URL=http://localhost:11434
```

```python
# use
from llama_index.llms.ollama import Ollama

llm = Ollama(**config)
```
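To make that `config` concrete, here is a minimal sketch of the packing step. It assumes the variables above are already in the process environment (exported, or loaded with something like python-dotenv), and the mapping of the max-token value onto Ollama's `num_ctx` option is an assumption about how you want it used:

```python
import os

from llama_index.llms.ollama import Ollama

# Pack the OLLAMA_* variables into constructor keyword arguments.
config = {
    "model": os.environ["OLLAMA_MODEL"],
    "base_url": os.environ["OLLAMA_BASE_URL"],
    # Assumption: surface the max-token setting as Ollama's num_ctx option.
    "additional_kwargs": {"num_ctx": int(os.environ["OLLAMA_LLM_MAX_TOKENS"])},
}

llm = Ollama(**config)
print(llm.complete("Reply with a single word."))
```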
```bash
# or LM Studio
LMSTUDIO_API_BASE=http://localhost:1234/v1
LMSTUDIO_API_KEY=lm-studio
```

```python
# use
from llama_index.llms.openai import OpenAI

llm = OpenAI(**config)
```
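The same idea works for LM Studio, since it exposes an OpenAI-compatible API. This sketch only packs the base URL and key, and assumes the server answers with whatever model is currently loaded regardless of the model name the client sends:

```python
import os

from llama_index.llms.openai import OpenAI

# Pack the LMSTUDIO_* variables into OpenAI-client keyword arguments.
config = {
    "api_base": os.environ["LMSTUDIO_API_BASE"],
    "api_key": os.environ["LMSTUDIO_API_KEY"],
}

llm = OpenAI(**config)
print(llm.complete("Reply with a single word."))
```

If the `OpenAI` wrapper insists on a known OpenAI model name, `OpenAILike` from the `llama-index-llms-openai-like` package accepts arbitrary names and takes the same `api_base`/`api_key` arguments.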
Just convert the options and pack the `config` as needed.
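If you juggle several backends, a small helper can do that conversion generically. This is only a sketch, assuming each variable maps onto a constructor keyword once the prefix is stripped and the name is lowercased; keys that don't match a parameter (like `LLM_MAX_TOKENS` above) still need to be renamed or dropped by hand:

```python
import os


def pack_config(prefix: str) -> dict:
    """Collect PREFIX_* environment variables into a kwargs dict."""
    config = {}
    for name, value in os.environ.items():
        if not name.startswith(prefix):
            continue
        key = name[len(prefix):].lower()  # e.g. OLLAMA_BASE_URL -> base_url
        try:
            config[key] = int(value)      # numeric values; underscores like 20_000 are fine
        except ValueError:
            config[key] = value
    return config


# e.g. pack_config("OLLAMA_")
# -> {"llm_max_tokens": 20000, "model": "llama3.1", "base_url": "http://localhost:11434"}
```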