Hey, how can I use my local URL, a llama.cpp Python server hosted locally on port 8000, in LlamaIndex for LLM inference?
5 comments
Please stand by @Tech explorer as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To use a local LLM with LlamaIndex, you can follow the example in the LlamaIndex documentation for using a local model. In your case, with a local server running on port 8000, you would swap in your own local model and point the client at that server's URL instead of the default.

Here's an example of how you might do this:

Plain Text
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

# model is the name of your local model; base_url is your local server's address
Settings.llm = Ollama(model="llama2", base_url="http://localhost:8000", request_timeout=60.0)


In this example, replace the model name with the name of your local model and base_url with the URL of your local server. The request_timeout parameter is optional and can be adjusted based on your needs.
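
Once Settings.llm is set, LlamaIndex components such as query engines and chat engines use it by default. As a quick sanity check (a minimal sketch, assuming your local server is running and the model name is one your server knows), you can call the model directly:

Plain Text
from llama_index.core import Settings

# Send a prompt straight to whichever LLM was configured above
response = Settings.llm.complete("Briefly introduce yourself.")
print(response.text)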

Please note that this is a general guide and the exact code may vary depending on the specifics of your local model and server. If your local model has a specific LLM wrapper in LlamaIndex, you should use that instead of the Ollama example.
----------------------
Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon:
----------------------
@kapa.ai I am running a llama.cpp Python server, not Ollama. How do I use my llama.cpp Python server's OpenAI-compatible API?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the information provided in the extracts, there isn't a specific example of using a local llama.cpp server with LlamaIndex. However, since the llama-cpp-python server exposes an OpenAI-compatible API, you can generally point LlamaIndex at it with a wrapper for OpenAI-compatible endpoints, in the same spirit as the local-model example in the LlamaIndex documentation.

Here's an example of how you might do this:

Plain Text
from llama_index.llms.openai_like import OpenAILike  # pip install llama-index-llms-openai-like
from llama_index.core import Settings

# Point LlamaIndex at the llama-cpp-python server's OpenAI-compatible endpoint
Settings.llm = OpenAILike(
    model="local-model", api_base="http://localhost:8000/v1", api_key="not-needed"
)


In this example, OpenAILike is LlamaIndex's generic wrapper for servers that speak the OpenAI API, which is what the llama-cpp-python server provides; adjust api_base if your server runs on a different host or port, and note that the model name is largely informational for a single-model llama.cpp server. If the wrapper doesn't cover what you need, you can instead create a custom LLM class that talks to your server directly, as sketched below.
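
For the custom-class route, here is a minimal, hypothetical sketch: the class name LlamaCppServerLLM and the max_tokens value are illustrative, and it assumes the llama-cpp-python server's OpenAI-style /v1/completions route is available (it is by default).

Plain Text
from typing import Any

import requests

from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class LlamaCppServerLLM(CustomLLM):
    """Hypothetical wrapper that forwards prompts to a local llama.cpp server."""

    base_url: str = "http://localhost:8000"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="llama-cpp-server")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # llama-cpp-python serves an OpenAI-style completions endpoint at /v1/completions
        resp = requests.post(
            f"{self.base_url}/v1/completions",
            json={"prompt": prompt, "max_tokens": 256},
            timeout=60,
        )
        return CompletionResponse(text=resp.json()["choices"][0]["text"])

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Simplest possible implementation: no real streaming, just yield the full reply once
        yield self.complete(prompt, **kwargs)


You would then assign an instance to Settings.llm exactly as in the snippet above.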

Please note that this is a general guide and the exact code may vary depending on the specifics of your llama.cpp server and how it is configured. If a more specific LLM wrapper for your setup exists in LlamaIndex, you should use that instead of the generic OpenAILike example.
----------------------
Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon:
----------------------