LlamaIndex does support Xorbits Inference (Xinference) for deploying local LLMs. However, the available documentation does not provide specific details about CPU support or requirements.
To use Xinference with LlamaIndex, you need to install Xinference using pip and then launch the local models. Here is a brief guide on how to do it:
- Install Xinference: `pip install "xinference[all]"`
- After installation, restart your Jupyter notebook.
- Run `xinference` in a new terminal window. You should see an output similar to this:
```
INFO:xinference:Xinference successfully started. Endpoint: http://127.0.0.1:9997
INFO:xinference.core.service:Worker 127.0.0.1:21561 has been added successfully
INFO:xinference.deploy.worker:Xinference worker successfully started.
```
- Locate the endpoint port number in the endpoint description. In the example above, it is `9997`.
- Set the port number with the following cell:

```python
%pip install llama-index-llms-xinference

port = 9997  # replace with your endpoint port number
```
- Launch a model and use it. Here is an example of how to do it:
```python
# Import the client used to talk to the local Xinference server
from xinference.client import RESTfulClient

# Define a client to send commands to xinference
client = RESTfulClient(f"http://127.0.0.1:{port}")

# Download and launch a model; this may take a while the first time
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_size_in_billions=7,
    model_format="ggmlv3",
    quantization="q2_K",
)
```
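With the model running, LlamaIndex can use it through its Xinference LLM wrapper. The sketch below assumes the `llama-index-llms-xinference` package installed above and reuses the `port` and `model_uid` values from the previous cells; the `temperature` and `max_tokens` settings are illustrative, not required values.

```python
from llama_index.llms.xinference import Xinference

# Point LlamaIndex at the locally running Xinference model
llm = Xinference(
    endpoint=f"http://127.0.0.1:{port}",
    model_uid=model_uid,
    temperature=0.7,   # illustrative sampling temperature
    max_tokens=512,    # illustrative generation limit
)

# Simple completion against the local llama-2-chat model
response = llm.complete("Who wrote 'Pride and Prejudice'?")
print(response)
```

From here, the `llm` object can be passed wherever LlamaIndex expects an LLM, for example when building a query engine over your documents.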