Custom LLMs

At a glance

The community members discuss how to load and use GPT4All with the GPTVectorStoreIndex from the llama_index library. They suggest wrapping the GPT4All model with an LLMPredictor and setting it in the ServiceContext. However, they note that custom LLMs may require adjusting prompt helper settings to account for different max_input_sizes, and that custom LLMs have not had great quality so far.

The community members provide sample code for loading GPT4All and setting up the ServiceContext. They also clarify that the GPTVectorStoreIndex itself is not OpenAI-only, but its embedding model defaults to OpenAI (hence the OpenAI RetryError), and suggest using local Hugging Face embeddings instead.

The community members confirm that this approach of using local Hugging Face embeddings works, and one member notes that they now need to focus on improving performance.

(For more context, I have figured out how to load GPT4All using from langchain.llms import GPT4All and chat with it directly, but it seems index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context) is specifically for OpenAI?)
I thiiiink you can load the model from langchain, and then wrap it with the llm predictor

LLMPredictor(llm=<langchain llm>)

Then, you can set the llm_predictor in the service context

However, be wary that other models may need adjusted prompt helper settings to account for different max_input_sizes
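
For example, a minimal sketch of tightening the prompt helper for a model with a smaller context window (the numbers below are illustrative placeholders, not tuned values, and llm_predictor is the wrapped LangChain LLM described above):

from llama_index import PromptHelper, ServiceContext

# Local GPT4All-style models typically have a much smaller context
# window than OpenAI's; these values are illustrative only.
prompt_helper = PromptHelper(
    max_input_size=2048,   # the model's context window
    num_output=256,        # tokens reserved for the response
    max_chunk_overlap=20,  # overlap when chunking retrieved text
)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,  # the wrapped LangChain LLM
    prompt_helper=prompt_helper,
)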

Plus, custom llms so far have not had great quality 😅

You can also implement any LLM using a custom LLM class
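
As a minimal sketch of that route, following the custom-LLM pattern in the llama_index docs of this era (which subclasses LangChain's LLM base class) — the class name and its stub behavior here are hypothetical:

from typing import List, Optional

from langchain.llms.base import LLM
from llama_index import LLMPredictor

class MyLocalLLM(LLM):
    """Hypothetical wrapper; replace _call with real model inference."""

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Run your model on `prompt` here; this stub just echoes a slice.
        return prompt[-256:]

llm_predictor = LLMPredictor(llm=MyLocalLLM())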

https://github.com/autratec?tab=repositories
from llama_index import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    GPTKeywordTableIndex,
    LLMPredictor,
    ServiceContext,
    StorageContext,
    load_index_from_storage,
)
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Stream tokens to stdout as the model generates
callbacks = [StreamingStdOutCallbackHandler()]

# Load the local GPT4All weights
llm = GPT4All(model="models/ggml-gpt4all-l13b-snoozy.bin", callbacks=callbacks, verbose=True)

# Wrap the LangChain LLM so llama_index can drive it
llm_predictor = LLMPredictor(llm=llm)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
Sorry for that horrendous formatting
And then:
# Load everything under ./data (including subfolders) as documents
documents = SimpleDirectoryReader('data', recursive=True).load_data()

# Build the vector index with the GPT4All-backed service context
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# Save the index to disk (defaults to ./storage)
index.storage_context.persist()
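
For completeness, a hedged sketch of reloading the persisted index and querying it later with the same service context (the query text is just an illustration):

# Later: reload the persisted index with the same service context
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, service_context=service_context)

query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")  # illustrative question
print(response)
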
It's giving an OpenAI RetryError...
Right, by default you still need OpenAI for the embedding model

You can set the embedding model to run locally from Hugging Face

https://gpt-index.readthedocs.io/en/stable/how_to/customization/embeddings.html#custom-embeddings
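
Following the pattern in those docs, a minimal sketch of swapping in local Hugging Face embeddings (HuggingFaceEmbeddings uses a default sentence-transformers model; llm_predictor is the wrapped GPT4All model from above):

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

# Wrap a local sentence-transformers model so embedding calls never hit OpenAI
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

# Reuse the GPT4All predictor; now both the LLM and embeddings run locally
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
)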
Ah OK. Really wish it weren't defaulting to OpenAI but I'll give this a shot. Thanks!
you can still use huggingface local embeddings btw
This worked—thank you!
Now to work on performance 🙂