I never looked in the embeddings docs because I did not consider them as such, and everyone on so many forums / discussions just pointed back to readthedocs links (invariably 404) related to custom models. Perhaps the docs changed and were segmented, which is why I was confused by the discussions; should those now link to embeddings? I think they were just old posts, etc., since the documentation is changing rapidly with updates.
Thanks, let me look into this.
One other question: what if I have a locally stored model? Is there a way to just use that instead of grabbing it from Hugging Face?
Not an easy way... you'd have to extend the base embeddings class from llama-index.
I don't think it would be too hard to extend it that way: really just add a local-dir option and load from there instead of downloading, if present, unless it is doing something goofy with it. Hacky, but it would work in the interim. I'll look at the src when I get to needing it.
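Something like this is probably what I'd try first before extending anything (untested, and assuming the wrapper just hands the name through to sentence-transformers; the ./models/all-MiniLM-L6-v2 path is hypothetical and would need to hold a saved sentence-transformers model):

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding

# point model_name at a local directory instead of a Hugging Face repo id
# (hypothetical path; the directory must contain a saved sentence-transformers model)
local_embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="./models/all-MiniLM-L6-v2")
)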
So, a similar issue here.
import torch
from typing import Optional, List, Mapping, Any

from langchain.llms.base import LLM
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import (
    SimpleDirectoryReader,
    GPTListIndex,
    PromptHelper,
    LangchainEmbedding,
    LLMPredictor,
    ServiceContext,
)
from transformers import pipeline
# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
class CustomLLM(LLM):
    # model_name = "gozfarb/mosaicml_mpt-7b-storywriter-apache"
    model_name = "google/flan-t5-base"
    pipeline = pipeline("text2text-generation", model=model_name, device="cuda:0", trust_remote_code=True)

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
# define our LLM
llm_predictor = LLMPredictor(llm=CustomLLM())

# load in HF embedding model from langchain
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model, prompt_helper=prompt_helper)

# load your data into 'Documents', a custom type by LlamaIndex
documents = SimpleDirectoryReader('./data').load_data()
new_index = GPTListIndex.from_documents(documents)

# query with embed_model specified
query_engine = new_index.as_query_engine(
    retriever_mode="embedding",
    verbose=True,
    service_context=service_context,
)
Everything up to here works just fine: no OpenAI call, no complaint about a missing key.
response = query_engine.query("Why did Jupiter want to flood the Earth?")
print(response)
The query itself, however, still calls OpenAI, and I cannot figure out the reason for this. The error is not very clear; there is no indication I can see of why the default is still being fallen back on.

Same issue as before, but, alas, different.

Also, ignore the horror-code; it's a mishmash of multiple cells from a notebook, but it works, until the query.
new_index = GPTListIndex.from_documents(documents, service_context=service_context)
<- maybe also add the service context here? A little frustrating with these defaults, I know, lol
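For reference, a quick sketch of the corrected flow (reusing your documents, service_context, and query; untested, but the query engine should then pick up the index's service context instead of the OpenAI defaults):

# build the index with the custom service context
new_index = GPTListIndex.from_documents(documents, service_context=service_context)

# the query engine should then inherit the index's service context
query_engine = new_index.as_query_engine(retriever_mode="embedding", verbose=True)
response = query_engine.query("Why did Jupiter want to flood the Earth?")
print(response)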
What would be a great addition is a set-defaults variable that would either allow declaring which models to use as defaults, or simply disable OpenAI entirely and raise an error pointing to wherever the undeclared model is reverting to the defaults.
In some cases where I've run into it, the error is fairly clear about where it's arising from, but for that particular output from my test code it was completely ambiguous: there was nothing suggesting even where it arose from, and after checking the source of some of the segments where the error arose, it was still unclear. Yeah, it's just a little frustrating and kind of goofy. I understand that a lot of this is being tacked onto what was originally made just for use with OpenAI, so it is the expected evolution; just an unexpected number of default calls.
Thanks though, I think that should fix it; it makes sense looking at it now. I'll check when I get back.
Yea, a big problem is that all of this is built around OpenAI. With recent advances in open-source LLMs, though, we can definitely be doing a better job of managing these defaults.

You just have to remember to always pass in the service context, and usually it's all good.
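If you're on a recent enough version, there's also a global hook so anything that slips through still uses your models rather than the OpenAI defaults; a rough sketch, assuming your llama_index release ships set_global_service_context:

from llama_index import set_global_service_context

# any index or query engine built without an explicit service_context
# falls back to this one instead of the OpenAI defaults
set_global_service_context(service_context)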
I figured that's what it was. It's just still a bit messy until it catches up with the current state of multiple models, model types, locations, and online APIs. Early stage and expected, though.