I never looked in the embeddings docs because I did not consider them as such, and everyone on so many forums / discussions just pointed back to readthedocs links (invariably 404) related to custom models. Perhaps the docs changed and were segmented, which is why I was confused by the discussions; should those now link to embeddings? I think they were just old posts, etc., since the documentation is changing rapidly with updates.
Thanks, let me look into this.
One other question: what if I have a locally stored model? Is there a way to just use that instead of grabbing it from Hugging Face?
Not an easy way... you'd have to extend the base embeddings class from llama-index.
I don't think it would be too hard to extend it that way: really just add a local-dir option and load from there instead of downloading, if present, unless it is doing something goofy with it. Hacky, but it would work in the interim. I'll look at the src when I get to needing it.
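Something like this is probably what I'd try first before extending anything (untested, and assuming the wrapper just hands the name through to sentence-transformers; the ./models/all-MiniLM-L6-v2 path is hypothetical and would need to hold a saved sentence-transformers model):

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding

# point model_name at a local directory instead of a Hugging Face repo id
# (hypothetical path; the directory must contain a saved sentence-transformers model)
local_embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="./models/all-MiniLM-L6-v2")
)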
So, a similar issue here.
import torch
from typing import Optional, List, Mapping, Any

from langchain.llms.base import LLM
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import (
    SimpleDirectoryReader,
    GPTListIndex,
    PromptHelper,
    LangchainEmbedding,
    LLMPredictor,
    ServiceContext,
)
from transformers import pipeline
# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
class CustomLLM(LLM):
    # model_name = "gozfarb/mosaicml_mpt-7b-storywriter-apache"
    model_name = "google/flan-t5-base"
    pipeline = pipeline("text2text-generation", model=model_name, device="cuda:0", trust_remote_code=True)

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
# define our LLM
llm_predictor = LLMPredictor(llm=CustomLLM())

# load in HF embedding model from langchain
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model, prompt_helper=prompt_helper)

# load your data into 'Documents', a custom type by LlamaIndex
documents = SimpleDirectoryReader('./data').load_data()
new_index = GPTListIndex.from_documents(documents)

# query with embed_model specified
query_engine = new_index.as_query_engine(
    retriever_mode="embedding",
    verbose=True,
    service_context=service_context,
)
Everything up to here works just fine: no OpenAI call, no complaint about a missing key.
response = query_engine.query("Why did Jupiter want to flood the Earth?")
print(response)
The query itself, however, still calls OpenAI, and I cannot figure out the reason for this. The error is not very clear; there is no indication I can see of why the default is still being fallen back on.

Same issue as before, but, alas, different.

Also, ignore the horror-code; it's a mishmash of multiple cells from a notebook, but it works, until the query.
new_index = GPTListIndex.from_documents(documents, service_context=service_context)
<- maybe also add the service context here? A little frustrating with these defaults, I know, lol
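For reference, a quick sketch of the corrected flow (reusing your documents, service_context, and query; untested, but the query engine should then pick up the index's service context instead of the OpenAI defaults):

# build the index with the custom service context
new_index = GPTListIndex.from_documents(documents, service_context=service_context)

# the query engine should then inherit the index's service context
query_engine = new_index.as_query_engine(retriever_mode="embedding", verbose=True)
response = query_engine.query("Why did Jupiter want to flood the Earth?")
print(response)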
What would be a great addition is a set-defaults variable that would either allow declaring which models to use as defaults, or simply disable OpenAI entirely and raise an error pointing to wherever the undeclared model is reverting to the defaults.
In some cases where I've run into it, the error is fairly clear about where it's arising from, but for that particular output from my test code it was completely ambiguous: there was nothing suggesting even where it arose from, and after checking the source of some of the segments where the error arose, it was still unclear. Yeah, it's just a little frustrating and kind of goofy. I understand that a lot of this is being tacked onto what was originally made just for use with OpenAI, so it is the expected evolution; just an unexpected number of default calls.
Thanks though, I think that should fix it; it makes sense looking at it now. I'll check when I get back.
Yea, a big problem is that all of this is built around OpenAI. With recent advances in open-source LLMs, though, we can definitely be doing a better job of managing these defaults.

You just have to remember to always pass in the service context, and usually it's all good.
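If you're on a recent enough version, there's also a global hook so anything that slips through still uses your models rather than the OpenAI defaults; a rough sketch, assuming your llama_index release ships set_global_service_context:

from llama_index import set_global_service_context

# any index or query engine built without an explicit service_context
# falls back to this one instead of the OpenAI defaults
set_global_service_context(service_context)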
I figured that's what it was. It's just still a bit messy until it catches up with the current state of multiple models, model types, locations, and online APIs. Early stage and expected, though.