Flan t5

My output is INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 46 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 18 tokens
Flan is a little tricky to use.

Can I see how you set up the index? Did you use a prompt helper?
from typing import Any, List, Mapping, Optional

import torch
from langchain.llms.base import LLM
from transformers import pipeline


class CustomLLM(LLM):
    # local HF model wrapped as a langchain LLM
    model_name = "google/flan-t5-large"
    pipeline = pipeline("text-generation", model=model_name, device="cuda:0",
                        model_kwargs={"torch_dtype": torch.bfloat16})

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        print(prompt)
        # num_output is defined below with the prompt helper settings
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]

        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"


from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import (
    GPTListIndex,
    LangchainEmbedding,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
)

llm_predictor = LLMPredictor(llm=CustomLLM())
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model
)

Load your data

documents = SimpleDirectoryReader('data').load_data()
index = GPTListIndex.from_documents(documents, service_context=service_context)

index.save_to_disk('index.json')

new_index = GPTListIndex.load_from_disk('index.json', service_context=service_context)

Query and print response

query with embed_model specified

response = new_index.query(
    "how much was spent at COMCAST?",
    mode="embedding",
    verbose=True,
    service_context=service_context
)

print(response)
I see you're passing in the prompt helper. What are the settings for that? I think that will be the main thing to tweak
max_input_size = 2048

# set number of output tokens
num_output = 256

# set maximum chunk overlap
max_chunk_overlap = 20

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
Really appreciate the help!
Flan's max input size is very small (512)

Try with something like these settings

max_input_size=512
num_output=256 
max_chunk_overlap=20


However, by default flan outputs 512 tokens, so we need to find the setting to change that to 256 in the pipeline
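Roughly the same setup as before, just with the smaller input size (a sketch reusing your PromptHelper/ServiceContext lines):

# sketch: same prompt helper as before, sized for flan-t5's 512-token context
max_input_size = 512
num_output = 256
max_chunk_overlap = 20

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model
)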

UPDATE: ah I see you are setting this
Running now looks better already!
snap, so it looked promising, but then I started seeing:
We have provided an existing answer:
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
FOOD MARKET INC,Groceries,Sale,-63.75,
01/01/2022,01/02/2022,LYFT *2 RIDES 12-30,Travel,Sale,-54.23,
12/30/2021,01/02/2022,TAQUERIA DOWNTOWN CATE,Food & Drink,Sale,-23.35,
01/01/2022,01/02/2022,DD DOORDASH CHIPOTLE,Food & Drink,Sale,-26.60,
but in the end it just returned Empty Response
Yeaaa sounds familiar

I see you are actually setting the num output in the pipeline. You might need to reduce it further (maybe 150?)

The way flan works is a little different than GPT models (it's an encoder/decoder model vs. a decoder-only model), which makes it a little tricky 🤔
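For reference, a rough sketch of what the flan side could look like (untested, and assuming the text2text-generation task, since flan-t5 is a seq2seq model; that pipeline returns only the newly generated text, so the response[prompt_length:] slice in _call wouldn't be needed):

# sketch (untested): flan-t5 is seq2seq, so the matching pipeline task is
# "text2text-generation"; it returns only the generated text, no prompt echo
pipe = pipeline(
    "text2text-generation",
    model="google/flan-t5-large",
    device="cuda:0",
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# smaller output budget, per the suggestion above
response = pipe("how much was spent at COMCAST?", max_new_tokens=150)[0]["generated_text"]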
is there a better model you'd recommend?
goal was to avoid the GPT costs while doing dev :(
and then upgrade once I have a built solution
How much VRAM do you have access to?
32GB, but I can spin up a VM if needed
Oh cool!

This model might be interesting to try
https://huggingface.co/facebook/opt-iml-max-1.3b

And of course I'm sure you've seen all the github repos with things like alpaca, llama, gpt4all. All those would be good options too, but just a bit more setup since they aren't in huggingface
Just be aware that depending on the model you are using, you need to adjust the prompt helper to the input size of that model
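For example, a rough sketch (assuming the model config exposes max_position_embeddings, which the OPT models do):

from transformers import AutoConfig

# sketch: size the prompt helper from the model's own context window
model_name = "facebook/opt-iml-max-1.3b"
max_input_size = AutoConfig.from_pretrained(model_name).max_position_embeddings  # 2048 for OPT

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)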
Also, note #2: depending on which LLM you use, you might get pretty varied performance/quality of answers 😅 but hopefully they are mostly the same
haha yeah, hard to beat chatgpt :(
but appreciate the help! one question when using huggingface
is the data being passed outside of my local machine?
Nope! All local