Flan is a little tricky to use.
Can I see how you setup the index? Did you use a prompt helper?
from typing import Any, List, Mapping, Optional

import torch
from transformers import pipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms.base import LLM
from llama_index import GPTListIndex, LangchainEmbedding, LLMPredictor, PromptHelper, ServiceContext, SimpleDirectoryReader

class CustomLLM(LLM):
    model_name = "google/flan-t5-large"
    pipeline = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype": torch.bfloat16})

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        print(prompt)
        # num_output is set below alongside the prompt helper settings
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

llm_predictor = LLMPredictor(llm=CustomLLM())
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
# prompt_helper is defined further down with its settings
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model)
# Load your data
documents = SimpleDirectoryReader('data').load_data()
index = GPTListIndex.from_documents(documents, service_context=service_context)
index.save_to_disk('index.json')
new_index = GPTListIndex.load_from_disk('index.json', service_context=service_context)
# Query and print response
# query with embed_model specified
response = new_index.query(
    "how much was spent at COMCAST?",
    mode="embedding",
    verbose=True,
    service_context=service_context
)
print(response)
I see you're passing in the prompt helper; what are the settings for that? I think that will be the main thing to tweak
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
Really appreciate the help!
Flan's max input size is very small (512)
Try with something like these settings
max_input_size=512
num_output=256
max_chunk_overlap=20
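Roughly like this, rebuilding the service context so the new prompt helper is picked up (just a sketch, reusing the objects from your snippet above):

# Flan-sized prompt helper settings
prompt_helper = PromptHelper(max_input_size=512, num_output=256, max_chunk_overlap=20)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model)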
However, by default Flan outputs 512 tokens, so we need to find the setting to change that to 256 in the pipeline
UPDATE: ah I see you are setting this
Running now looks better already!
Snap, so it looked promising, but then I started seeing:
We have provided an existing answer:
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
FOOD MARKET INC,Groceries,Sale,-63.75,
01/01/2022,01/02/2022,LYFT *2 RIDES 12-30,Travel,Sale,-54.23,
12/30/2021,01/02/2022,TAQUERIA DOWNTOWN CATE,Food & Drink,Sale,-23.35,
01/01/2022,01/02/2022,DD DOORDASH CHIPOTLE,Food & Drink,Sale,-26.60,
but in the end it was an Empty Response
Yeaaa sounds familiar
I see you are actually setting the num output in the pipeline. You might need to reduce it further (maybe 150?)
The way Flan works is a little different from GPT models (it's an encoder/decoder model vs. decoder-only models), which makes it a little tricky
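One thing that might help illustrate it: transformers has a dedicated "text2text-generation" pipeline for seq2seq models like flan-t5, and that one returns only the newly generated text (no prompt echo), so the prompt-slicing trick isn't needed there. A rough, untested sketch of what the generation step could look like with that task and the smaller output budget:

# sketch only: flan-t5 is an encoder/decoder model, so the seq2seq pipeline task fits it
seq2seq_pipe = pipeline("text2text-generation", model="google/flan-t5-large", device="cuda:0", model_kwargs={"torch_dtype": torch.bfloat16})

def flan_call(prompt: str) -> str:
    # seq2seq pipeline output is just the answer, no prompt prefix to strip
    return seq2seq_pipe(prompt, max_new_tokens=150)[0]["generated_text"]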
is there a better model you'd recommend?
goal was to avoid the GPT costs while doing dev : (
and then upgrade once i have a built solution
How much VRAM do you have access to?
32 GB, but I can spin up a VM if needed
Oh cool!
This model might be interesting to try
https://huggingface.co/facebook/opt-iml-max-1.3b
And of course I'm sure you've seen all the GitHub repos with things like Alpaca, LLaMA, GPT4All. All of those would be good options too, but they take a bit more setup since they aren't on Hugging Face
Just be aware that depending on the model you are using, you need to adjust the prompt helper to the input size of that model
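For example, with opt-iml-max-1.3b (a decoder-only model with a 2048-token context), the swap would look roughly like this (a sketch, untested, reusing the CustomLLM pattern above):

# decoder-only model, so "text-generation" plus prompt slicing still applies
model_name = "facebook/opt-iml-max-1.3b"
opt_pipe = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype": torch.bfloat16})

# match the prompt helper to this model's larger input size
prompt_helper = PromptHelper(max_input_size=2048, num_output=256, max_chunk_overlap=20)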
Also, note #2: depending on which LLM you use, you might get pretty varied performance/quality of answers
but hopefully they are mostly the same
haha yeah, hard to beat ChatGPT : (
but appreciate the help! One question when using Hugging Face:
is the data being passed outside of my local machine?