Since you are hacking the OpenAI class, it's trying to use OpenAI function calling, but it seems like something about the response is not correct.
What did your code look like for using the actual huggingface class?
It's not a bug, this works for me and other people 🤷‍♂️
Thanks for getting back. I have also done it using no OpenAI calls at all (i.e. a file-based LLM from HuggingFace, and no server calls). I'll try to include a .py file with a reproducible case.
I now have a completely non-OpenAI case, referring to a local HF model, which also fails. It seems to want to connect to OpenAI, and I have no idea why. I'm including the source and a copy of the output.
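For context, a fully local setup usually takes roughly the shape below. This is not the actual file from the thread, just a sketch: the model names, data path, and embedding model are placeholder assumptions, and the import paths match the 0.9.x-era llama-index API used at the time.

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM

# Local, file-based LLM loaded through HuggingFace -- no server, no OpenAI calls
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",       # placeholder model
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=4096,
    max_new_tokens=512,
    device_map="auto",
)

# Local embeddings too, so nothing falls back to OpenAI's embedding API
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
)

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs, service_context=service_context)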
Hmm. Maybe try explicitly passing in the service context to the two indexes + sub question engine, just to be sure
ah yea, the sub question query engine needs a service context
Otherwise it defaults to OpenAIQuestionGenerator instead of LLMQuestionGenerator
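Concretely, that means something like this (variable names follow the uber/lyft notebook, and service_context is whatever ServiceContext was built around the local LLM and embeddings):

from llama_index import VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine

# Both indexes get the local service context explicitly
lyft_index = VectorStoreIndex.from_documents(lyft_docs, service_context=service_context)
uber_index = VectorStoreIndex.from_documents(uber_docs, service_context=service_context)

# ...build query_engine_tools from the two indexes as in the notebook...

# The sub question engine needs it too; otherwise question generation
# falls back to the OpenAI function-calling question generator
s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
)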
Thank you! That seems to work fine. I've been banging my head against this for 2 weeks.
Sorry about that, I think I missed the message originally
OK, I'm now on to the next stumbling block. Although it works with the local file-based LLM, it's still failing with a local LLM server that uses the OpenAI API. I'm passing the service context (in the index creation and the SubQuestionQueryEngine definition), but this still results in an "Expected tool_calls in ai_message.additional_kwargs, but none found." error.
Would you have a view on what additional context or information I may need to add to get this working? I assume that, given I've defined a local embedding model and passed it with the service context, this wouldn't by default be calling OpenAI's embedding routines.
So now it's back to using the OpenAI code path, which it probably shouldn't be (since this is a fake OpenAI endpoint)
I would either use the OpenAILike LLM to avoid this, or manually set the question generator to use LLMQuestionGenerator
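A sketch of both options, with the caveat that the exact import paths vary a little between llama-index versions (these match the 0.9.x layout) and query_engine_tools is assumed from the notebook:

from llama_index import ServiceContext
from llama_index.llms import OpenAILike
from llama_index.question_gen.llm_generators import LLMQuestionGenerator
from llama_index.query_engine import SubQuestionQueryEngine

# Option 1: OpenAILike -- an OpenAI-compatible client that doesn't assume
# OpenAI-only features like function calling
llm = OpenAILike(
    api_base="http://localhost:1234/v1",
    api_key="fake",
    context_window=4096,
    is_chat_model=False,
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Option 2: explicitly use the prompt-based question generator instead of
# the function-calling OpenAIQuestionGenerator
question_gen = LLMQuestionGenerator.from_defaults(service_context=service_context)
s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    question_gen=question_gen,
    service_context=service_context,
)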
So much further ahead now. It looks like the underlying model being served is critical here (e.g. https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html). I'll need to play with these models and the prompts to ensure that I get back JSON, and that the answer is comprehensive enough.
Either way, thanks. Looks like I have a route forward now
I am progressing, but there is still an issue and the solution isn't obvious. I'm still following the uber/lyft example, but I have the same issue with the DeepLearning.ai exercises that use a QueryEngine as well. The questions are being generated, but no answer is being generated. I'm not sure if this is a configuration issue.
Generated 4 sub questions.
[lyft_10k] Q: What are the customer segments that grew the fastest for Lyft
[lyft_10k] Q: What geographies grew the fastest for Lyft
[uber_10k] Q: What are the customer segments that grew the fastest for Uber
[uber_10k] Q: What geographies grew the fastest for Uber
print(response)
Empty Response
I get the same result whether I use an OpenAILike model (Zephyr Alpha), or a local file-based HuggingFace Zephyr model.
empty response is a classic for when the input is too large
What does your LLM and service context setup look like at the moment?
llm = OpenAILike(
    context_window=8192,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 8192
llm.is_chat_model = False
I'll drop them right down
ohhh you can't set max_tokens the same as context_window, that will mess with quite a few settings I think
context window is the maximum input size
max_tokens is how many tokens are possible to generate
LLMs generate tokens one at a time: each new token is appended to the input, and the next token is generated from that. So llama-index will try to leave room for max_tokens to be predicted
set it to something like 512
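In other words, the rough arithmetic (illustrative numbers, not anything version-specific):

context_window = 4096                          # maximum input size the served model accepts
max_tokens = 512                               # room reserved for the generated answer
prompt_budget = context_window - max_tokens    # ~3584 tokens left for the question + retrieved chunks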
I've done just that:
llm = OpenAILike(
    context_window=512,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 512
llm.is_chat_model = False
Generated 4 sub questions.
[lyft_10k] Q: What are the top three customer segments that grew the fastest for Lyft in year 2021
[lyft_10k] Q: What are the geographies that grew the fastest for Lyft in year 2021
[uber_10k] Q: What are the top three customer segments that grew the fastest for Uber in year 2021
[uber_10k] Q: What are the geographies that grew the fastest for Uber in year 2021
llm = OpenAILike(
    context_window=4096,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 512
llm.is_chat_model = False
Same issue. I played with a few combinations of max_tokens because it was clear that total tokens were > 512.
I went with this in the end:
llm = OpenAILike(
    context_window=4096,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 1024
llm.is_chat_model = False
But got the same result:
Generated 4 sub questions.
[lyft_10k] Q: What are the top three customer segments that grew the fastest for Lyft in year 2021
[lyft_10k] Q: What are the geographies that grew the fastest for Lyft in year 2021
[uber_10k] Q: What are the top three customer segments that grew the fastest for Uber in year 2021
[uber_10k] Q: What are the geographies that grew the fastest for Uber in year 2021
[2023-12-14 22:51:57.368] [INFO] Generated prediction: {
  "id": "cmpl-6a69fk3r9t8gjpxvr4vlfs",
  "object": "text_completion",
  "created": 1702594301,
  "model": "/Users/jon/.cache/lm-studio/models/TheBloke/zephyr-7B-alpha-GGUF/zephyr-7b-alpha.Q8_0.gguf",
  "choices": [
    {
      "index": 0,
      "text": "json\n[\n {\n \"sub_question\": \"What are the top three customer segments that grew the fastest for Lyft in year 2021\",\n \"tool_name\": \"lyft_10k\"\n },\n {\n \"sub_question\": \"What are the geographies that grew the fastest for Lyft in year 2021\",\n \"tool_name\": \"lyft_10k\"\n },\n {\n \"sub_question\": \"What are the top three customer segments that grew the fastest for Uber in year 2021\",\n \"tool_name\": \"uber_10k\"\n },\n {\n \"sub_question\": \"What are the geographies that grew the fastest for Uber in year 2021\",\n \"tool_name\": \"uber_10k\"\n }\n]\n",
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 410,
    "completion_tokens": 210,
    "total_tokens": 620
  }
}
So it's clearly working I guess, since it's generating questions.
What happens if you manually query a sub-index using one of those questions?
Are the sub-indexes using the same service context?
I don't know how to manually query a sub-index. I tried to add the system-content to the sub-index definition, but it complained that it was an invalid parameter.
service-context, not system-context
lyft_engine.query("My query")
Yea just VectorStoreIndex.from_documents(..., service_context=service_context)
e.g. lyft_index = VectorStoreIndex.from_documents(lyft_docs, service_context=service_context)
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
And s_engine = SubQuestionQueryEngine.from_defaults(
    service_context=service_context,
    query_engine_tools=query_engine_tools
)
So does using that query engine work?
i.e.
response = lyft_engine.query("What are the geographies that grew the fastest for Lyft in year 2021?")
print(response)
Getting an error with that. I'll reset the notebook and try it separately with no baggage lying around
It works:
Lyft's revenue growth was driven by strong demand across all of our markets in 2021, with the largest increases coming from California and Texas.
So... is the subquery engine not defined properly?
Good result for uber question:
OK @Logan M - something odd here. It now seems to work. However, I added service_context to the individual query engines by mistake. I don't think it belongs there, but it didn't complain, and I'm now getting answers.
e.g.:
uber_engine = uber_index.as_query_engine(similarity_top_k=3, use_async=True, service_context=service_context)
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3, use_async=True, service_context=service_context)
[lyft_10k] Q: What are the customer segments that grew the fastest for Lyft
[lyft_10k] Q: What geographies grew the fastest for Lyft
[uber_10k] Q: What are the customer segments that grew the fastest for Uber
[uber_10k] Q: What geographies grew the fastest for Uber
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[uber_10k] A:
In terms of geographic growth, our Mobility Gross Bookings in Latin America grew by 63% year-over-year in 2021, which was the highest growth rate among all regions.
[uber_10k] A:
The given material does not provide any information about the customer segments that grew the fastest for Uber. The provided financial statements only show the revenue, expenses, and net income of the company for the year ended December 31, 2020, and 2021.
[lyft_10k] A: 18-24 year olds and 35-49 year olds were the two largest customer segments that grew the fastest for Lyft in 2017.
[lyft_10k] A: 1) Phoenix, AZ
2) Sacramento, CA
3) Las Vegas, NV
4) San Antonio, TX
5) Orlando, FL
6) Tampa, FL
...etc., followed by
print(response)
According to the given material, Lyft's two largest customer segments that grew the fastest in 2017 were 18-24 year olds and 35-49 year olds. On the other hand, Uber's highest growth rate among all regions was in Latin America for its Mobility Gross Bookings in 2021. Therefore, there is no direct comparison between the customer segments that grew the fastest for Lyft and Uber as they are not the same metrics being compared. However, it can be concluded that while Lyft's growth was focused on specific age groups, Uber's growth was more regionally-focused in Latin America.
Cool! So it seems to be working!
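For reference, one way to avoid threading service_context through every index, query engine, and sub question engine (and missing a spot) is to set it once as the global default. This is a sketch, assuming a llama-index version that still ships ServiceContext and set_global_service_context:

from llama_index import set_global_service_context

# Every index / query engine / question generator created after this call
# picks up the local LLM and embeddings by default
set_global_service_context(service_context)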