Since you are hacking the OpenAI class, it's trying to use OpenAI function calling, but it seems like something about the response is not correct.
What did your code look like for using the actual huggingface class?
It's not a bug, this works for me and other people 🤷‍♂️
Thanks for getting back. I have also done it using no OpenAI calls at all (i.e. a file-based LLM from HuggingFace, and no server calls). I'll try to include a .py file with a reproducible case.
I now have a completely non-OpenAI case, referring to a local HF model, which also fails. It seems to want to connect to OpenAI, and I have no idea why. I'm including the source and a copy of the output.
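For context, a fully local setup usually takes roughly the shape below. This is not the actual file from the thread, just a sketch: the model names, data path, and embedding model are placeholder assumptions, and the import paths match the 0.9.x-era llama-index API used at the time.

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM

# Local, file-based LLM loaded through HuggingFace -- no server, no OpenAI calls
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",       # placeholder model
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=4096,
    max_new_tokens=512,
    device_map="auto",
)

# Local embeddings too, so nothing falls back to OpenAI's embedding API
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
)

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs, service_context=service_context)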
Hmm. Maybe try explicitly passing in the service context to the two indexes + sub question engine, just to be sure
ah yea, the sub question query engine needs a service context
Otherwise it defaults to OpenAIQuestionGenerator instead of LLMQuestionGenerator
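Concretely, that means something like this (variable names follow the uber/lyft notebook, and service_context is whatever ServiceContext was built around the local LLM and embeddings):

from llama_index import VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine

# Both indexes get the local service context explicitly
lyft_index = VectorStoreIndex.from_documents(lyft_docs, service_context=service_context)
uber_index = VectorStoreIndex.from_documents(uber_docs, service_context=service_context)

# ...build query_engine_tools from the two indexes as in the notebook...

# The sub question engine needs it too; otherwise question generation
# falls back to the OpenAI function-calling question generator
s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
)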
Thank you! That seems to work fine. I've been banging my head against this for 2 weeks.
Sorry about that, I think I missed the message originally
OK, I'm now on to the next stumbling block. Although it works with the local file-based LLM, it's still failing with a local LLM server that uses the OpenAI API. I'm passing the service context (in the index creation and the SubQuestionQueryEngine definition), but this still results in an "Expected tool_calls in ai_message.additional_kwargs, but none found." error.
Would you have a view on what additional context or information I may need to add to get this working? I assume that, given I've defined a local embedding model and passed it with the service context, this wouldn't by default be calling OpenAI's embedding routines.
So now it's back to using the OpenAI code path, which it probably shouldn't be (since this is a fake OpenAI endpoint)
I would either use the OpenAILike LLM to avoid this, or manually set the question generator to use LLMQuestionGenerator
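A sketch of both options, with the caveat that the exact import paths vary a little between llama-index versions (these match the 0.9.x layout) and query_engine_tools is assumed from the notebook:

from llama_index import ServiceContext
from llama_index.llms import OpenAILike
from llama_index.question_gen.llm_generators import LLMQuestionGenerator
from llama_index.query_engine import SubQuestionQueryEngine

# Option 1: OpenAILike -- an OpenAI-compatible client that doesn't assume
# OpenAI-only features like function calling
llm = OpenAILike(
    api_base="http://localhost:1234/v1",
    api_key="fake",
    context_window=4096,
    is_chat_model=False,
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Option 2: explicitly use the prompt-based question generator instead of
# the function-calling OpenAIQuestionGenerator
question_gen = LLMQuestionGenerator.from_defaults(service_context=service_context)
s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    question_gen=question_gen,
    service_context=service_context,
)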
So much further ahead now. It looks like the underlying model being served is critical here (e.g. https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html). I'll need to play with these models and the prompts to ensure that I get back JSON, and that the answer is comprehensive enough.
Either way, thanks. Looks like I have a route forward now
I am progressing, but there is still an issue and the solution isn't obvious. I'm still following the uber/lyft example, but I have the same issue with the DeepLearning.ai exercises that use a QueryEngine as well. The questions are being generated, but no answer is being generated. I'm not sure if this is a configuration issue.
Generated 4 sub questions.
[lyft_10k] Q: What are the customer segments that grew the fastest for Lyft
[lyft_10k] Q: What geographies grew the fastest for Lyft
[uber_10k] Q: What are the customer segments that grew the fastest for Uber
[uber_10k] Q: What geographies grew the fastest for Uber
print(response)
Empty Response
I get the same result whether I use an OpenAILike model (Zephyr Alpha), or a local file-based HuggingFace Zephyr model.
empty response is a classic for when the input is too large
What does your LLM and service context setup look like at the moment?
llm = OpenAILike(
    context_window=8192,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 8192
llm.is_chat_model = False
I'll drop them right down
ohhh you can't set max_tokens the same as context_window, that will mess with quite a few settings I think
context window is the maximum input size
max_tokens is how many tokens are possible to generate
LLMs generate tokens one at a time: each new token is appended to the input, and the next token is generated from that. So llama-index will try to leave room for max_tokens to be predicted
set it to something like 512
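In other words, the rough arithmetic (illustrative numbers, not anything version-specific):

context_window = 4096                          # maximum input size the served model accepts
max_tokens = 512                               # room reserved for the generated answer
prompt_budget = context_window - max_tokens    # ~3584 tokens left for the question + retrieved chunks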
I've done just that:
llm = OpenAILike(
    context_window=512,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 512
llm.is_chat_model = False
Generated 4 sub questions.
[lyft_10k] Q: What are the top three customer segments that grew the fastest for Lyft in year 2021
[lyft_10k] Q: What are the geographies that grew the fastest for Lyft in year 2021
[uber_10k] Q: What are the top three customer segments that grew the fastest for Uber in year 2021
[uber_10k] Q: What are the geographies that grew the fastest for Uber in year 2021
llm = OpenAILike(
    context_window=4096,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 512
llm.is_chat_model = False
Same issue. I played with a few combinations of max_tokens because it was clear that total tokens were > 512.
I went with this in the end:
llm = OpenAILike(
    context_window=4096,
    is_function_calling=True,
    api_base='http://localhost:1234/v1',
    api_key='...'
)
llm.max_tokens = 1024
llm.is_chat_model = False
But got the same result:
Generated 4 sub questions.
[lyft_10k] Q: What are the top three customer segments that grew the fastest for Lyft in year 2021
[lyft_10k] Q: What are the geographies that grew the fastest for Lyft in year 2021
[uber_10k] Q: What are the top three customer segments that grew the fastest for Uber in year 2021
[uber_10k] Q: What are the geographies that grew the fastest for Uber in year 2021
[2023-12-14 22:51:57.368] [INFO] Generated prediction: {
  "id": "cmpl-6a69fk3r9t8gjpxvr4vlfs",
  "object": "text_completion",
  "created": 1702594301,
  "model": "/Users/jon/.cache/lm-studio/models/TheBloke/zephyr-7B-alpha-GGUF/zephyr-7b-alpha.Q8_0.gguf",
  "choices": [
    {
      "index": 0,
      "text": "json\n[\n {\n \"sub_question\": \"What are the top three customer segments that grew the fastest for Lyft in year 2021\",\n \"tool_name\": \"lyft_10k\"\n },\n {\n \"sub_question\": \"What are the geographies that grew the fastest for Lyft in year 2021\",\n \"tool_name\": \"lyft_10k\"\n },\n {\n \"sub_question\": \"What are the top three customer segments that grew the fastest for Uber in year 2021\",\n \"tool_name\": \"uber_10k\"\n },\n {\n \"sub_question\": \"What are the geographies that grew the fastest for Uber in year 2021\",\n \"tool_name\": \"uber_10k\"\n }\n]\n",
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 410,
    "completion_tokens": 210,
    "total_tokens": 620
  }
}
So it's clearly working I guess, since it's generating questions.
What happens if you manually query a sub-index using one of those questions?
Are the sub-indexes using the same service context?
I don't know how to manually query a sub-index. I tried to add the system-content to the sub-index definition, but it complained that it was an invalid parameter.
service-context, not system-context
lyft_engine.query("My query")
Yea just VectorStoreIndex.from_documents(..., service_context=service_context)
e.g. lyft_index = VectorStoreIndex.from_documents(lyft_docs, service_context=service_context)
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
And s_engine = SubQuestionQueryEngine.from_defaults(
    service_context=service_context,
    query_engine_tools=query_engine_tools
)
So does using that query engine work?
i.e.
response = lyft_engine.query("What are the geographies that grew the fastest for Lyft in year 2021?")
print(response)
Getting an error with that. I'll reset the notebook and try it separately with no baggage lying around
It works:
Lyft's revenue growth was driven by strong demand across all of our markets in 2021, with the largest increases coming from California and Texas.
So... is the subquery engine not defined properly?
Good result for uber question:
OK @Logan M - something odd here. It now seems to work. However, I added service_context to the individual query engines by mistake. I don't think it belongs there, but it didn't complain, and I'm now getting answers.
e.g.:
uber_engine = uber_index.as_query_engine(similarity_top_k=3, use_async=True, service_context=service_context)
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3, use_async=True, service_context=service_context)
[lyft_10k] Q: What are the customer segments that grew the fastest for Lyft
[lyft_10k] Q: What geographies grew the fastest for Lyft
[uber_10k] Q: What are the customer segments that grew the fastest for Uber
[uber_10k] Q: What geographies grew the fastest for Uber
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[uber_10k] A:
In terms of geographic growth, our Mobility Gross Bookings in Latin America grew by 63% year-over-year in 2021, which was the highest growth rate among all regions.
[uber_10k] A:
The given material does not provide any information about the customer segments that grew the fastest for Uber. The provided financial statements only show the revenue, expenses, and net income of the company for the year ended December 31, 2020, and 2021.
[lyft_10k] A: 18-24 year olds and 35-49 year olds were the two largest customer segments that grew the fastest for Lyft in 2017.
[lyft_10k] A: 1) Phoenix, AZ
2) Sacramento, CA
3) Las Vegas, NV
4) San Antonio, TX
5) Orlando, FL
6) Tampa, FL
...etc., followed by
print(response)
According to the given material, Lyft's two largest customer segments that grew the fastest in 2017 were 18-24 year olds and 35-49 year olds. On the other hand, Uber's highest growth rate among all regions was in Latin America for its Mobility Gross Bookings in 2021. Therefore, there is no direct comparison between the customer segments that grew the fastest for Lyft and Uber as they are not the same metrics being compared. However, it can be concluded that while Lyft's growth was focused on specific age groups, Uber's growth was more regionally-focused in Latin America.
Cool! So it seems to be working!
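For reference, one way to avoid threading service_context through every index, query engine, and sub question engine (and missing a spot) is to set it once as the global default. This is a sketch, assuming a llama-index version that still ships ServiceContext and set_global_service_context:

from llama_index import set_global_service_context

# Every index / query engine / question generator created after this call
# picks up the local LLM and embeddings by default
set_global_service_context(service_context)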