Query

Can I ask LlamaIndex for just its answer, without the prompt, context, "Answer:" prefix, etc.?

22 comments

I'm not sure what you mean. Do you just want to ask an LLM something?

response = llm.complete("what is the meaning of life?")

Something like that?

When I do this:

response = query_engine.query(task)
print(response)

I see all the retrieved nodes, my entire prompt, an "Answer:" heading, and THEN the answer.

However, I just want that last part (the LLM's answer) without having to do a clunky text split that'll break once I change LLMs / prompt templates.

What LLM/LLM class are you using?

Seems like it's returning the full input for some reason

It's a 7Bx2 MoE based on Mistral/Mixtral.

and I'm using VllmServer.

Is there something like a Boolean return_full_text parameter like TextGenerationPipeline has?
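For context, Hugging Face's TextGenerationPipeline drops the prompt from its output when return_full_text=False. Until an equivalent switch turns up for VllmServer, a rough stopgap is to remove the echoed prompt by exact prefix match rather than splitting on "Answer:". This is only a sketch: strip_echoed_prompt is a hypothetical helper, not part of LlamaIndex or vLLM, and it assumes you can get hold of the exact prompt string that was sent to the model.

```python
def strip_echoed_prompt(prompt: str, completion: str) -> str:
    """Return only the newly generated text.

    Hypothetical helper (not a LlamaIndex or vLLM API). If the backend
    echoes the full prompt at the start of its output, cut off exactly
    that prefix; otherwise return the completion unchanged apart from
    surrounding whitespace.
    """
    if completion.startswith(prompt):
        # Remove the echoed prompt by length, so the result does not
        # depend on any particular marker such as "Answer:".
        return completion[len(prompt):].strip()
    return completion.strip()


# Example with a made-up prompt and a backend that echoes it.
prompt = "Context: some retrieved text\nQuestion: What is 6 x 7?\nAnswer:"
echoed_output = prompt + " 42"
answer = strip_echoed_prompt(prompt, echoed_output)
print(answer)
```

Because it matches the whole prompt instead of a marker inside it, this keeps working when the prompt template changes, and it is a no-op if the backend stops echoing the input.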

Hmmm not that I know of (but I am not a vllm expert either lol)

Vllm really should be handling that

So if I stop using vLLM, this problem wouldn't exist?

When I use vLLM outside of LlamaIndex, this isn't an issue.

Idk 🤷‍♂️ I don't use vllm lol. Maybe it's an arg somewhere.

Feel free to check out the source code and cross reference the usage there with vllm docs
https://github.com/run-llama/llama_index/blob/f916839e81ff8bd3006fe3bf4df3f59ba7f37da3/llama-index-integrations/llms/llama-index-llms-vllm/llama_index/llms/vllm/base.py#L294

Imo for vllm, it's probably easier to launch it with the openai api mode, and just use the OpenAPILike LLM class

That link is for the Vllm and VllmServer classes. The latter is talking to my OpenAI-compatible vLLM service running on my machine.

What exactly is the OpenAPILike LLM class? When I search for it in your repo, I get zero results.
