Hello, I can’t seem to find a way to make the response of a query_engine.query() call JSON only. Is there an option for this?
You want the entire Response object as JSON?
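If the goal is only to serialize the whole Response object, a minimal sketch (assuming a query_engine has already been built; the field names follow LlamaIndex's Response object, and the query text is a placeholder):

import json

response = query_engine.query("What did the author do growing up?")
payload = {
    "response": response.response,
    "source_nodes": [
        {"text": n.node.get_content(), "score": n.score}
        for n in response.source_nodes
    ],
}
print(json.dumps(payload, indent=2))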
Yes, I looked into the Pydantic program. But when using an output_cls it errors on JSON encoding, with errors like “extra data” or “expected semicolon”.
I’m not using OpenAI, just a locally hosted Llama 2 Chat 13B. Maybe the issue is that the input is also not JSON, but I thought it would be easier for the LLM to process. I can try this as well.
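For context, output_cls is normally passed when the query engine is built. A sketch of that wiring, assuming an index already exists and using a placeholder Biography model (with a structured output_cls, the parsed object is carried in response.response):

from pydantic import BaseModel

class Biography(BaseModel):
    name: str
    best_known_for: str

query_engine = index.as_query_engine(
    output_cls=Biography,
    response_mode="compact",
)
response = query_engine.query("Who is Paul Graham?")
print(response.response)  # the parsed Biography object in this sketch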
LLMs like Llama are not good at following the Pydantic output feature.
Which one would you recommend?
Or can I use another module that refines the output?
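One such module is LlamaIndex's text-completion program, which wraps an arbitrary LLM with a Pydantic output parser. A sketch under the assumption of a 0.9.x llama-index layout (the Album/Song models and the prompt are placeholders):

from typing import List
from pydantic import BaseModel
from llama_index.output_parsers import PydanticOutputParser
from llama_index.program import LLMTextCompletionProgram

class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=Album),
    prompt_template_str="Generate an example album inspired by {movie_name}.",
    llm=llm,  # any llama-index LLM, e.g. the LocalTensorRTLLM discussed below
    verbose=True,
)
album = program(movie_name="The Shining")  # parsed into an Album instance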
https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html#open-source-llms

This is the compatibility report for a few open-source LLMs that LlamaIndex has tested on different factors. For Pydantic, only Zephyr and Starling show good results.
I think there is a guardrails integration for that in LlamaIndex, let me share the link for it.
I tried that one as well, but there was no difference in the output. I will share my code in about an hour.
Yes, looking into the docs, I’m not sure if “output_parser” is possible on LocalTensorRTLLM.

Attach it to an LLM object:

from llama_index.llms import OpenAI

llm = OpenAI(output_parser=output_parser)
My LLM object:
llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
)
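Since output_parser is accepted by the base LLM class (the same mechanism as the OpenAI example above), it should in principle be possible to pass it to LocalTensorRTLLM as well. A hedged sketch mirroring the constructor above (untested; the parser still depends on the model emitting parseable JSON, and Biography is the placeholder model from earlier):

from llama_index.output_parsers import PydanticOutputParser

llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
    output_parser=PydanticOutputParser(output_cls=Biography),  # assumption: the base-class kwarg is honored here
)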
It also shows a ⚠️ for Mistral 7B, which is a model I could use with TensorRT. Do you think it would be worth trying out?
Yeah, as the feedback says: “Mistral seems slightly more reliable for structured outputs compared to Llama2. Likely with some prompt engineering, it may do better.”