Hello, I can’t seem to find a way to make the response of query_engine.query() JSON only. Is there an option for this?
You want the entire Response object as JSON?
Yes, I looked into the pydantic program. But then, using an output_cls, it errors on the JSON parsing, with errors like “extra data” or “expected semicolon”.
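For reference, a minimal sketch of the output_cls pattern being described, assuming a VectorStoreIndex named index already exists and using an illustrative Album/Song schema; with a local model, the step that parses the raw completion into the schema is typically where “extra data”-style JSON errors surface:

from typing import List
from pydantic import BaseModel

class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

# index is assumed to be an existing VectorStoreIndex built elsewhere
query_engine = index.as_query_engine(output_cls=Album, response_mode="compact")
response = query_engine.query("Describe the album in the documents.")
# response.response should be an Album instance; parsing the raw LLM text
# into this schema is where a JSONDecodeError like "Extra data" can occur
print(response.response)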
I’m not using OpenAI, just a locally hosted Llama 2 13B chat model. Maybe the issue is that the input is also not JSON, but I thought plain text would be easier for the LLM to process. I can try this as well.
LLMs like Llama are not good at following the pydantic output feature.
Which one would you recommend?
Or can I use another module that refines the output?
https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html#open-source-llms

This is the compatibility report for a few open-source LLMs that LlamaIndex has tested on different factors. For pydantic, only Zephyr and Starling show good results.
I think there are guardrails for that in LlamaIndex; let me share the link.
I tried that one as well, but there was no difference in the output. I will share my code in about an hour.
Yes, looking into the docs, I’m not sure if “output_parser” is possible on LocalTensorRTLLM.

Attach to an LLM object:

llm = OpenAI(output_parser=output_parser)
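A slightly fuller sketch of that docs pattern, assuming LlamaIndex v0.9-style imports and a hypothetical Answer schema; whether LocalTensorRTLLM accepts the same output_parser argument is exactly the open question above, so this only shows the OpenAI case from the docs:

from pydantic import BaseModel
from llama_index.llms import OpenAI
from llama_index.output_parsers import PydanticOutputParser

class Answer(BaseModel):
    summary: str
    score: float

# The parser injects format instructions into the prompt and parses the
# raw completion back into the Answer schema.
output_parser = PydanticOutputParser(output_cls=Answer)
llm = OpenAI(output_parser=output_parser)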
My LLM object:
llm = LocalTensorRTLLM(
    model_path="./model",
    engine_name="llama_float16_tp1_rank0.engine",
    tokenizer_dir="meta-llama/Llama-2-13b-chat",
    completion_to_prompt=completion_to_prompt,
)
It also gives a ⚠️ for Mistral 7B, which is a model I could use with TensorRT. Do you think it would be worth trying out?
Yeah, as the feedback says: “Mistral seems slightly more reliable for structured outputs compared to Llama2. Likely with some prompt engineering, it may do better.”