
Updated 3 months ago


Hello! I have a question -- I'm using Llama 2 and I have a big JSON blob (50 MB of text). It's quite nested and contains a lot of 1-page documents, and I was wondering what the best way to index it is. Would it still be the JSON index (it's going to be tedious to come up with a full schema), or is there a way for me to turn it into a document somehow?
the default json loader should do ok-ish at it

Or you can iterate over the json and parse out each document manually into Document objects
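The manual route above can be sketched like this. The traversal below is a minimal, hypothetical example (the blob shape, `extract_documents` helper, and path format are all made up for illustration); in practice you'd wrap each `(path, text)` pair in a llama_index `Document`, e.g. `Document(text=text, metadata={"path": path})`.

```python
import json

def extract_documents(node, path=""):
    """Yield (json_path, text) for every string leaf in a nested JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from extract_documents(value, f"{path}/{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from extract_documents(value, f"{path}[{i}]")
    elif isinstance(node, str):
        # A string leaf is treated as one "document"; its path is kept
        # as metadata so the nesting structure isn't lost.
        yield path, node

blob = json.loads('{"reports": [{"title": "Q1", "body": "Revenue grew."}]}')
docs = list(extract_documents(blob))
# docs == [("reports[0]/title", "Q1"), ("reports[0]/body", "Revenue grew.")]
```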
gotcha, okay yeah that's what I was thinking too. Do you happen to know how 'smart' the json loader is compared to the document parser? I'd love to preserve the structure since it's important, but I'm also wondering if the json loader has capabilities beyond basic lookup/analytics on json data, e.g. summarization
*how smart an LLM using the json loader vs. the document loader can be
I think the base json loader does a sort of flattening operation

Like if you have something like

{key1: {key2: val}}

It might parse it into a string like key1/key2: val

I haven't checked in a hot minute, just going off memory lol
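Going off that (from-memory, so unverified) description, the flattening might look roughly like this. The `flatten` helper is purely illustrative, not the loader's actual implementation:

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into 'key1/key2: val' strings
    (a guess at what the json loader's flattening could look like)."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            lines.extend(flatten(value, f"{prefix}{key}/"))
    else:
        # Leaf value: emit the accumulated key path plus the value.
        lines.append(f"{prefix.rstrip('/')}: {node}")
    return lines

flatten({"key1": {"key2": "val"}})  # ['key1/key2: val']
```

The upshot is that nesting survives only as a path string inside the text, which is why parsing out your own Document objects preserves structure more faithfully.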
gotcha, I'll play around with it, thanks for the help!
hey @Logan M sorry to bother again - I'm wondering if y'all ever got the JSONLoader working with the Llama models?

The reason I'm asking is that I'm getting an error trying to use Llama 2 + the json loader (running the example here with pretty much everything the same except service_context, where I use my Llama 2 pipeline: https://gpt-index.readthedocs.io/en/latest/examples/query_engine/json_query_engine.html),

Looking at the source code, I think it's because Llama 2 doesn't have function calling, and this line uses the LLM to create json_path_response_str (which should be a JSONPath query). But I wanted to double-check, in case I'm wrong and there's another error. https://github.com/jerryjliu/llama_index/blob/644c034a249fa359181f8ebe988b8c2b93401814/llama_index/indices/struct_store/json_query.py#L105
okay, confirming that this was the issue ^ I was able to force Llama 2 to output just the JSONPath and return the correct answer. I wonder if a thin regex wrapper around json_path_response_str could be helpful in the future for other users running LLMs without function calling (let me know if you like the idea and I can make a feature request)
oh, you mean the JSON query engine, not the json loader πŸ˜…

Likely a thin wrapper/parser to parse the json path out of the response would be helpful. Llama2 likes to be verbose haha
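That wrapper might look something like the sketch below. Everything here is an assumption: the `extract_json_path` name is made up, and the regex assumes the query starts with `$` (the standard JSONPath root), which matches what the query engine's prompt asks the model for.

```python
import re

def extract_json_path(response: str) -> str:
    """Pull a JSONPath expression out of a verbose LLM completion.

    Assumes the path starts with '$' and contains no whitespace,
    backticks, or quote characters (hypothetical heuristic).
    """
    match = re.search(r"\$[^\s`'\"]+", response)
    if match is None:
        raise ValueError(f"no JSONPath found in response: {response!r}")
    return match.group(0)

raw = "Sure! The JSONPath query you need is: $.reports[0].body"
extract_json_path(raw)  # '$.reports[0].body'
```

A stricter version could validate the extracted path with a JSONPath parser before executing it, so a garbled completion fails loudly instead of returning wrong data.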
oh yes sorry, my bad! yup, that's what I was thinking