
Updated 3 months ago


Hello! I have a question -- I'm using Llama 2 and I have a big JSON blob (50 MB of text). It's quite nested and contains a lot of 1-page documents, and I was wondering what the best way to index it is. Would it still be the JSON index (it's going to be tedious to come up with a full schema), or is there a way for me to turn it into a document somehow?
the default json loader should do ok-ish at it

Or you can iterate over the json and parse out each document manually into Document objects
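The manual route above can be sketched like this. The traversal below is a minimal, hypothetical example (the blob shape, `extract_documents` helper, and path format are all made up for illustration); in practice you'd wrap each `(path, text)` pair in a llama_index `Document`, e.g. `Document(text=text, metadata={"path": path})`.

```python
import json

def extract_documents(node, path=""):
    """Yield (json_path, text) for every string leaf in a nested JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from extract_documents(value, f"{path}/{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from extract_documents(value, f"{path}[{i}]")
    elif isinstance(node, str):
        # A string leaf is treated as one "document"; its path is kept
        # as metadata so the nesting structure isn't lost.
        yield path, node

blob = json.loads('{"reports": [{"title": "Q1", "body": "Revenue grew."}]}')
docs = list(extract_documents(blob))
# docs == [("reports[0]/title", "Q1"), ("reports[0]/body", "Revenue grew.")]
```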
gotcha, okay yeah that's what I was thinking too. Do you happen to know how 'smart' the json loader is compared to the document parser? I'd love to preserve the structure since it's important, but I'm also wondering if the json loader has capabilities beyond basic lookup/analytics on json data, e.g. summarization
*how smart an LLM using the json loader vs. the document loader can be
I think the base json loader does a sort of flattening operation

Like if you have something like

{key1: {key2: val}}

It might parse it into a string like key1/key2: val

I haven't checked in a hot minute, just going off memory lol
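Going off that (from-memory, so unverified) description, the flattening might look roughly like this. The `flatten` helper is purely illustrative, not the loader's actual implementation:

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into 'key1/key2: val' strings
    (a guess at what the json loader's flattening could look like)."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            lines.extend(flatten(value, f"{prefix}{key}/"))
    else:
        # Leaf value: emit the accumulated key path plus the value.
        lines.append(f"{prefix.rstrip('/')}: {node}")
    return lines

flatten({"key1": {"key2": "val"}})  # ['key1/key2: val']
```

The upshot is that nesting survives only as a path string inside the text, which is why parsing out your own Document objects preserves structure more faithfully.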
gotcha, I'll play around with it, thanks for the help!
hey @Logan M sorry to bother again - I'm wondering if y'all ever got the JSONLoader working with the Llama models?

The reason I'm asking is that I'm getting an error trying to use Llama 2 + the json loader (running the example here with pretty much everything the same except service_context, where I use my Llama 2 pipeline: https://gpt-index.readthedocs.io/en/latest/examples/query_engine/json_query_engine.html),

Looking at the source code, I think it's because Llama 2 doesn't have function calling, and this line uses the LLM to create json_path_response_str (which should be a JSONPath query). But I wanted to double-check, in case I'm wrong and there's another error. https://github.com/jerryjliu/llama_index/blob/644c034a249fa359181f8ebe988b8c2b93401814/llama_index/indices/struct_store/json_query.py#L105
okay, confirming that this was the issue ^ I was able to force Llama 2 to output just the JSONPath and return the correct answer. I wonder if a thin regex wrapper around json_path_response_str could be helpful in the future for other users running LLMs without function calling (let me know if you like the idea and I can make a feature request)
oh, you mean the JSON query engine, not the json loader πŸ˜…

Likely a thin wrapper/parser to parse the json path out of the response would be helpful. Llama2 likes to be verbose haha
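That wrapper might look something like the sketch below. Everything here is an assumption: the `extract_json_path` name is made up, and the regex assumes the query starts with `$` (the standard JSONPath root), which matches what the query engine's prompt asks the model for.

```python
import re

def extract_json_path(response: str) -> str:
    """Pull a JSONPath expression out of a verbose LLM completion.

    Assumes the path starts with '$' and contains no whitespace,
    backticks, or quote characters (hypothetical heuristic).
    """
    match = re.search(r"\$[^\s`'\"]+", response)
    if match is None:
        raise ValueError(f"no JSONPath found in response: {response!r}")
    return match.group(0)

raw = "Sure! The JSONPath query you need is: $.reports[0].body"
extract_json_path(raw)  # '$.reports[0].body'
```

A stricter version could validate the extracted path with a JSONPath parser before executing it, so a garbled completion fails loudly instead of returning wrong data.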
oh yes sorry, my bad! yup, that's what I was thinking