Json QA

I'm having some trouble trying to connect llama index to a simple json dataset. To test it out, I've just taken a dump of various orders from my database into a flat json file, with each order being a single flat json object (no nested elements, no arrays, etc.). I've loaded it in using
from pathlib import Path
from llama_index import GPTSimpleVectorIndex
from llama_index.readers.json import JSONReader  # import path may differ by version

loader = JSONReader()
documents = loader.load_data(Path('./orders.json'))
index = GPTSimpleVectorIndex.from_documents(documents)

but even simple queries against this index, like "Get details of order number 12345", seem to get confused: it fetches seemingly random details. Am I using the wrong index type for this?
11 comments
Yea, for hyper-specific queries I don't think a vector index would work well.

Maybe a keyword index is a better choice here? Or more specific preprocessing that parses your json into distinct documents, plus increasing the top_k a bit
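Something like this, roughly (assuming orders.json is a JSON array of order objects and the same legacy llama_index API as above; class names and import paths may differ in your version):

import json
from pathlib import Path

from llama_index import Document, GPTSimpleKeywordTableIndex

# one Document per order, so a keyword match on "12345" only pulls back that order
orders = json.loads(Path('./orders.json').read_text())
documents = [Document(text=json.dumps(order)) for order in orders]

index = GPTSimpleKeywordTableIndex.from_documents(documents)
response = index.query("Get details of order number 12345")

If you stick with the vector index instead, passing a larger similarity_top_k to query() is the other lever mentioned above.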
For this type of query, you'd almost be better off using an LLM call to extract the order number from the text, using it to get the details from the JSON, and then constructing a list index on the fly to answer the query. But that's also a little complex lol
Just my initial 2 cents on the problem lol
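A rough sketch of that on-the-fly flow (the order_number field name is made up, and a regex stands in for the LLM extraction step):

import json
import re
from pathlib import Path

from llama_index import Document, GPTListIndex

query = "Get details of order number 12345"

# step 1: pull the order number out of the query (an LLM call could do this too;
# a regex is enough for this shape of question)
order_no = re.search(r"\d+", query).group()

# step 2: look the order up directly in the JSON dump
orders = json.loads(Path('./orders.json').read_text())
matches = [o for o in orders if str(o.get("order_number")) == order_no]

# step 3: build a small list index over just those records and answer with it
index = GPTListIndex.from_documents([Document(text=json.dumps(o)) for o in matches])
response = index.query(query)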
Does the sql index work similarly? I was thinking of using a version of that to get more deterministic results
Definitely! The only tricky thing with the sql index is right now it just returns raw sql results, rather than creating a natural language response 🙃
I have a demo where I got around this by just hooking into langchain. But also would be nice if llama index had the capability itself
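The hook is basically just one more LLM call over the raw result, along these lines (the prompt wording and the example result here are made up, not the demo's actual code):

from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

question = "What is the status of order 12345?"
raw_result = "[(12345, 'shipped', '2023-03-01')]"  # whatever the sql index returned

# turn the raw SQL output into a natural language answer
answer = llm(
    f"Question: {question}\nSQL result: {raw_result}\n"
    "Answer the question in plain English using only the SQL result."
)
print(answer)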
Interesting, I'm trying to work out both a read and write process for this, so a separate model for the translation/NL part would potentially make sense then, so the logic stays consistent.

Seems the SQL index is kind of unique then in producing deterministic output? The mongodb loader seems more like just another way to load json objects, rather than a database that can be queried
Yea pretty much. Like with the sql index (or pandas index), you ask a question and it writes the sql to query the database (or pandas code lol)

You can checkout the sandbox + code I have here if you are curious

https://huggingface.co/spaces/llamaindex/llama_index_sql_sandbox
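For anyone following along, the general shape of that text-to-SQL flow with the legacy API is roughly this (the sqlite URL and orders table are stand-ins, not the sandbox's actual setup; exact class names depend on your llama_index version):

from sqlalchemy import create_engine
from llama_index import GPTSQLStructStoreIndex, SQLDatabase

engine = create_engine("sqlite:///orders.db")  # stand-in for your database
sql_database = SQLDatabase(engine, include_tables=["orders"])

# no documents needed: the index reads the table schema and
# writes the SQL for each natural language question itself
index = GPTSQLStructStoreIndex(
    [],
    sql_database=sql_database,
    table_name="orders",
)

response = index.query("What is the status of order 12345?")  # returns raw sql results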
nice! Will take a look. A lot of this schema setup and indexing can probably be handled by some introspection process too I think 🤔
At some point it'll be important to optimise that index so it can take the database's indexing rules and column/table structure into account and write more optimized queries and joins, rather than the basic ones it produces right now?
Yea the table context I give in my demo is optional. Without it, it just reads the table schema on its own

But it does write joins on its own! Seems to me the LLMs just need to get better at sql lol