Json QA

I'm having some trouble trying to connect llama index to a simple json dataset. To test it out, I've just taken a dump of various orders from my database into a flat json file, with each order being a single flat json object (no nested elements, no arrays, etc.). I've loaded it in using
from pathlib import Path
from llama_index import GPTSimpleVectorIndex
from llama_index.readers.json import JSONReader  # import path may differ by version

loader = JSONReader()
documents = loader.load_data(Path('./orders.json'))
index = GPTSimpleVectorIndex.from_documents(documents)

but even simple queries against this index, like "Get details of order number 12345", seem to get confused: it fetches seemingly random details. Am I using the wrong index type for this?
11 comments
Yea, for hyper-specific queries I don't think a vector index would work well.

Maybe a keyword index is a better choice here? Or more specific preprocessing that parses your json into distinct documents, plus increasing the top_k a bit
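Something like this, roughly (assuming orders.json is a JSON array of order objects and the same legacy llama_index API as above; class names and import paths may differ in your version):

import json
from pathlib import Path

from llama_index import Document, GPTSimpleKeywordTableIndex

# one Document per order, so a keyword match on "12345" only pulls back that order
orders = json.loads(Path('./orders.json').read_text())
documents = [Document(text=json.dumps(order)) for order in orders]

index = GPTSimpleKeywordTableIndex.from_documents(documents)
response = index.query("Get details of order number 12345")

If you stick with the vector index instead, passing a larger similarity_top_k to query() is the other lever mentioned above.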
For this type of query, you'd almost be better off using an LLM call to extract the order number from the text, using it to get the details from the JSON, and then constructing a list index on the fly to answer the query. But that's also a little complex lol
Just my initial 2 cents on the problem lol
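A rough sketch of that on-the-fly flow (the order_number field name is made up, and a regex stands in for the LLM extraction step):

import json
import re
from pathlib import Path

from llama_index import Document, GPTListIndex

query = "Get details of order number 12345"

# step 1: pull the order number out of the query (an LLM call could do this too;
# a regex is enough for this shape of question)
order_no = re.search(r"\d+", query).group()

# step 2: look the order up directly in the JSON dump
orders = json.loads(Path('./orders.json').read_text())
matches = [o for o in orders if str(o.get("order_number")) == order_no]

# step 3: build a small list index over just those records and answer with it
index = GPTListIndex.from_documents([Document(text=json.dumps(o)) for o in matches])
response = index.query(query)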
Does the sql index work similarly? I was thinking of using a version of that to get more deterministic results
Definitely! The only tricky thing with the sql index is right now it just returns raw sql results, rather than creating a natural language response 🙃
I have a demo where I got around this by just hooking into langchain. But also would be nice if llama index had the capability itself
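The hook is basically just one more LLM call over the raw result, along these lines (the prompt wording and the example result here are made up, not the demo's actual code):

from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

question = "What is the status of order 12345?"
raw_result = "[(12345, 'shipped', '2023-03-01')]"  # whatever the sql index returned

# turn the raw SQL output into a natural language answer
answer = llm(
    f"Question: {question}\nSQL result: {raw_result}\n"
    "Answer the question in plain English using only the SQL result."
)
print(answer)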
Interesting, I'm trying to work out both a read and write process for this, so a separate model for the translation/NL part would potentially make sense then, so the logic stays consistent.

Seems the SQL index is kind of unique then in producing deterministic output? The mongodb loader seems more like just another way to load json objects, rather than a database that can be queried
Yea pretty much. Like with the sql index (or pandas index), you ask a question and it writes the sql to query the database (or pandas code lol)

You can checkout the sandbox + code I have here if you are curious

https://huggingface.co/spaces/llamaindex/llama_index_sql_sandbox
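For anyone following along, the general shape of that text-to-SQL flow with the legacy API is roughly this (the sqlite URL and orders table are stand-ins, not the sandbox's actual setup; exact class names depend on your llama_index version):

from sqlalchemy import create_engine
from llama_index import GPTSQLStructStoreIndex, SQLDatabase

engine = create_engine("sqlite:///orders.db")  # stand-in for your database
sql_database = SQLDatabase(engine, include_tables=["orders"])

# no documents needed: the index reads the table schema and
# writes the SQL for each natural language question itself
index = GPTSQLStructStoreIndex(
    [],
    sql_database=sql_database,
    table_name="orders",
)

response = index.query("What is the status of order 12345?")  # returns raw sql results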
nice! Will take a look. A lot of this schema setup and indexing can probably be handled by some introspection process too I think 🤔
At some point it'll be important to optimise that index so it can take the database's indexing rules and column/table structure into account and write more optimized queries and joins, rather than the basic ones it produces right now?
Yea the table context I give in my demo is optional. Without it, it just reads the table schema on its own

But it does write joins on its own! Seems to me the LLMs just need to get better at sql lol