I'm having some trouble trying to connect LlamaIndex to a simple JSON dataset. To test things out, I've taken a dump of various orders from my database into a flat JSON file, with each order being a single flat JSON object (no nested elements, no arrays, etc.). I've loaded it in using

```python
from pathlib import Path

loader = JSONReader()
documents = loader.load_data(Path('./orders.json'))
index = GPTSimpleVectorIndex.from_documents(documents)
```
but even simple queries against this index, like "Get details of order number 12345", get confused and fetch seemingly random details. Am I using the wrong index type for this?
Yeah, for hyper-specific queries like that I don't think a vector index would work well.
Maybe a keyword index is a better choice here? Or more specific preprocessing that parses your JSON into distinct documents, one per order, plus bumping top_k up a bit (rough sketch below)
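Something like this, as a rough sketch, assuming orders.json is a flat JSON array of order objects (exact imports and the Document constructor may vary by llama_index version):

```python
# Rough sketch: one Document per order, then a keyword index over them.
# Assumes ./orders.json is a flat JSON array of order objects -- adjust to your dump format.
import json
from pathlib import Path

from llama_index import Document, GPTSimpleKeywordTableIndex

orders = json.loads(Path("./orders.json").read_text())

documents = [
    # one "key: value" line per field keeps things like the order number easy to match on
    Document(text="\n".join(f"{key}: {value}" for key, value in order.items()))
    for order in orders
]

index = GPTSimpleKeywordTableIndex.from_documents(documents)
print(index.query("Get details of order number 12345"))
```

If you stick with the vector index instead, the same one-document-per-order split plus something like `index.query(query, similarity_top_k=5)` should already behave a bit better.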
For this type of query, you'd almost be better off using an LLM call to extract the order number from the text, using it to get the details from the JSON, and then constructing a list index on the fly to answer the query. But that's also a little complex lol
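Roughly like this, as a sketch (the `order_number` field name and the regex extraction are placeholders; in practice the extraction step would be the LLM call):

```python
# Sketch of the "extract the order number, look it up, then answer from a tiny index" flow.
# The regex stands in for the LLM extraction step, and `order_number` is a guessed field name.
import json
import re

from llama_index import Document, GPTListIndex


def answer_order_query(query: str, orders: list) -> str:
    match = re.search(r"\b(\d{3,})\b", query)  # crude stand-in for an LLM extraction call
    if match is None:
        return "Couldn't find an order number in the query."
    order_number = match.group(1)

    # look the order up directly in the raw JSON rather than via retrieval
    hits = [o for o in orders if str(o.get("order_number")) == order_number]
    if not hits:
        return f"Order {order_number} not found."

    # build a throwaway list index over just the matching order(s) and let it answer
    docs = [Document(text=json.dumps(o, indent=2)) for o in hits]
    index = GPTListIndex.from_documents(docs)
    return str(index.query(query))
```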
Definitely! The only tricky thing with the SQL index right now is that it just returns raw SQL results, rather than creating a natural language response
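One way to paper over that in the meantime, just as a sketch (`sql_index` is assumed to be an already-built SQL index, and a direct LLM call would do the same job as the throwaway list index here):

```python
# Sketch: take the raw result the SQL index returns and run a second pass to phrase it
# as natural language. `sql_index` is assumed to already exist.
from llama_index import Document, GPTListIndex


def natural_language_sql_answer(sql_index, question: str) -> str:
    raw = sql_index.query(question)  # raw SQL result, per the limitation above
    doc = Document(text=f"Question: {question}\nRaw SQL result: {raw}")
    nl_index = GPTListIndex.from_documents([doc])
    return str(nl_index.query(f"Answer in plain English: {question}"))
```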
interesting, I'm trying to work out both a read and a write process for this, so a different model for the translation/NL part would potentially make sense, so the logic stays consistent.
Seems the SQL index is kind of unique then in producing deterministic output? The MongoDB loader seems more like just another way to load JSON objects, rather than a database that can be queried
at some point it'll be important to optimise that index so it can take the database's indexing rules and column/table structure into account and use more optimised queries and joins, rather than the basic ones there right now?