
Updated 3 months ago

hi, i have a question if im on the right

Hi, I have a question about whether I'm on the right path: I have a large JSON array that I want to query with natural-language (NL) questions. I built a prototype with VectorStoreIndex, which worked well until I put in all the JSON data. Then it got really slow, and each query now takes more than 8 minutes. Am I on the right path, or should I focus on another solution for the problem?
7 comments
Did you log which part was taking that long? The setup I've been using for JSON data is to parse each object into its own node and turn each JSON field into a metadata field (using VectorStoreIndex here). That approach has been working really well and is quick.
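For anyone looking for a concrete starting point, here is a minimal sketch of that "one object per node, fields as metadata" idea. The helper names (`parse_jsonl`, `record_to_text_and_metadata`, `build_documents`) and the optional `text_field` parameter are made up for illustration, and the import path assumes llama-index 0.10 or newer, so treat it as a sketch rather than the exact setup described above.

```python
import json


def parse_jsonl(lines):
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def record_to_text_and_metadata(record, text_field=None):
    """Split one JSON object into node text plus a flat metadata dict."""
    metadata = {}
    for key, value in record.items():
        if isinstance(value, (str, int, float, bool)) or value is None:
            metadata[key] = value
        else:
            # Nested lists/objects are serialized so every metadata value stays flat.
            metadata[key] = json.dumps(value, sort_keys=True)
    if text_field is not None:
        text = str(record[text_field])
    else:
        # No obvious text field: fall back to the whole object as text.
        text = json.dumps(record, sort_keys=True)
    return text, metadata


def build_documents(records, text_field=None):
    """Wrap each record in a LlamaIndex Document (one per JSON object)."""
    # Assumption: llama-index >= 0.10; older releases import from `llama_index`.
    from llama_index.core import Document

    documents = []
    for record in records:
        text, metadata = record_to_text_and_metadata(record, text_field)
        documents.append(Document(text=text, metadata=metadata))
    return documents
```

The resulting list can then be passed to `VectorStoreIndex.from_documents(...)` as usual. Keeping the JSON handling in plain functions makes it easy to check what ends up in the text and what ends up in the metadata before any embedding happens.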
Will try it that way once more. What’s the best way to enable logs?
You can do:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
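DEBUG output can get noisy, so if the goal is just to see which stage is slow, a small timing helper may be enough. This is only a sketch using the standard library; the `timed` helper, the `"timing"` logger name, and the stage names are hypothetical.

```python
import logging
import sys
import time
from contextlib import contextmanager

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging.getLogger("timing")


@contextmanager
def timed(stage, results=None):
    """Log how long the wrapped block took; optionally record it in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[stage] = elapsed
        logger.info("%s took %.2f s", stage, elapsed)


# Intended usage (needs llama-index and your own data, so left as comments):
# with timed("build index"):
#     index = VectorStoreIndex.from_documents(documents)
# with timed("query"):
#     response = index.as_query_engine().query("your question")
```

Wrapping the index build and the query separately shows whether the time goes into building or loading the index or into the query itself.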
Will retry and come back here, thanks for your assistance 🙂
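Now I did the following:
from llama_index.core import VectorStoreIndex  # import paths assume llama-index >= 0.10
from llama_index.readers.json import JSONReader

documents = JSONReader(is_jsonl=True).load_data("data.jsonl")
index = VectorStoreIndex.from_documents(documents)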

So I see in the log that each line in my JSONL file is now treated as a node. How do I add the metadata from each JSON field to it? Is there an example somewhere?
The second thing that made my implementation slow: I was loading the VectorStoreIndex each time. Now I keep my console running and load the VectorStoreIndex only once. I think this was the main performance improvement.
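The "load only once" point can be sketched roughly like this: persist the index to disk after the first build, and keep a single loaded instance around for the lifetime of the process. `PERSIST_DIR`, `load_or_build_index`, and `LazyIndex` are hypothetical names, and the imports assume llama-index 0.10 or newer, so this is one possible arrangement rather than a definitive one.

```python
import os

PERSIST_DIR = "./storage"  # hypothetical location for the saved index


def load_or_build_index(documents_loader, persist_dir=PERSIST_DIR):
    """Load a persisted index if one exists, otherwise build and persist it."""
    # Assumption: llama-index >= 0.10; older releases import from `llama_index`.
    from llama_index.core import (
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if os.path.isdir(persist_dir):
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # First run: embed the documents once, then save the result to disk.
    index = VectorStoreIndex.from_documents(documents_loader())
    index.storage_context.persist(persist_dir=persist_dir)
    return index


class LazyIndex:
    """Call the (expensive) loader at most once and reuse the result."""

    def __init__(self, loader):
        self._loader = loader
        self._loaded = False
        self._index = None

    def get(self):
        if not self._loaded:
            self._index = self._loader()
            self._loaded = True
        return self._index
```

In a long-running console or service, something like `lazy = LazyIndex(lambda: load_or_build_index(my_loader))` (with `my_loader` being your own function that returns the documents) means only the first `lazy.get()` pays the loading cost, and later queries reuse the same index object.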