Would this framework work well if I was trying to build an AI to look through a JSON file (or files) and answer questions about the data inside?
rn the schema is at about 130 lines, but it is probably going to be around 200-300?
The thing that is weird though is if I ask it to list all the ids that meet this condition it works, but when I ask it to tell me the exact number it gives this error
ngl not sure what's going on here haha I haven't actually used the json thingy yet
No idea why it's sending 40,573 tokens in a prompt 😅 But I feel like there's an error somewhere
Looking at the code, the following is sent to the LLM

  1. the query string
  2. a json.dumps() of the json_schema
  3. A small prompt template
Then once it predicts the json path, it optionally synthesizes the natural language response using

  1. A small prompt template
  2. the query string
  3. a json.dumps() of the json_schema
  4. the JSON path
  5. the result of retrieving the text at the JSON path
My only guess is that part 5 in the second LLM call is somehow huge?
Or maybe you mixed up json_schema and json_value 🤔
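A quick way to check is to count the tokens of each piece before it goes into a prompt. Rough sketch below (tiktoken with the cl100k_base encoding is an assumption, and the query/schema/value here are toy placeholders for yours):
Plain Text
import json
import tiktoken

# Toy placeholders, swap in your real objects
query_str = "List all the ids that meet this condition"
json_schema = {"type": "object", "properties": {"users": {"type": "array"}}}
json_value = {"users": [{"id": 1, "profile_setup_completed": False}]}

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by gpt-3.5-turbo

print("query tokens: ", len(enc.encode(query_str)))
print("schema tokens:", len(enc.encode(json.dumps(json_schema))))
print("value tokens: ", len(enc.encode(json.dumps(json_value))))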
some of the jsons are fairly complex with like 40 fields
the complexity should be fine. I just meant if some of the values in the JSON are very long. I'm wondering if the json path it is querying is resulting in a ton of tokens
If that's the case, I think that's not quite accounted for in the current implementation. You might be better off using SimpleDirectoryReader and throwing the data into a vector index lol
Everything is less than 60 characters
is it possible to share this json + schema? I'd be curious to step through with a debugger and figure out what's going on lol
seems bonkers to me
I cannot share the json/schema but if you can tell me a possible good spot to place a debugger breakpoint, I can take a look
How intensive is it to host a LLM on my m1 mac and see if I can find a workaround by self hosting for a bit?
mmm self hosting won't solve this 40,000 token request LOL. There's stuff like llama.cpp that's optimized to run on macs, but it's pretty slow. Never tried it myself, but I know langchain has it
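(For the curious, the langchain wrapper is roughly this; a sketch only, the model path is made up and you'd need a downloaded ggml model file:)
Plain Text
from langchain.llms import LlamaCpp

# Hypothetical local model file, point this at whatever ggml model you downloaded
llm = LlamaCpp(model_path="./models/ggml-model-q4_0.bin", n_ctx=2048)
print(llm("Q: What is the capital of France? A:"))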
Highest I got it to was 461k tokens 😂
Plain Text
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 463174 tokens. Please reduce the length of the messages.
Two spots where I would set a breakpoint

Option 1: If you can locate the llama-index installation/source code on your system, I would set a breakpoint using pdb inside llama_index/indices/struct_store/json_query.py, right at line 103, in the _query() function

Option 2: Set a breakpoint just before you run the query, and manually step into functions until you get to the above file lol
Basically, I would want to double check the two variables json_path_response_str and json_path_output to see what's in them / how big they are
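Concretely, something like this (a sketch, where your_script.py / query_engine stand in for whatever you're actually running, and the line number can drift between versions):
Plain Text
# Option 1: run everything under pdb and break inside the library file
#   python -m pdb your_script.py
#   (Pdb) b llama_index/indices/struct_store/json_query.py:103
#   (Pdb) c
#   ...then `n` (next) until json_path_response_str / json_path_output are assigned,
#   and `p json_path_response_str` / `p len(json_path_output)` to inspect them

# Option 2: break right before the query and `s` (step) down into _query()
import pdb; pdb.set_trace()
response = query_engine.query("list all the ids that ...")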
Plain Text
    @llm_token_counter("query")
    def _query(self, query_bundle: QueryBundle) -> Response:
        """Answer a query."""
        schema = self._get_schema_context()

        (
            json_path_response_str,
            formatted_prompt,
        ) = self._service_context.llm_predictor.predict(
            self._json_path_prompt,
            schema=schema,
            query_str=query_bundle.query_str,
        )

        if self._verbose:
            print_text(f"> JSONPath Prompt: {formatted_prompt}\n")
            print_text(
                f"> JSONPath Instructions:\n" f"
\n{json_path_response_str}\n
Plain Text
\n"
            )

        json_path_output = self._output_processor(
            json_path_response_str,
            self._json_value,
            **self._output_kwargs,
        )

        if self._verbose:
            print_text(f"> JSONPath Output: {json_path_output}\n")

        if self._synthesize_response:
            response_str, _ = self._service_context.llm_predictor.predict(
                self._response_synthesis_prompt,
                query_str=query_bundle.query_str,
                json_schema=self._json_schema,
                json_path=json_path_response_str,
                json_path_value=json_path_output,
            )
        else:
            response_str = json.dumps(json_path_output)

        response_extra_info = {
            "json_path_response_str": json_path_response_str,
        }

        return Response(response=response_str, extra_info=response_extra_info)
at the schema definition?
yea right there!
and from there, step line-by-line
Plain Text
response_str, _ = self._service_context.llm_predictor.predict(
    self._response_synthesis_prompt,
    query_str=query_bundle.query_str,
    json_schema=self._json_schema,
    json_path=json_path_response_str,
    json_path_value=json_path_output,
)


I have a feeling it's this predict call that's barfing; one of these variables is huuuge for some reason
is it not possible to use vscode debugger on packages?
hmmm not sure. Personally I always use pdb lol
could also do option 2 from above, it will just be a little tedious to drill down to the file
k i figured it out
what am I looking for?
json_path_response_str seems relatively short...
json_path_output the length is 905
And that raises a token error?
Nothing about that should be using that many tokens...
I think it has to do with the fact that the formatted_prompt is like 10k characters
because the json_path_output array is 905 items long, but that is just a hunch
Oh when you said 905 I thought you meant 905 characters 😅
905 elements
I wonder how many it would have for that question that was 463k tokens
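One way to see the actual numbers (a sketch, assuming tiktoken is installed; both variables are in scope inside _query() at that breakpoint):
Plain Text
# at the breakpoint, with json_path_output and formatted_prompt in scope
import json, tiktoken
enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(formatted_prompt)))              # tokens in the first prompt
print(len(enc.encode(json.dumps(json_path_output))))  # tokens the 905-item result adds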
Well that looks like the root of the problem, seems like the engine wasn't designed to expect something that large 😅 I can file a bug for this internally
Alright. Do you have any suggestions for an alternative? This project is for a hackathon whose deadline is tomorrow at 11:59
Try loading the json using SimpleDirectoryReader and throw it into a vector index

Plain Text
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data_dir").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("query")
Well, should be fine
Be aware the embeddings cost $0.0004/1k tokens lol
Cheap, but also not sure how big your file is
atm it's like 280k lines 💀
Well, words/tokens matter more than lines

And similar to that guide, but no need to use pinecone
That is more what I was thinking
It should work with json?
Should work well enough. I think it will largely depend on the kinds of queries you are making though too
If you have a smaller json to test with, try that first lol
I'm trying to at least for now get basic information on different items within that json
Do I need to provide a schema?
Is it as powerful of a tool as the json query is then?
It's a different kind of power

It will make embeddings for different parts of your json, and then also embed your queries, and use vector similarity to get the top k matching chunks of the json
Then it uses those matching chunks to write an answer
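Roughly what that looks like under the hood (a toy illustration with made-up 3-d embeddings, not the actual llama-index internals):
Plain Text
import numpy as np

# Pretend embeddings for 4 chunks of the json, plus one for the query (made-up numbers)
chunk_embeddings = np.array([
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.9],
])
query_embedding = np.array([0.15, 0.8, 0.05])

# Cosine similarity between the query and every chunk
sims = chunk_embeddings @ query_embedding / (
    np.linalg.norm(chunk_embeddings, axis=1) * np.linalg.norm(query_embedding)
)

# similarity_top_k=2 -> the 2 closest chunks get handed to the LLM to write the answer
top_k = np.argsort(sims)[::-1][:2]
print("chunks sent to the LLM:", top_k.tolist(), sims[top_k])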
Is it possible to do something similar to this but with the vector query?
Plain Text
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext

service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.5))
)
is it in the storagecontext this time?
Nope, still service context, just gotta pass it in

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
👍. Seems a lot slower but hopefully doesn't have any issues :Peepo_Shrug:
4.14m tokens 😂
(attachment: image.png)
However, the response was not correct 😐
Ooof bruh we should have counted the tokens before running. Is this in a notebook? We should save the index before continuing
I wonder if the types of requests i have to type into the data need to be changed
yikes
(attachment: image.png)
Phew only a dollar
it's hardcapped at $3 on my acc rn
Lol to save/load the index and save on embedding tokens you can do this

Plain Text
# SAVE
index.storage_context.persist(persist_dir="./storage")
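# LOAD (sketch of the flip side, same as what you have further down;
# persist_dir just has to match what you saved with)
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)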
oh because they wont change right?
as long as the data doesn't change?
oh wait, I already have that
Plain Text
from langchain.chat_models import ChatOpenAI
from llama_index import (
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    service_context=ServiceContext.from_defaults(
        llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.5))
    ),
)

index.set_index_id("vector_index")
index.storage_context.persist(persist_dir="./storage")

storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context, index_id="vector_index")

query_engine = index.as_query_engine()
Oh good! Now we don't have to use 4.14m tokens again lol
it looks like that isn't working
the first request was a mistake
(attachment: image.png)
it's weird because for some reason the answers I'm getting are like completely wrong, so I wonder if the way that I query the data needs to be changed...
Plain Text
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context, index_id="vector_index")


Doing just this shouldn't spend tokens if the index was saved

For the query, what type of response are you getting? Just like wrong facts or?
You can try adjusting the top k
for instance I was trying to find all ids of users whose profile was not set up (the dot notation path is users[index].profile_setup_completed) and it is just giving completely wrong data
yeah it looks like it was continuing to spend tokens, maybe because the query was different? (the data was the same but the query was different)
Yea it will spend tokens to gpt-3.5 and a very small amount to text-ada-002

But also, that type of query probably won't work well for vector search 😅
Option 3: convert your json to a SQL database and use the sql index. Does that sound possible or nah?

Otherwise, I think we are running out of options for this use case 😅
Is there another way to do that sort of query?
by tokens I mean the embedding tokens, every time I made a query
Maybe if you set the top k to something like 5 and set response mode to tree summarize?

index.as_query_engine(similarity_top_k=5, response_mode="tree_summarize") ?
Hmmm... and you used a vector index right? And it's using 2m embedding tokens per query? Bro you're finding all the issues today LOL
Yeah ig so 😐
Might have to look into another strategy 😐
I think so. Sorry man 😅
Tough problem
What if I used the SQL query engine? Is it as smart/smarter than the langchain sql chain stuff?
It'll be comparable to the langchain SQL stuff I think. Main advantage is the recent addition to combine SQL and Vectors, which is kinda neat

https://gpt-index.readthedocs.io/en/latest/examples/query_engine/SQLAutoVectorQueryEngine.html
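If you do try the SQL route, getting the json into SQLite is pretty mechanical. Rough sketch (the file path and the id / profile_setup_completed fields are just placeholders taken from the examples above):
Plain Text
import json
import sqlite3

# Load the raw json (path and field names are placeholders)
with open("./data/users.json") as f:
    data = json.load(f)

conn = sqlite3.connect("users.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS users (id TEXT, profile_setup_completed INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(u["id"], int(u["profile_setup_completed"])) for u in data["users"]],
)
conn.commit()

# "all ids of users whose profile was not set up" becomes a plain SQL query,
# which is exactly what a text-to-SQL query engine is good at generating
print(conn.execute("SELECT id FROM users WHERE profile_setup_completed = 0").fetchall())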