rn the schema is at about 130 lines, but it is probably going to be around 200-300?
The thing that is weird though is if I ask it to list all the ids that meet this condition it works, but when I ask it to tell me the exact number it gives this error
ngl not sure what's going on here haha I haven't actually used the json thingy yet
No idea why it's sending 40,573 tokens in a prompt
But i feel like there's an error somewhere
Looking at the code, the following is sent to the LLM:
- the query string
- a json.dumps() of the json_schema
- a small prompt template
Then once it predicts the JSON path, it optionally synthesizes the natural language response using:
1. a small prompt template
2. the query string
3. a json.dumps() of the json_schema
4. the JSON path
5. the result of retrieving the text at the JSON path
My only guess is that part 5 in the second LLM call is somehow huge?
Or maybe you mixed up json_schema and json_value
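If you want to double check which of those pieces is actually huge, something like this would count the tokens for each one before the call (just a sketch; the variable names are placeholders for whatever you have locally, and the exact tokenizer depends on the model):
# Sketch only: count the tokens of each piece that goes into the prompt.
# query_str, json_schema, and json_path_value are placeholder names here.
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: a cl100k-style tokenizer

def num_tokens(text: str) -> int:
    return len(enc.encode(text))

print("query:", num_tokens(query_str))
print("schema:", num_tokens(json.dumps(json_schema)))
print("retrieved value (part 5):", num_tokens(json.dumps(json_path_value)))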
some of the jsons are fairly complex with like 40 fields
the complexity should be fine. I just meant if some of the values in the JSON are very long. I'm wondering if the json path it is querying is resulting in a ton of tokens
If that's the case, I think that's not quite accounted for in the current implementation. You might be better off using SimpleDirectoryReader and throwing the data into a vector index lol
Everything is less than 60 characters
is it possible to share this json + schema? I'd be curious to step through with a debugger and figure out what's going on lol
I cannot share the json/schema but if you can tell me a possible good spot to place a debugger breakpoint, I can take a look
How intensive is it to host an LLM on my M1 Mac and see if I can find a workaround by self hosting for a bit?
mmm self hosting won't solve this 40,000 token request LOL There's stuff like llama.cpp that's optimized to run on Macs, but it's pretty slow. Never tried it myself, but I know langchain has it
Highest I got it to was 463k tokens
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 463174 tokens. Please reduce the length of the messages.
Two spots I would set a debugger
Option 1: If you can locate the llama-index installation/source code on your system, I would set a breakpoint using pdb inside llama_index/indices/struct_store/json_query.py, right at line 103, in the _query() function
Option 2: Set a breakpoint just before you run the query, and manually step into functions until you get to the above file lol
Basically, I would want to double check the two variables json_path_response_str and json_path_output to see what's in them/how big they are
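For Option 2, something like this is all it takes (sketch only; query_engine is whatever JSON query engine you already built, and the query string is made up):
# Sketch: drop a pdb breakpoint right before the query and step into the library.
import pdb

pdb.set_trace()
response = query_engine.query("how many ids meet the condition?")
# At the (Pdb) prompt: s steps into query(), l lists source, and once you are
# inside _query() you can run p len(json_path_response_str) and
# p len(str(json_path_output)) to see how big each one is.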
@llm_token_counter("query")
def _query(self, query_bundle: QueryBundle) -> Response:
    """Answer a query."""
    schema = self._get_schema_context()

    (
        json_path_response_str,
        formatted_prompt,
    ) = self._service_context.llm_predictor.predict(
        self._json_path_prompt,
        schema=schema,
        query_str=query_bundle.query_str,
    )

    if self._verbose:
        print_text(f"> JSONPath Prompt: {formatted_prompt}\n")
        print_text(
            f"> JSONPath Instructions:\n" f"```\n{json_path_response_str}\n```\n"
        )

    json_path_output = self._output_processor(
        json_path_response_str,
        self._json_value,
        **self._output_kwargs,
    )

    if self._verbose:
        print_text(f"> JSONPath Output: {json_path_output}\n")

    if self._synthesize_response:
        response_str, _ = self._service_context.llm_predictor.predict(
            self._response_synthesis_prompt,
            query_str=query_bundle.query_str,
            json_schema=self._json_schema,
            json_path=json_path_response_str,
            json_path_value=json_path_output,
        )
    else:
        response_str = json.dumps(json_path_output)

    response_extra_info = {
        "json_path_response_str": json_path_response_str,
    }
    return Response(response=response_str, extra_info=response_extra_info)
at the schema definition?
and from there, step line-by-line
response_str, _ = self._service_context.llm_predictor.predict(
    self._response_synthesis_prompt,
    query_str=query_bundle.query_str,
    json_schema=self._json_schema,
    json_path=json_path_response_str,
    json_path_value=json_path_output,
)
I have a feeling it's this predict call that's barfing, one of these variables is huuuge for some reason
is it not possible to use vscode debugger on packages?
hmmm not sure. Personally I always use pdb
lol
could also do option 2 from above, it will just be a little tedious to drill down to the file
json_path_response_str seems relatively short...
json_path_output has a length of 905
And that raises a token error?
Nothing about that should be using that many tokens...
I think it has to do with the fact that the formatted_prompt is like 10k characters, because the json_path_output array is 905 items long, but that is just a hunch
Oh when you said 905 I thought you meant 905 characters
I wonder how many items it would have for that question that was 463k tokens
Well that looks like the root of the problem, seems like the engine wasn't designed to expect something that large
I can file a bug for this internally
Alright. Do you have any suggestions for an alternative? This project is for a hackathon whose deadline is tomorrow at 11:59
Try loading the json using SimpleDirectoryReader and throw it into a vector index
documents = SimpleDirectoryReader("./data_dir").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("query")
Be aware the embeddings cost $0.0004/1k tokens lol
Cheap, but also not sure how big your file is
atm it's like 280k lines
Well, words/tokens matter more than lines
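If you want a rough number before kicking it off, something like this would estimate the embedding cost up front (sketch only; the file path is made up, and it assumes a cl100k-style tokenizer and the $0.0004/1k ada-002 pricing above):
# Rough sketch: estimate embedding tokens/cost for the file before indexing it.
import tiktoken

with open("./data/my_data.json") as f:  # hypothetical path
    text = f.read()

enc = tiktoken.get_encoding("cl100k_base")
num_tokens = len(enc.encode(text))
print(f"~{num_tokens:,} tokens, roughly ${num_tokens / 1000 * 0.0004:.2f} to embed")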
And similar to that guide, but no need to use Pinecone
That is more what I was thinking
It should work with json?
Should work well enough. I think it will largely depend on the kinds of queries you are making though too
If you have a smaller json to test with, try that first lol
I'm trying to at least for now get basic information on different items within that json
Do I need to provide a schema?
Is it as powerful of a tool as the json query is then?
It's a different kind of power
It will make embeddings for different parts of your json, and then also embed your queries, and use vector similarity to get the top k matching chunks of the json
Then it uses those matching chunks to write an answer
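If you want to see exactly which chunks it pulls back for a query, something like this shows the raw top-k matches before any answer is written (sketch, assuming index is the VectorStoreIndex built above):
# Sketch: peek at the raw top-k chunks the vector index retrieves for a query.
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("basic info about item X")  # placeholder query
for node_with_score in nodes:
    print(node_with_score.score, node_with_score.node.get_text()[:200])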
Is it possible to do something similar to this but with the vector query?
service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.5)))
is it in the storagecontext this time?
Nope, still service context, just gotta pass it in
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
Seems a lot slower but hopefully doesn't have any issues :Peepo_Shrug:
However, the response was not correct
Ooof bruh we should have counted the tokens before running. Is this in a notebook? We should save the index before continuing
I wonder if the types of requests i have to type into the data need to be changed
it's hard-capped at $3 on my acc rn
Lol to save/load the index and save the embedding tokens you can do this
# SAVE
index.storage_context.persist(persist_dir="./storage")
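and loading it back is just the mirror image (sketch, assuming the same persist_dir and a single index in there):
# LOAD
from llama_index import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)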
oh because they won't change right?
as long as the data doesn't change?
oh wait, I already have that
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
documents,
service_context=ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.5)))
)
index.set_index_id("vector_index")
index.storage_context.persist(persist_dir="./storage")
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context, index_id="vector_index")
query_engine = index.as_query_engine()
Oh good! Now we don't have to use 4.14m tokens again lol
it looks like that isn't working
the first request was a mistake
it's weird because for some reason the answers I'm getting are like completely wrong, so I wonder if the way that I query data needs to be changed...
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context, index_id="vector_index")
Doing just this shouldn't spend tokens if the index was saved
For the query, what type of response are you getting? Just like wrong facts or?
You can try adjusting the top k
for instance I was trying to find all ids of users whose profile was not set up, i.e. users[index].profile_setup_completed in dot notation
and it is just giving completely wrong data
yeah no it looks like it was continuing to spend tokens, maybe it's because the query was different? (data was the same but query was different)
Yea it will spend tokens on gpt-3.5 and a very small amount on text-embedding-ada-002
But also, that type of query probably won't work well for vector search
Option 3: convert your json to a SQL database and use the sql index. Does that sound possible or nah?
Otherwise, I think we are running out of options for this use case
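Roughly what I mean by Option 3, as a sketch (the users key, the id / profile_setup_completed fields, and the file path are just placeholders for whatever your JSON actually looks like):
# Sketch of Option 3: flatten the JSON into a SQLite table that a SQL index could query.
import json
import sqlite3
import pandas as pd

with open("./data/my_data.json") as f:  # hypothetical path
    data = json.load(f)

df = pd.json_normalize(data["users"])   # one row per user, nested fields become dotted columns
conn = sqlite3.connect("users.db")
df.to_sql("users", conn, if_exists="replace", index=False)

# A query like the profile_setup one then becomes plain SQL:
print(pd.read_sql("SELECT id FROM users WHERE profile_setup_completed = 0", conn))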
Is there another way to do that sort of query?
by tokens I mean the embedding tokens. every time I made a query
Maybe if you set the top k to something like 5 and set response mode to tree summarize?
index.as_query_engine(similarity_top_k=5, response_mode="tree_summarize")
?
Hmmm... and you used a vector index right? And it's using 2m embedding tokens per query? Bro you finding all the issues today LOL
Might have to look into another strategy
I think so. Sorry man
What if I used the SQL query engine? Is it as smart/smarter than the langchain sql chain stuff?