
Updated 4 months ago

Costs

At a glance

The post asks the community for strategies to mitigate worry about unexpected OpenAI costs. The comments discuss the following:
- Using token predictors to estimate and limit token usage, since the OpenAI dashboard can be laggy and unreliable.
- A link to documentation on cost analysis for the GPT-Index library.
- The differences between tree indexes and vector indexes: tree indexes build a hierarchical summary of the data using the language model, while vector indexes embed the documents and use cosine similarity to find the most relevant ones.
- Token usage when building a tree index, where the language model is called to summarize the data and construct the tree.

any strategies people here would recommend using to mitigate this worry?
18 comments
Yea I'm not really sure about the openai dashboard, it seems quite laggy/unreliable

To help remedy this, there are token predictors you can try.

For example, I know some people will run the token predictors and then only run something if the token usage is low enough
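A minimal sketch of that gating pattern, assuming you already have a predicted token count (for example from a mock predictor run). The names `run_if_affordable` and `token_budget` are hypothetical and not part of any library:

```python
from typing import Callable, Optional


def run_if_affordable(
    predicted_tokens: int,
    token_budget: int,
    action: Callable[[], str],
) -> Optional[str]:
    """Run `action` only when the predicted token usage fits the budget.

    Returns the result of `action`, or None when the prediction is too high.
    """
    if predicted_tokens > token_budget:
        # Too expensive: skip the real (billed) call entirely.
        return None
    return action()


# Hypothetical usage: the lambda stands in for the real index build or query.
result = run_if_affordable(17442, 20000, lambda: "built")
skipped = run_if_affordable(17442, 10000, lambda: "built")
```

The budget is in tokens rather than dollars here, so the check does not depend on any particular model's pricing.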
this is great thanks!
are there any gotchas or situations to avoid where you have run into unexpectedly higher costs than you imagined?
Not really that I can imagine 🤔 usually, you have a pretty good idea of the costs ahead of time.

E.g. building a tree index might be slightly expensive, but queries will be cheaper than with a list index.


Vector indexes are cheap to build (because embeddings are cheap). Queries are cheap too because it only looks at top_k nodes.

Basically, the main factor in cost is how much data you have in your index
what is the difference between a tree index and a vector index?
I'm going through the docs now page by page, but I just want to be sure I understand clearly.
A tree index builds a sort of tree of summaries using the LLM. Good for summarizing and wrapping other indexes into a composable index

A vector index embeds your documents, and then at query time only uses the documents (similarity_top_k=1 by default, so it picks the single top node) that best match the query (using cosine similarity)
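To make the query-time step concrete, here is a rough sketch of cosine-similarity top-k selection in plain Python. It is not the library's implementation; the toy 2-dimensional vectors and the helper names `cosine_similarity` and `top_k_nodes` are assumptions for illustration only:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_nodes(
    query_vec: Sequence[float],
    node_vecs: Sequence[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (node index, similarity) for the k nodes closest to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(node_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings: three "documents" and one query.
nodes = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = top_k_nodes([0.9, 0.1], nodes, k=1)
```

Only the selected nodes would be sent to the LLM, which is why query cost stays low no matter how many documents are embedded.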
so a tree index will chain actual API calls?
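    from llama_index import GPTTreeIndex, MockLLMPredictor, ServiceContext, SimpleDirectoryReader

    # `directory` is the path to the folder of documents to index
    documents = SimpleDirectoryReader(directory).load_data()
    # The mock predictor estimates token usage without calling the OpenAI API
    llm_predictor = MockLLMPredictor(max_tokens=256)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
    index = GPTTreeIndex.from_documents(documents, service_context=service_context)

    print(llm_predictor.last_token_usage)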
this returns ...
INFO:llama_index.indices.common.tree.base:> Building index from nodes: 4 chunks
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 17442 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
where is that 17442 tokens number coming from?
I'm not passing a prompt or a question or anything here so I am confused
Internally it's calling the LLM to summarize information and basically construct a tree out of your documents 👍
oh interesting, so each level up the tree is a summary of the two below?
is that a rough model of how it works?
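A rough model of the build step described above, as a sketch rather than the library's actual code. It assumes whitespace-separated words stand in for tokens, that a truncating `mock_summarize` stands in for the LLM, and that the group size `num_children` is a free parameter. All names here are hypothetical:

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    return len(text.split())


def mock_summarize(text: str, max_tokens: int = 256) -> str:
    # Stand-in for an LLM call: keep only the first `max_tokens` words.
    return " ".join(text.split()[:max_tokens])


def build_tree(
    chunks: List[str], num_children: int = 2, max_tokens: int = 256
) -> Tuple[str, int]:
    """Summarize groups of `num_children` nodes, level by level, up to one root.

    Returns the root summary and the total tokens sent to and returned by
    the summarizer across all levels.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if num_children < 2:
        raise ValueError("num_children must be at least 2")
    level = list(chunks)
    used = 0
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), num_children):
            group_text = " ".join(level[i:i + num_children])
            summary = mock_summarize(group_text, max_tokens)
            # Each summary call costs its input plus its output.
            used += count_tokens(group_text) + count_tokens(summary)
            parents.append(summary)
        level = parents
    return level[0], used


# Four chunks of 1000 words each, grouped in pairs.
chunks = ["word " * 1000] * 4
root, used = build_tree(chunks, num_children=2, max_tokens=256)
```

Every leaf chunk passes through the summarizer once, so token usage at build time scales with the amount of data even though no query has been asked.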