Costs

Any strategies people here would recommend to mitigate this worry about costs?
Yeah, I'm not really sure about the OpenAI dashboard; it seems quite laggy/unreliable.

To help remedy this, there are token predictors you can try.

For example, I know some people will run the token predictors and then only run something if the token usage is low enough
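A minimal sketch of that pattern, assuming an older llama_index release that exposes MockLLMPredictor (the "./data" path and the 20k-token budget are illustrative, not from the thread):

Python
# Dry run with a mock LLM: token usage is counted, but nothing is sent to OpenAI.
from llama_index import GPTTreeIndex, MockLLMPredictor, ServiceContext, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()  # "./data" is a placeholder
mock_predictor = MockLLMPredictor(max_tokens=256)
mock_context = ServiceContext.from_defaults(llm_predictor=mock_predictor)
GPTTreeIndex.from_documents(documents, service_context=mock_context)

# Only pay for the real build if the predicted usage fits the budget.
if mock_predictor.last_token_usage < 20_000:  # arbitrary example budget
    index = GPTTreeIndex.from_documents(documents)
else:
    print(f"Skipping: ~{mock_predictor.last_token_usage} tokens predicted")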
This is great, thanks!
Are there any gotchas or situations to avoid, where you've run into unexpectedly higher costs than you imagined?
Not really that I can imagine 🤔 Usually, you have a pretty good idea of the costs ahead of time.

E.g., building a tree index might be slightly expensive, but queries will be cheaper than with a list index.


Vector indexes are cheap to build (because embeddings are cheap). Queries are cheap too, because they only look at the top_k nodes.

Basically, the main factor in cost is how much data you have in your index.
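As a back-of-envelope illustration (the per-token price below is an assumed early-2023 rate for text-embedding-ada-002, not a current quote):

Python
# Rough embedding-cost estimate; $0.0004 per 1K tokens is an assumed
# text-embedding-ada-002 price from early 2023, so check current pricing.
num_docs, tokens_per_doc = 100, 1_000
total_tokens = num_docs * tokens_per_doc      # 100,000 tokens to embed
cost_usd = total_tokens / 1_000 * 0.0004      # about $0.04
print(f"~${cost_usd:.2f} to embed {total_tokens:,} tokens")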
what is the difference between a tree index and a vector index?
I'm going through the docs now page by page, but I just want to be sure I understand clearly.
A tree index builds a tree of summaries using the LLM. It's good for summarizing, and for wrapping other indexes into a composable index.

A vector index embeds your documents, and at query time only uses the nodes that best match the query by cosine similarity (similarity_top_k=1 by default, so it picks the single top node).
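For concreteness, a minimal sketch of building and querying a vector index, assuming the same older llama_index API as the tree example below (class names changed in later releases):

Python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()   # "./data" is a placeholder
index = GPTSimpleVectorIndex.from_documents(documents)    # embedding calls only
# Query-time retrieval compares embeddings by cosine similarity.
response = index.query("What do the docs say about costs?", similarity_top_k=1)
print(response)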
so a tree index will chain actual API calls?
Python
# Imports assume the older llama_index API used in this thread.
from llama_index import GPTTreeIndex, MockLLMPredictor, ServiceContext, SimpleDirectoryReader

# Build against a mock LLM: tokens are counted, but no OpenAI calls are made.
documents = SimpleDirectoryReader(directory).load_data()
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
index = GPTTreeIndex.from_documents(documents, service_context=service_context)

print(llm_predictor.last_token_usage)
this returns ...
Plain Text
INFO:llama_index.indices.common.tree.base:> Building index from nodes: 4 chunks
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 17442 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
where is that 17442 tokens number coming from?
I'm not passing a prompt or a question or anything here so I am confused
Internally it's calling the LLM to summarize information and basically construct a tree out of your documents 👍
oh interesting, so each level up the tree is a summary of the two below?
is that a rough model of how it works?
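For intuition, here's a hypothetical sketch of the bottom-up build (not the library's actual internals); note that groups default to num_children=10 in llama_index of that era, rather than two:

Python
# Hypothetical sketch of bottom-up tree construction, NOT llama_index's real code.
def build_tree(chunks, summarize, num_children=10):
    """`summarize` is any callable mapping a list of texts to one summary string."""
    level = chunks
    while len(level) > 1:
        # One LLM call per group; this is where build-time token usage accrues.
        level = [
            summarize(level[i : i + num_children])
            for i in range(0, len(level), num_children)
        ]
    return level[0]  # the root summary

Each summarize call is an LLM request, which is where a build-time figure like the 17442 tokens in the log above comes from.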