
Token usage

Hello again. Still trying to estimate costs, and I'm seeing some strange things.

This is my code:

(Only called the first time):
Plain Text
def train(path):
    tokens = 0  # placeholder: nothing increments this yet, so it always returns 0
    name = path.split("/")[-1]

    # get the documents inside the folder
    documents = SimpleDirectoryReader(path).load_data()
    print("Starting vector construction at", datetime.datetime.now())
    index = GPTSimpleVectorIndex.from_documents(documents)

    index.save_to_disk("indexes/" + name + ".json")

    return tokens


Now I just call this other method:
Plain Text
def query(query, toIndex):
    index = GPTSimpleVectorIndex.load_from_disk("indexes/" + toIndex + ".json")
    response = index.query(query)
    return response

response = query("question", "data")


This is what the console output says:
Plain Text
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 5002 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 23 tokens
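For context, the logged counts translate into a rough per-query cost like this (a sketch; the per-1K prices are assumptions based on OpenAI's March-2023 price list, so check the current one):

```python
# Rough per-query cost from the logged token counts.
# Prices below are assumptions (OpenAI's March-2023 price list):
DAVINCI_PER_1K = 0.02      # text-davinci-003, USD per 1K tokens
ADA_PER_1K = 0.0004        # text-embedding-ada-002, USD per 1K tokens

llm_tokens = 5002          # "Total LLM token usage" from the log
embed_tokens = 23          # "Total embedding token usage" from the log

cost = llm_tokens / 1000 * DAVINCI_PER_1K + embed_tokens / 1000 * ADA_PER_1K
print(f"~${cost:.4f} per query")  # roughly $0.10, almost all of it from the LLM call
```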


But this is what OpenAI billing console says:
Plain Text
11:35
Local time: 30 mar 2023, 13:35
text-davinci, 2 requests
4,483 prompt + 512 completion = 4,995 tokens
11:35
Local time: 30 mar 2023, 13:35
text-embedding-ada-002-v2, 2 requests
56,906 prompt + 0 completion = 56,906 tokens
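The bill actually reconciles with the logs if the large embedding line is the one-off index build (which embeds every document chunk) rather than the query. A quick check, again using the assumed ada-002 price:

```python
# The davinci line on the bill roughly matches the query log.
billed_llm = 4483 + 512        # 4,995 billed vs 5,002 logged

# The big embedding line is far more than the query's 23 tokens, so it
# most plausibly comes from GPTSimpleVectorIndex.from_documents(...),
# which embeds every chunk of every document once at build time.
billed_embed = 56906
embed_cost = billed_embed / 1000 * 0.0004   # assumed ada-002 price, USD per 1K
print(billed_llm, f"${embed_cost:.4f}")
```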


Is that right? 🤔
15 comments
It looks like OpenAI maybe clumped the query and index-construction tokens together?
Yes, but it must be something else, because I see this for all queries
So something is wrong on my side or... Strange
Does load_from_disk make LLM calls?
Load from disk does not make any calls 🤔
It looks to me like 11:45 was when you called query, although it's weird that the LLM usage showed up at 11:35 instead
I'm not sure how reliable those OpenAI logs are though 😅
The problem is that I've just made 4-5 calls and I get about €0.10 per "query" (more or less), and I'm only trying with one document
Plain Text
# Import necessary packages

from llama_index import (
    GPTSimpleVectorIndex,
    SimpleDirectoryReader,
)

import os
import datetime

os.environ['OPENAI_API_KEY'] = 'API_KEY'

# index = GPTKeywordTableIndex(doeDocuments)
# index.save_to_disk("doe_index.json")

def generateIndex(path):
    tokens = 0
    name = path.split("/")[-1]

    # get the documents inside the folder
    documents = SimpleDirectoryReader(path).load_data()
    print("Starting Vector construction at ", datetime.datetime.now())
    index = GPTSimpleVectorIndex.from_documents(documents)

    index.save_to_disk("indexes/" + name + ".json")

    return tokens
    
def query(query, toIndex):
    index = GPTSimpleVectorIndex.load_from_disk("indexes/" + toIndex + ".json")
    response = index.query(query)
    return response

response = query("¿Qué subvenciones se publicaron en el DOE del 28/03/2023?", "data")
print(response) 
That's the full code. I don't see anything strange here, but maybe it's time to go back to Android/iOS development hahah
Oh man, that sounds 10x worse not gonna lie 🤣

I think most of the cost is coming from the LLM. You can try using ChatGPT to reduce the cost by 10x. You can also lower the chunk size so that the top_k matching nodes are smaller (by default, top_k is 1)

Here's an example that has both of those concepts:
https://github.com/jerryjliu/llama_index/blob/main/examples/vector_indices/SimpleIndexDemo-ChatGPT.ipynb
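The "10x" figure lines up with the price gap between the two models (assumed March-2023 prices; check the current price list):

```python
# Assumed March-2023 prices, USD per 1K tokens
DAVINCI = 0.02         # text-davinci-003
GPT_35_TURBO = 0.002   # gpt-3.5-turbo ("ChatGPT")

ratio = DAVINCI / GPT_35_TURBO
print(round(ratio))  # 10

# So a ~5,000-token query drops from ~$0.10 to ~$0.01 just by switching
# models; a smaller chunk size shrinks the prompt on top of that.
```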
With chatgpt I get 0.04€ per query, not bad
Nice! 💪💪