Updated 2 years ago

Token usage

Hello again. I'm still trying to estimate the costs, and I'm seeing some strange things.

This is my code:

(Only called the first time):
Plain Text
def train(path):
    tokens = 0  # placeholder: never updated, so this always returns 0
    name = path.split("/")[-1]

    # Load every document inside the folder
    documents = SimpleDirectoryReader(path).load_data()
    print("Starting Vector construction at ", datetime.datetime.now())
    # Building the index embeds the documents (embedding API calls happen here)
    index = GPTSimpleVectorIndex.from_documents(documents)

    index.save_to_disk("indexes/" + name + ".json")

    return tokens


Now I just call this other method:
Plain Text
def query(query, toIndex):
    index = GPTSimpleVectorIndex.load_from_disk("indexes/" + toIndex + ".json")
    response = index.query(query)
    return response

response = query("question", "data")


This is what the console output says:
Plain Text
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 5002 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 23 tokens


But this is what the OpenAI billing console says:
Plain Text
11:35
Local time: 30 mar 2023, 13:35
text-davinci, 2 requests
4,483 prompt + 512 completion = 4,995 tokens
11:35
Local time: 30 mar 2023, 13:35
text-embedding-ada-002-v2, 2 requests
56,906 prompt + 0 completion = 56,906 tokens


Is that right? 🤔
15 comments
It looks like OpenAI maybe clumped together the query and index construction tokens?
Yes, but there must be something else going on, because I see it for every query
So something is wrong on my side, or... Strange
Does load_from_disk make LLM calls?
Load from disk does not make any calls 🤔
It looks to me like 11:45 was when you called query, although it's weird that the LLM usage showed up at 11:35 instead
I'm not sure how reliable those OpenAI logs are though 😅
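For anyone comparing the two sets of numbers, here is a rough sketch of the arithmetic, using only the figures quoted above. It assumes the log lines and the billing rows refer to the same activity, which this thread has not confirmed. The names `logged`, `billed`, and `unexplained_tokens` are made up for illustration.

```python
# Token counts reported by the llama_index logger for one query (from the post above).
logged = {"llm": 5002, "embedding": 23}

# Token counts shown in the OpenAI billing console for the same time slot.
billed = {"llm": 4483 + 512, "embedding": 56906}


def unexplained_tokens(billed, logged):
    """Return billed-minus-logged token counts for each kind of model."""
    return {kind: billed[kind] - logged[kind] for kind in billed}


gap = unexplained_tokens(billed, logged)
print(gap)
```

The LLM numbers nearly match, so the gap is almost entirely on the embedding side. That would be consistent with index construction being billed in the same slot, but it does not prove it.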
The problem is that I've only made 4-5 calls and I get 0.10cent per "query" (more or less), and I'm just testing with one document
Plain Text
# Import necessary packages

from llama_index import (
    GPTSimpleVectorIndex,
    SimpleDirectoryReader,
)

import os
import datetime

os.environ['OPENAI_API_KEY'] = 'API_KEY'

# index = GPTKeywordTableIndex(doeDocuments)
# index.save_to_disk("doe_index.json")

def generateIndex(path):
    tokens = 0
    name = path.split("/")[-1]

    # get the documents inside the folder
    documents = SimpleDirectoryReader(path).load_data()
    print("Starting Vector construction at ", datetime.datetime.now())
    index = GPTSimpleVectorIndex.from_documents(documents)

    index.save_to_disk("indexes/" + name + ".json")

    return tokens
    
def query(query, toIndex):
    index = GPTSimpleVectorIndex.load_from_disk("indexes/" + toIndex + ".json")
    response = index.query(query)
    return response

# Query (Spanish): "Which grants were published in the DOE of 28/03/2023?"
response = query("¿Qué subvenciones se publicaron en el DOE del 28/03/2023?", "data")
print(response)
That's the full code. I don't see anything strange here, but maybe it's time to go back to Android/iOS development, hahaha
Oh man, that sounds 10x worse, not gonna lie 🤣

I think most of the cost is coming from the LLM. You can try using ChatGPT to reduce the cost by 10x. You can also lower the chunk size so that the top_k matching nodes are smaller (by default, the top_k is 1)

Here's an example that has both of those concepts:
https://github.com/jerryjliu/llama_index/blob/main/examples/vector_indices/SimpleIndexDemo-ChatGPT.ipynb
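To see where the "10x" comes from, here is a minimal cost sketch using the billed token counts from the original post. The per-1K-token prices are an assumption: they are the OpenAI list prices in USD as of March 2023, and they may have changed since. The helper `cost_usd` and the `PRICE_PER_1K_USD` table are hypothetical names, not part of llama_index or the OpenAI client.

```python
# Assumed USD list prices per 1,000 tokens (March 2023); check current pricing.
PRICE_PER_1K_USD = {
    "text-davinci-003": 0.02,
    "gpt-3.5-turbo": 0.002,
    "text-embedding-ada-002": 0.0004,
}


def cost_usd(model, tokens):
    """Estimate the cost of `tokens` tokens on `model` at the assumed prices."""
    return tokens / 1000 * PRICE_PER_1K_USD[model]


# Token counts taken from the billing console output quoted in the question.
davinci_cost = cost_usd("text-davinci-003", 4995)
chatgpt_cost = cost_usd("gpt-3.5-turbo", 4995)
embedding_cost = cost_usd("text-embedding-ada-002", 56906)

print(f"davinci LLM call:   ${davinci_cost:.4f}")
print(f"same tokens on ChatGPT: ${chatgpt_cost:.4f}")
print(f"embedding tokens:   ${embedding_cost:.4f}")
```

Under these assumed prices the LLM call is about 0.10 USD on davinci, which lines up with the per-query figure mentioned earlier, and about a tenth of that on ChatGPT.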
With ChatGPT I get 0.04€ per query, not bad
Nice! 💪💪