mathada
Costs

Hello,
I'm a little confused about pricing while using llama-index.
Currently, I think with each request I am being charged for "text-davinci" at $0.02/1,000 tokens and "text-embedding-ada-002-v2" at $0.0004/1,000 tokens. I gave a research-paper PDF as input and was charged $0.10 for one request (roughly 4,000 tokens for text-davinci and 16,000 tokens for text-embedding-ada-002-v2?).

Will I be charged for embedding on each request?
Also, would "gpt-3.5-turbo" be a better option?
Finally, how can I reduce the number of tokens used so that I am charged less? One approach I am considering is to use a vector search engine (like Qdrant or FAISS) to cache the questions asked and the responses given by GPT. Then, if a similar question is asked by any user later, the answer can be served directly from the cache instead of spending tokens.
Any help would be appreciated. Thank you.
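The caching idea described above can be sketched in plain Python. This is a minimal illustration, not production code: `SemanticCache`, `embed`, and the 0.8 threshold are all hypothetical choices, and a real deployment would store model embeddings (e.g. from text-embedding-ada-002) in Qdrant or FAISS rather than using the toy bag-of-words embedding below.

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words embedding. A real cache would call an
    embedding model here instead (this is the cost you pay once
    per cached question, instead of per LLM request)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Store (question embedding, answer) pairs and serve the
    cached answer when a new question is similar enough."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)

    def put(self, question, answer):
        self.entries.append((embed(question), answer))

    def get(self, question):
        """Return a cached answer, or None on a cache miss
        (i.e. when the LLM should actually be called)."""
        q = embed(question)
        best, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None
```

Usage: on a cache hit you skip both the embedding-of-documents cost and the completion cost; only the query embedding is recomputed.

```python
cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of france", "Paris")
print(cache.get("what is the capital of france"))  # cache hit
print(cache.get("how do llamas sleep"))            # cache miss -> None
```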
19 comments
Hello all,
Is there a way to print only the top-k chunks retrieved for a user query using llama-index?

As far as I understand, llama-index creates an embedding of the user query and finds the most similar `similarity_top_k` chunks (using cosine similarity). Is there a way to print what these top-k chunks are?
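Depending on the llama-index version, the retrieved chunks are usually exposed on the query response (e.g. via `response.source_nodes`) or by running a retriever directly; exact attribute names vary between releases, so check the version you have installed. Conceptually, the retrieval step the question describes boils down to a cosine-similarity top-k ranking, sketched here with toy dense vectors (`top_k_chunks` is a hypothetical helper, not a llama-index API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(query_emb, chunk_embs, chunks, k=2):
    """Rank chunks by cosine similarity to the query embedding
    and return the k best as (chunk, score), highest first --
    the same ranking a vector index performs internally."""
    scored = sorted(
        zip(chunks, (cosine(query_emb, e) for e in chunk_embs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:k]
```

Printing the result of such a ranking (chunk text plus score) is exactly the "show me the top-k chunks" inspection being asked about.

```python
chunks = ["chunk about pricing", "chunk about llamas", "chunk about tokens"]
embs = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1)]
for chunk, score in top_k_chunks((1.0, 0.0), embs, chunks, k=2):
    print(f"{score:.3f}  {chunk}")
```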
8 comments