Hello, I have a little confusion over pricing while using llama-index. Currently, I think that with each request I am getting charged for "text-davinci" ($0.02/1,000 tokens) and "text-embedding-ada-002-v2" ($0.0004/1,000 tokens). I gave a research-paper PDF as input and was charged $0.10 for one request (roughly 4,000 tokens for "text-davinci" and 16,000 tokens for "text-embedding-ada-002-v2"?).
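For reference, this is the back-of-the-envelope arithmetic behind that guess, using the per-1,000-token rates quoted above:

```python
# Rough cost check using the rates quoted above (token counts are my estimates).
davinci_cost = 4_000 / 1_000 * 0.02        # $0.080 for the completion model
embedding_cost = 16_000 / 1_000 * 0.0004   # $0.0064 for text-embedding-ada-002-v2
total = davinci_cost + embedding_cost      # ~= $0.086, in the ballpark of the $0.10 charged
print(total)
```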
Will I be charged for embeddings on every request? Also, would using "gpt-3.5-turbo" be a better option? Finally, how can I reduce the number of tokens used so that I am charged less? One approach I am thinking of is to use a vector search engine (like Qdrant or FAISS) to cache the questions asked and the responses given by GPT. Then, if a similar question is asked by any user later, the answer can be served directly from the cache instead of spending tokens; a sketch of this idea is below. Any help would be appreciated. Thank you.
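Something like the following is what I have in mind. This is just a rough sketch, assuming the pre-1.0 `openai` Python client (the embedding call changed in later versions) and a hypothetical 0.90 similarity cutoff for a cache hit:

```python
import faiss
import numpy as np
import openai  # pre-1.0 openai client; the Embedding.create call changed later

DIM = 1536        # text-embedding-ada-002 vectors have 1536 dimensions
THRESHOLD = 0.90  # hypothetical cutoff for treating a question as already seen

cache_index = faiss.IndexFlatIP(DIM)  # inner product == cosine on unit vectors
cached_answers: list[str] = []

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    vec = np.asarray(resp["data"][0]["embedding"], dtype="float32")
    return vec / np.linalg.norm(vec)  # normalize so inner product = cosine similarity

def answer(question: str, ask_gpt) -> str:
    q = embed(question).reshape(1, -1)
    if cache_index.ntotal > 0:
        scores, ids = cache_index.search(q, 1)
        if scores[0][0] >= THRESHOLD:
            return cached_answers[ids[0][0]]  # cache hit: no completion tokens spent
    response = ask_gpt(question)  # cache miss: pay for the LLM call once
    cache_index.add(q)
    cached_answers.append(response)
    return response
```

Note that the cache lookup itself still embeds each incoming question, so it costs embedding tokens, but those are about 50x cheaper than the completion tokens it saves on a hit.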
Hello all, is there a way to print only the top-k chunks retrieved for a user query using llama-index?
As far as I understand, llama-index creates an embedding of the user query and finds the <similarity_top_k> most similar chunks (using cosine similarity). Is there a way to print what these top-k chunks are?
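In case it helps anyone searching later, here is a minimal sketch of one way to do this, assuming a recent llama_index release where the imports live under `llama_index.core` (older releases import the same names from `llama_index` directly); the "data" directory and the example question are placeholders:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build an index over local documents ("data" is a placeholder directory).
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Retrieve the top-k chunks directly, without calling the LLM at all.
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("What is the main contribution of the paper?")

for node_with_score in nodes:
    print(f"score={node_with_score.score:.4f}")
    print(node_with_score.node.get_content())
    print("-" * 40)
```

If you are going through a query engine instead, the same chunks should be available on the response object as `response.source_nodes`.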