Hello everyone, I need some input to understand how feasible a personal project I want to start is. I have 10 years' worth of personal journals and want to index them and query them. The data is plain text (though I can structure it somehow) in a couple of files (one per journal category). I tried a TreeIndex, but due to a coding mistake after generating it I didn't manage to save it and lost it; that cost me around 14€ of API tokens (it was ~250 chunks of data). I also noticed that querying it was really expensive... does the cost of a query scale up rather quickly with the size of the index?
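(For reference, the rebuild cost is avoidable by persisting the index the moment it's built. A minimal sketch, assuming the older llama_index (GPT Index) API where GPTSimpleVectorIndex has save_to_disk/load_from_disk; newer releases persist via index.storage_context.persist() instead. The directory and filename are placeholders.)

```python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Load the journal files (placeholder directory).
documents = SimpleDirectoryReader("journals").load_data()

# Building the index is the step that spends embedding tokens...
index = GPTSimpleVectorIndex(documents)

# ...so persist it immediately, before anything else can fail.
index.save_to_disk("journal_index.json")

# On later runs, reload instead of rebuilding (no new API cost).
index = GPTSimpleVectorIndex.load_from_disk("journal_index.json")
```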
It was still REALLY expensive. I built a simple VectorIndex over just the last couple of years of journal entries, around 50 chunks, and it came to around 1€ per query plus 5€ to build the vectors. Is this normal pricing, or is it optimizable?
Constructing a vector index should be pretty cheap. Embeddings run something like $0.0004/1k tokens (1k tokens is roughly 750 words).
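To put rough numbers on that (assuming ~1k tokens per chunk): 50 chunks × 1k tokens × $0.0004/1k ≈ $0.02 to embed, so a 5€ build suggests something other than embeddings was billed, e.g. LLM calls made during construction, which a TreeIndex does (it summarizes nodes bottom-up) but a plain vector index shouldn't.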
Query cost depends on top_k and on how big each chunk is. I'm pretty surprised a vector index query was $1, unless maybe you had top_k set to a really big number?
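A minimal sketch of keeping that knob small on the old GPTSimpleVectorIndex API (the saved-index filename and the question are placeholders):

```python
from llama_index import GPTSimpleVectorIndex

# Reload the previously saved index rather than rebuilding it.
index = GPTSimpleVectorIndex.load_from_disk("journal_index.json")

# similarity_top_k controls how many chunks are stuffed into the LLM
# prompt; every extra chunk is billed as prompt tokens on each query.
response = index.query(
    "What did I write about running in 2019?",
    similarity_top_k=1,
)
print(response)
```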
Maybe it was the way the data was structured? I basically picked up from where I left off on the Graham Lee example and didn't add parameters... perhaps only response_mode='tree_summarize', but I'm not sure. Also, I read it was $0.02/1k tokens (which more or less matches what I paid), so maybe I should switch the model?
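Switching models is probably the biggest lever: the default completion model of that era, text-davinci-003, billed $0.02/1k tokens, while gpt-3.5-turbo bills $0.002/1k. A minimal sketch, assuming a llama_index/langchain pairing where LLMPredictor accepts a chat model (this is version-dependent); the directory and question are placeholders:

```python
from langchain.chat_models import ChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, SimpleDirectoryReader

# gpt-3.5-turbo bills at roughly 1/10th the per-token rate of the
# default text-davinci-003.
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
)

documents = SimpleDirectoryReader("journals").load_data()
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)

# Leaving the default response mode keeps a query to one LLM pass over
# the retrieved chunks; response_mode="tree_summarize" adds extra calls.
response = index.query("What themes come up most often?", similarity_top_k=1)
print(response)
```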