From Kapa: "If you want to know which metadata is automatically selected and used in generating the response, you would need to modify the VectorIndexAutoRetriever or the underlying LlamaIndex code to return this information."
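Short of patching the library, a lighter-weight option is to construct the auto-retriever with `verbose=True`, which prints the query string and metadata filters it infers before running the search. A minimal sketch (import paths vary across llama_index versions; the metadata field here is hypothetical):

```python
from llama_index import VectorStoreIndex
from llama_index.indices.vector_store.retrievers import VectorIndexAutoRetriever
from llama_index.schema import TextNode
from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo

# Tiny index over hand-made nodes; the "speaker" field is a made-up example.
nodes = [
    TextNode(text="Opening remarks.", metadata={"speaker": "Alice"}),
    TextNode(text="Closing remarks.", metadata={"speaker": "Bob"}),
]
index = VectorStoreIndex(nodes)

# Describe the metadata fields the retriever is allowed to filter on.
vector_store_info = VectorStoreInfo(
    content_info="call transcripts",
    metadata_info=[
        MetadataInfo(name="speaker", type="str", description="Name of the speaker"),
    ],
)

# verbose=True prints the query string and metadata filters the LLM
# auto-selected, without any changes to library code.
retriever = VectorIndexAutoRetriever(
    index, vector_store_info=vector_store_info, verbose=True
)
results = retriever.retrieve("What did Alice say?")
```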
Finally, for one example query, I counted ~5k tokens for the above prompt chain ($0.01 with turbo). However, the final cost seemed to be in the $0.30-$0.40 range. Any idea what I'm missing from the final token count?
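A likely culprit: the prompt chain is only part of the spend. Completion tokens, embedding calls, and any extra refine-step LLM calls are all billed too, and counting only the prompts undercounts all of them. A sketch of getting an exact breakdown with llama_index's `TokenCountingHandler` (assuming the legacy `llama_index` namespace; the document text is a placeholder):

```python
import tiktoken
from llama_index import Document, ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count with the same encoding the target model uses.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)

index = VectorStoreIndex.from_documents(
    [Document(text="An example document.")], service_context=service_context
)
index.as_query_engine().query("example query")

# Prompt, completion, and embedding tokens are billed separately (and at
# different rates); comparing these against the provider dashboard usually
# explains most of the gap.
print("embedding tokens: ", token_counter.total_embedding_token_count)
print("prompt tokens:    ", token_counter.prompt_llm_token_count)
print("completion tokens:", token_counter.completion_llm_token_count)
```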
We're using an internal tool to assess various open-source LLMs against GPT-3.5. Is there a way to retrieve the exact prompt / prompt chain that was fed to OpenAI via llama_index (the same output you see when verbose is True and the logger is set to DEBUG)? That way we can build a test set for comparison.
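One way to capture this without scraping DEBUG logs is the `LlamaDebugHandler` callback, which records the payload of every LLM call, prompts included. A sketch under the same legacy-namespace assumption as above:

```python
from llama_index import Document, ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([llama_debug])
)

index = VectorStoreIndex.from_documents(
    [Document(text="An example document.")], service_context=service_context
)
index.as_query_engine().query("example query")

# Each pair is (start event, end event): the start payload carries the exact
# prompt/messages sent to the LLM, the end payload the raw response.
for start_event, end_event in llama_debug.get_llm_inputs_outputs():
    print(start_event.payload)
    print(end_event.payload)
```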
Is there any way to ensure certain nodes always appear in the results of a vector-search query? For example, with call transcripts you often want the first and last chunks to be included.
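Not via a built-in flag that I'm aware of; one workable pattern is a thin custom retriever that merges the vector results with a fixed set of pinned nodes. A sketch, with illustrative names, assuming you hold references to the chunks you want pinned:

```python
from typing import List

from llama_index import QueryBundle, VectorStoreIndex
from llama_index.retrievers import BaseRetriever
from llama_index.schema import NodeWithScore, TextNode


class PinnedNodesRetriever(BaseRetriever):
    """Vector retrieval plus a fixed set of always-included nodes."""

    def __init__(self, vector_retriever: BaseRetriever, pinned_nodes: List[TextNode]):
        self._vector_retriever = vector_retriever
        self._pinned = pinned_nodes
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        results = self._vector_retriever.retrieve(query_bundle)
        seen = {n.node.node_id for n in results}
        # Append pinned nodes the vector search didn't already surface.
        for node in self._pinned:
            if node.node_id not in seen:
                results.append(NodeWithScore(node=node, score=1.0))
        return results


# Pin the first and last transcript chunks so they always come back.
chunks = [TextNode(text=f"chunk {i}") for i in range(5)]
index = VectorStoreIndex(chunks)
retriever = PinnedNodesRetriever(index.as_retriever(), [chunks[0], chunks[-1]])
results = retriever.retrieve("what was said at the very end?")
```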
Any examples of setting up parent/child node relationships? Follow-up question: does setting prev/next and parent/child relationships affect tree index construction?
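A minimal sketch of wiring relationships by hand via `llama_index.schema` (exact paths vary by version). On the follow-up: these relationships only annotate the nodes themselves; as far as I can tell the tree index builds its own hierarchy during construction, so it's worth verifying against the version you're on rather than assuming the annotations are consumed.

```python
from llama_index.schema import NodeRelationship, RelatedNodeInfo, TextNode

parent = TextNode(text="Full transcript of the call.")
child_a = TextNode(text="First chunk of the transcript.")
child_b = TextNode(text="Last chunk of the transcript.")

# Parent/child links point both ways: CHILD holds a list, PARENT a single ref.
parent.relationships[NodeRelationship.CHILD] = [
    RelatedNodeInfo(node_id=child_a.node_id),
    RelatedNodeInfo(node_id=child_b.node_id),
]
for child in (child_a, child_b):
    child.relationships[NodeRelationship.PARENT] = RelatedNodeInfo(
        node_id=parent.node_id
    )

# Sibling order for sequential chunks.
child_a.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=child_b.node_id)
child_b.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=child_a.node_id)
```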