The post asks about the custom_embedding_strs field in the new QueryBundle feature. Community members explain that this field allows modifying the string(s) used for calculating the embedding of the query, which are separate from the string used for the final LLM output. The main use case is HyDE (hypothetical document embeddings). Community members also discuss how the embeddings are calculated and aggregated, and provide suggestions for reducing API token costs when using GPT Index for a legal assistant application.
Right now the main use case for this feature is to support HyDE (hypothetical document embeddings). You can take a look at this tweet thread for more explanation and examples: https://twitter.com/jerryjliu0/status/1626255140209717248
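For reference, here is a minimal sketch of passing a hypothetical answer through a QueryBundle. The import paths and the documents/index setup are assumptions based on the gpt_index package at the time; check them against your installed version.

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from gpt_index.indices.query.schema import QueryBundle

# Build an index over local documents (assumed setup, for illustration only).
documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents)

query_str = "What did the author do growing up?"

# HyDE-style: first have an LLM hallucinate a plausible answer, then embed
# that answer instead of the raw question for retrieval.
hypothetical_answer = (
    "Growing up, the author spent most of their time writing short stories "
    "and programming on an early home computer."
)

query_bundle = QueryBundle(
    query_str=query_str,                          # used for the final LLM call
    custom_embedding_strs=[hypothetical_answer],  # used only for the embedding
)

response = index.query(query_bundle)
print(response)
```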
Currently the default logic is to embed each string separately and use the "mean" embedding when calculating similarity.
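Concretely, with multiple custom embedding strings the default aggregation works like this (a plain numpy illustration with toy vectors, not the library's internal code):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dim "embeddings" of two custom embedding strings.
emb_1 = np.array([0.1, 0.9, 0.0])
emb_2 = np.array([0.3, 0.5, 0.2])

# Default aggregation: element-wise mean of the per-string embeddings.
query_embedding = np.mean([emb_1, emb_2], axis=0)  # -> [0.2, 0.7, 0.1]

# Similarity against a stored node embedding.
node_embedding = np.array([0.2, 0.8, 0.1])
print(cosine_similarity(query_embedding, node_embedding))
```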
We support customizing the aggregation function from "mean" to something else, but that configuration is not exposed at the Index API level yet. It's possible to subclass BaseEmbedding to implement your desired behavior, though.
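A rough sketch of what such a subclass could look like, assuming the get_agg_embedding_from_queries hook and the OpenAIEmbedding import path from the gpt_index embeddings module; verify both against your installed version:

```python
from typing import Callable, List, Optional

import numpy as np

# Import path is an assumption based on the gpt_index layout at the time.
from gpt_index.embeddings.openai import OpenAIEmbedding


class MaxAggEmbedding(OpenAIEmbedding):
    """OpenAI embeddings, but aggregate multiple query embeddings with an
    element-wise max instead of the default mean."""

    def get_agg_embedding_from_queries(
        self,
        queries: List[str],
        agg_fn: Optional[Callable] = None,  # ignored: max is hard-coded here
    ) -> List[float]:
        embeddings = [self.get_query_embedding(q) for q in queries]
        return np.max(np.array(embeddings), axis=0).tolist()
```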
So in the HyDE Twitter thread example, is embeddings_strs[0] equivalent to custom_embedding_strs[0]? And is HyDE hallucinating context to pass in for #1 / k-nearest retrieval?
I'm trying to create a legal assistant for question & answer over ~1 million case/legislation documents using GPT Index, e.g. "Summarize this law & cite relevant cases".
Any insight on what tools/classes to use to get the best answers per API token spend? e.g. GPTSimpleVectorIndex in combination with ___
I think we use Davinci for the LLM call by default, which costs $0.0200 / 1K tokens. I'd recommend trying a cheaper LLM model and seeing if the quality is still acceptable.
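For example, you can swap in a cheaper model via LLMPredictor, the LangChain-based pattern gpt_index used at the time (model names and pricing may have changed since):

```python
from gpt_index import GPTSimpleVectorIndex, LLMPredictor, SimpleDirectoryReader
from langchain.llms import OpenAI

# Curie was roughly 10x cheaper per token than Davinci (pricing may change).
llm_predictor = LLMPredictor(llm=OpenAI(model_name="text-curie-001"))

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)

response = index.query("Summarize this law & cite relevant cases")
print(response)
```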