Updated 5 months ago

At a glance

The community member has two questions:

1. The "Embeddings" documentation mentions the default model text-embedding-ada-002 can be used for both text search and similarity. The community member asks if there are any examples or tutorials on how to use it for similarity, specifically for anomaly detection.

2. When using embeddings for Q&A, the community member asks what the actual query is that is sent to GPT-3 along with the matching documents retrieved from the index.

In the comments, another community member provides some guidance:

1. For similarity and anomaly detection, they suggest setting response_mode to "no_text" in index.query, adjusting similarity_top_k, and parsing the response object to obtain the source nodes.

2. For the Q&A query, they point the community member to the QuestionAnswerPrompt reference in the documentation.

However, there is no explicitly marked answer to the community member's questions.

Hello,
First I would like to thank all the contributors who created and maintain this great library!
I have two questions:
  1. In the "Embeddings" documentation it says that the default model is text-embedding-ada-002 which can be used for both text search and similarity. Are there any examples / tutorials on how to use it for similarity, or more specifically anomaly detection?
  2. When using embeddings for Q&A, what is the actual query that's being sent to GPT-3 along with the matching documents retrieved from the index?
TIA!
3 comments
re: 1) you can set response_mode="no_text" during index.query, and adjust the similarity_top_k. Then you can parse the response object to obtain the source nodes, if you just want to fetch the underlying documents by similarity (see this section https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#parsing-the-response)

re 2): the question answer prompt is here: https://gpt-index.readthedocs.io/en/latest/reference/prompts.html#gpt_index.prompts.prompts.QuestionAnswerPrompt
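The "no_text" approach above effectively turns the index into a pure similarity retriever. As a rough stdlib-only illustration of what similarity_top_k controls under the hood (the vectors and function names here are made-up stand-ins, not the gpt-index API), assuming cosine similarity over the embedding vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_by_similarity(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents,
    analogous to what similarity_top_k selects during index.query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-d "embeddings" standing in for text-embedding-ada-002 vectors.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.05]
print(top_k_by_similarity(query, docs, k=2))
```

Parsing response.source_nodes after a "no_text" query gives you essentially this list: the underlying documents ranked by similarity score, without any LLM call.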
re 1 - I guess that for anomaly detection I need to cluster the embedding vectors and then, within each cluster, find the "outliers". Does this make sense? If so, do you have any suggestion on how to implement it?
re 2 - It seems the default values are None, so how are the query + paragraphs sent to GPT-3 by default? i.e. "based on the following paragraphs: <retrieved documents here>, <query here>"
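On the clustering idea in re 1 above: a simpler starting point than full clustering is distance-to-centroid scoring, flagging vectors far from the mean of the collection. A minimal stdlib sketch, with a made-up threshold rule and toy data rather than real embeddings:

```python
import math

def centroid(vectors):
    """Component-wise mean of the embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_outliers(vectors, z_threshold=2.0):
    """Flag indices of vectors whose distance from the centroid exceeds
    mean + z_threshold * stdev of all centroid distances."""
    c = centroid(vectors)
    dists = [euclidean(v, c) for v in vectors]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    std = math.sqrt(var)
    return [i for i, d in enumerate(dists) if d > mean + z_threshold * std]

# Toy embeddings: a tight cluster plus one far-away point.
vecs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
print(find_outliers(vecs, z_threshold=1.5))  # the [5.0, 5.0] point is flagged
```

For a per-cluster version you would first group the vectors (e.g. with k-means) and then apply the same distance test inside each cluster.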
Hi @jerryjliu0 , just a kind reminder for the above questions. TIA!
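On the re 2 question above: per the QuestionAnswerPrompt reference linked earlier, the retrieved chunks are substituted into a template as context_str and the user's question as query_str. The template text below paraphrases the documented default and may not match every installed version exactly; the mechanics are just string formatting:

```python
# Approximation of gpt-index's default QA prompt template; check the
# QuestionAnswerPrompt reference in the docs for the exact wording.
DEFAULT_QA_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

def build_prompt(retrieved_chunks, query):
    """Join the retrieved document chunks, then substitute them and the
    user query into the template; this combined string is what gets sent
    to the completion endpoint."""
    context = "\n\n".join(retrieved_chunks)
    return DEFAULT_QA_TMPL.format(context_str=context, query_str=query)

prompt = build_prompt(
    ["Paris is the capital of France."],
    "What is the capital of France?",
)
print(prompt)
```

So "None" for the prompt argument just means this default template is used rather than a custom QuestionAnswerPrompt.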