Updated 5 months ago

At a glance

The community member has two questions:

1. The "Embeddings" documentation mentions the default model text-embedding-ada-002 can be used for both text search and similarity. The community member asks if there are any examples or tutorials on how to use it for similarity, specifically for anomaly detection.

2. When using embeddings for Q&A, the community member asks what the actual query is that is sent to GPT-3 along with the matching documents retrieved from the index.

In the comments, another community member provides some guidance:

1. For similarity and anomaly detection, they suggest setting response_mode to "no_text" in index.query, adjusting similarity_top_k, and parsing the response object to obtain the source nodes.

2. For the Q&A query, they point the community member to the QuestionAnswerPrompt reference in the documentation.

However, there is no explicitly marked answer to the community member's questions.

Hello,
First I would like to thank all the contributors who created and maintain this great library!
I have two questions:
  1. In the "Embeddings" documentation it says that the default model is text-embedding-ada-002 which can be used for both text search and similarity. Are there any examples / tutorials on how to use it for similarity, or more specifically anomaly detection?
  2. When using embeddings for Q&A, what is the actual query that's being sent to GPT-3 along with the matching documents retrieved from the index?
TIA!
3 comments
re: 1) you can set response_mode="no_text" during index.query, and adjust the similarity_top_k. Then you can parse the response object to obtain the source nodes, if you just want to fetch the underlying documents by similarity (see this section https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#parsing-the-response)

re 2): the question answer prompt is here: https://gpt-index.readthedocs.io/en/latest/reference/prompts.html#gpt_index.prompts.prompts.QuestionAnswerPrompt
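The "no_text" approach above effectively turns the index into a pure similarity retriever. As a rough stdlib-only illustration of what similarity_top_k controls under the hood (the vectors and function names here are made-up stand-ins, not the gpt-index API), assuming cosine similarity over the embedding vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_by_similarity(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents,
    analogous to what similarity_top_k selects during index.query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-d "embeddings" standing in for text-embedding-ada-002 vectors.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.05]
print(top_k_by_similarity(query, docs, k=2))
```

Parsing response.source_nodes after a "no_text" query gives you essentially this list: the underlying documents ranked by similarity score, without any LLM call.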
re 1 - I guess that for anomaly detection I need to cluster the embedding vectors and then, within each cluster, find the "outliers". Does this make sense? If so, do you have any suggestion on how to implement it?
re 2 - It seems the default values are None, so how are the query + paragraphs sent to GPT-3 by default? i.e. "based on the following paragraphs: <retrieved documents here>, <query here>"
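On the clustering idea in re 1 above: a simpler starting point than full clustering is distance-to-centroid scoring, flagging vectors far from the mean of the collection. A minimal stdlib sketch, with a made-up threshold rule and toy data rather than real embeddings:

```python
import math

def centroid(vectors):
    """Component-wise mean of the embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_outliers(vectors, z_threshold=2.0):
    """Flag indices of vectors whose distance from the centroid exceeds
    mean + z_threshold * stdev of all centroid distances."""
    c = centroid(vectors)
    dists = [euclidean(v, c) for v in vectors]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    std = math.sqrt(var)
    return [i for i, d in enumerate(dists) if d > mean + z_threshold * std]

# Toy embeddings: a tight cluster plus one far-away point.
vecs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
print(find_outliers(vecs, z_threshold=1.5))  # the [5.0, 5.0] point is flagged
```

For a per-cluster version you would first group the vectors (e.g. with k-means) and then apply the same distance test inside each cluster.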
Hi @jerryjliu0 , just a kind reminder for the above questions. TIA!
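On the re 2 question above: per the QuestionAnswerPrompt reference linked earlier, the retrieved chunks are substituted into a template as context_str and the user's question as query_str. The template text below paraphrases the documented default and may not match every installed version exactly; the mechanics are just string formatting:

```python
# Approximation of gpt-index's default QA prompt template; check the
# QuestionAnswerPrompt reference in the docs for the exact wording.
DEFAULT_QA_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

def build_prompt(retrieved_chunks, query):
    """Join the retrieved document chunks, then substitute them and the
    user query into the template; this combined string is what gets sent
    to the completion endpoint."""
    context = "\n\n".join(retrieved_chunks)
    return DEFAULT_QA_TMPL.format(context_str=context, query_str=query)

prompt = build_prompt(
    ["Paris is the capital of France."],
    "What is the capital of France?",
)
print(prompt)
```

So "None" for the prompt argument just means this default template is used rather than a custom QuestionAnswerPrompt.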