How do you imagine querying the index?
For example, if you have a database of discord chats, you might do something like this:
response = index.query("What did <user_name> say about <topic>>", similarity_top_k=5)
This will find the 5 chats that best match the query (using cosine similarity), and then ask the LLM the question using those chats as context.
If you want to return the id, you could skip the LLM and just return response.source_nodes, or create a custom Q/A prompt to get the LLM to do it for you.
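Something like this rough sketch (the reader, the folder path, and what exactly each source node exposes are assumptions on my end and may vary by version):

from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# load the chats -- assumes they've been exported as plain-text files under ./chat_logs
documents = SimpleDirectoryReader("./chat_logs").load_data()
index = GPTSimpleVectorIndex(documents)

response = index.query("What did <user_name> say about <topic>?", similarity_top_k=5)
print(response)                      # the answer synthesized by the LLM
for node in response.source_nodes:   # the raw chunks that were retrieved
    print(node)                      # holds the matched text/doc id; exact attributes depend on the version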
With ~40,000 documents, you might be interested in using a 3rd party vector store database (pinecone, etc.) so that you don't need to keep all the embeddings in memory
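A rough sketch of what that could look like with Pinecone -- the class name, constructor args, and index name here are from memory and may not match your version exactly:

import pinecone
from gpt_index import GPTPineconeIndex

pinecone.init(api_key="...", environment="us-east1-gcp")  # hypothetical credentials/region
pinecone_index = pinecone.Index("discord-chats")          # hypothetical pinecone index name

# embeddings are stored in pinecone instead of in memory
index = GPTPineconeIndex(documents, pinecone_index=pinecone_index)
response = index.query("What did <user_name> say about <topic>?", similarity_top_k=5)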
So I wanted to use LlamaIndex for something I'd normally do manually. I have chat logs in a database. Each log entry is something someone said in a discord. I stripped their usernames/ids, so I have timestamps, id, and content. I simply want to identify all ids that are questions so I can see trending questions in my various discords. Normally I'd just chunk all the chat logs, pass them to GPT-3, and create a prompt with something along the lines of:
"Below are chat logs. Each chat log is prefixed with an id: following an excerpt of chat. eg: 5532133: How do I find information on GPTSimpleVectorIndex?
or 2223233: Hope everyone is doing well
, please return a list of IDs for every log that you believe is a question.
In this case it would return a single id (5532133) that I can then go retrieve from my database.
So I was hoping to index everything, then simply add the prompt and get tuples back -- not sure if it'll work because it's still indexing. Is this the wrong approach, you think?
(by manually I mean still automated with OpenAI API, just no other fancy tools)
ohhh very cool use case!
In this case, it sounds like you need to check every single discord message.
So, I would use a GPTListIndex instead. It doesn't use embeddings, and instead just goes over every single node. Here's a full example
from gpt_index import GPTListIndex
from gpt_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
DEFAULT_TEXT_QA_PROMPT_TMPL = (
"Below is a list of user messages, prefixed with an ID number: \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given the context information and not prior knowledge, "
"return the ID numbers that satisfy the query: {query_str}\n"
)
query_prompt_template = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
DEFAULT_REFINE_PROMPT_TMPL = (
"The original query is as follows: {query_str}\n"
"We have provided an existing list of message ids: {existing_answer}\n"
"We have the opportunity to add more message ids "
"(only if needed) with some more context below.\n"
"------------\n"
"{context_msg}\n"
"------------\n"
"Given the context information and not prior knowledge, "
"add more ID numbers to the original list that satisfy the query."
"If the context isn't useful, return the original list of IDs."
)
refine_template = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
index = GPTListIndex(documents)
response = index.query("Find messages that are asking a question", text_qa_template=query_prompt_template, refine_template=refine_template, response_mode="compact")
You might also have to use a prompt helper to change how many tokens the model is expected to output.
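Something along these lines (a rough sketch -- the argument names are from the docs for this version and the numbers are just guesses):

from gpt_index import PromptHelper

prompt_helper = PromptHelper(
    max_input_size=4096,   # total context window of the model
    num_output=512,        # reserve more room for the list of IDs in the output
    max_chunk_overlap=20,
)
index = GPTListIndex(documents, prompt_helper=prompt_helper)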
(wow this answer got out of hand hahaha sorry about that)
response_mode="compact" will stuff as many nodes as possible in each LLM call
very cool, thank you for this~ do you have a tip jar? ListIndex sounds more appropriate.
also is there any reason I'd want to stuff as many nodes as possible with a compact call, or rather, why that wouldn't be a default? is there a downside?
By not using compact, you will get more detailed answers. In this case, you don't need details, just the numbers.
No tip jar, but thanks anyways! Just happy to help for now.
thanks Logan, appreciate this. Going to play with it right now.
I was considering using the chatgpt endpoint to save on costs, that's the one thing I'll probably look into modifying above
ChatGPT exposes a slightly different interface, with the whole "roles" thing
yeah, that's also true. This operation shouldn't cost too much. Let's get it working with davinci-003 first and then move on to optimizing for cost. Great points, though.
Quick question for you: is RefinePrompt for when we anticipate the response being concatenated over several queries to OpenAI GPT-3?
Trying to figure out which of the prompts I need to modify as it is returning back some logs that are not questions
maybe a refine isn't even needed @Logan M ?
The refine is what is used after the first call
So, if all your nodes don't fit into one call, then it switches to the refine prompt
I'm assuming 40,000 messages won't fit into one call?
ah, ok. I need to figure out how to debug, such as saving each query to a txt file or something. It's returning a bunch of chat logs that aren't questions, but as soon as I comment out text_qa_template and refine_template, it only returns questions
but I'm assuming it's also now missing questions if they were silently truncated
didn't get complaints that things didn't fit (Just doing 1,000 at a time for now to test)
1,000 should still be more than what fits into a single prompt (4,000 tokens)
Hmmm interesting. If it works with default prompts that's a plus!
Would LlamaIndex complain if any of the prompt was truncated? or would it just return the last chunk's results and discard the rest silently?
Internally, it works to make sure nothing is ever truncated
The call to the llm always respects the max token limit (4096)
so I'm trying to understand if I am losing anything by omitting text_qa_template and refine_template, because then it seems to just work
I think one issue is you are relying on the LLM to copy and paste its answers to build the list, and you're right that it might randomly omit things.
Even with custom prompts, it might be an issue.
Maybe a better solution would be making a list of ListIndex's, so that each list index has just enough text to fit into one LLM call
Then, the refine prompt (the default one or one you define) is never called
That way, you can keep track of the returned id's yourself, rather than trusting the model to build the list across calls
Hmm. that might be interesting. I'd need to figure out how to do this part: making a list of ListIndex's, so that each list index has just enough text to fit into one LLM call
then I can just loop through them and concat the results from each list myself
if my ListIndex cannot fit into one LLM call, what is happening to previous responses from the LLM in this case:
response = index.query("Return only the chat logs that are a question. If you are unsure if a chat log is a question, please do not return it.", response_mode="compact")
print(response)
is it only returning the response from the last LLM call?
It's tough to say haha
After the first prompt, it can see the previous answer. But it's hard to say if it will choose to add to the existing list or replace it when it gives the second response
Ok so two final questions and I'll go experiment some more:
1) Is there a way to build out my ListIndexes to ensure that the content will fit in one LLM call?
2) Is there a rule for when to use text_qa_template and refine_template?
1) If you can split your discord logs into chunks of approximately 1000-2000 words, you should be safe for this
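A rough sketch of that idea -- the ~1500-word batch size and the (id, text) shape of the logs are assumptions:

from gpt_index import Document, GPTListIndex

# chat_logs is assumed to be a list of (id, text) tuples pulled from your database
batches, batch, words = [], [], 0
for log_id, text in chat_logs:
    line = f"{log_id}: {text}"
    batch.append(line)
    words += len(line.split())
    if words >= 1500:              # keep each index small enough for one LLM call
        batches.append(batch)
        batch, words = [], 0
if batch:
    batches.append(batch)

all_ids = []
for lines in batches:
    index = GPTListIndex([Document("\n".join(lines))])
    response = index.query("Return the IDs of the chat logs that are questions")
    all_ids.append(str(response))  # parse/collect the returned ids yourself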
2) There are internal qa and refine templates that are defined. You only need to modify them when your use-case isn't working, haha. I know that's a vague answer, but in most cases I would trust the default prompts. This particular use-case is a little different though
I think in this particular use case, it might make sense to just use the OpenAI API wrappers directly and use something like tiktoken to build my prompt up to the right size
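Roughly what I have in mind -- the model name, token budget, and prompt wording here are placeholders:

import openai
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")
budget = 3000  # leave room for the instructions and the completion

# chat_logs is assumed to be a list of (id, text) tuples from my database
chunks, current, used = [], [], 0
for log_id, text in chat_logs:
    line = f"{log_id}: {text}\n"
    n = len(enc.encode(line))
    if used + n > budget and current:
        chunks.append("".join(current))
        current, used = [], 0
    current.append(line)
    used += n
if current:
    chunks.append("".join(current))

for chunk in chunks:
    prompt = f"Below are chat logs.\n{chunk}\nReturn a list of IDs for every log that is a question."
    completion = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=256)
    print(completion.choices[0].text)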
Yea you might be right. If instead you just wanted a general AI-Powered search of the discord chats, then llama_index would be a better fit. For now, the only advantage I see for this specific use case is it runs the API for you and loads your data lol
true true well I'll have plenty more useful stuff for llama index
hey @steve, did you use the DatabaseReader for making an index.json file from a db?