
How do I use chatgpt model to answer questions from a huge list of articles?

When I use the davinci model it actually answers my questions, but when I tried to use the chatgpt model it just describes the article that the information is in.

All the articles are saved as separate txt files and loaded with SimpleDirectoryReader if that is important.

Plain Text
index = GPTSimpleVectorIndex.load_from_disk('index.json')
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo"))
response = index.query(
    "How to plan a wedding?",
    llm_predictor=llm_predictor
)

"The article discusses how to plan a wedding using Allseated, an all-in-one digital platform for organizing wedding ideas, including creating floor plans, seating charts, and guest lists."
This solution produces the same answer as above.
It's like for some reason it is describing the article it takes the information from instead of answering the question. @Logan M
Yea, ChatGPT does that sadly 😦 The initial q/a prompts were designed for davinci-003. There's a new refine prompt for chatgpt, but that only matters when similarity_top_k is greater than the default of 1

I would try modifying the qa prompt if you are set on using chatgpt. Here's an example using the default prompt

Plain Text
from gpt_index.prompts.prompts import QuestionAnswerPrompt

DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
query_prompt_template = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)

...

index.query("blah", text_qa_template=text_qa_template=query_prompt_template)
When you modify the prompt, make sure you keep the {context_str} and {query_str} variables

If you find a good general-use prompt, feel free to share it!
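For example, a small tweak that nudges chatgpt to answer directly might look like this (untested, just an illustration building on the default template above):

Plain Text
CHATGPT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Using only the context information and not prior knowledge, "
    "answer the question directly. Do not describe or summarize "
    "the article itself.\n"
    "Question: {query_str}\n"
    "Answer: "
)
query_prompt_template = QuestionAnswerPrompt(CHATGPT_QA_PROMPT_TMPL)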
Is there a way to build the custom prompt with roles, like it is shown in the OpenAI documentation?

messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who won the world series in 2020?"}, {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."}, {"role": "user", "content": "Where was it played?"} ]

@Logan M
@Erik if you use the ChatGPTLLMPredictor class, it looks like you can set those using the prepend_messages argument. I haven't had a chance to try this out yet though https://github.com/jerryjliu/gpt_index/blob/main/gpt_index/langchain_helpers/chatgpt.py
Although setting that is like setting a static "prompt", i.e. maybe instructions for how chatgpt should act

Then, future predictions follow the typical format you noted there
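Something like this, maybe (a rough sketch only; I haven't run it, and the exact message type prepend_messages expects is an assumption based on the linked source):

Plain Text
from gpt_index.langchain_helpers.chatgpt import ChatGPTLLMPredictor
from langchain.schema import SystemMessage  # assumption: prepend_messages takes langchain message objects

# static instructions prepended to every chat completion call
llm_predictor = ChatGPTLLMPredictor(
    prepend_messages=[
        SystemMessage(content="You are a helpful assistant. Answer questions directly."),
    ]
)
response = index.query("How to plan a wedding?", llm_predictor=llm_predictor)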
Thanks for that. I am actually not set on using chatgpt, but since it's cheaper and in theory should be better, I was really hoping to use it. For now I guess I will go with davinci until I figure out how to get it to work with chatgpt.

On a different note though, what I am actually trying to accomplish is a question/answer chatbot for my website. I have around 4000 scraped articles that I would like the AI to pull answers from. It is somewhat working, but I am getting dizzy from all the different options and which is the right way to go.

What I am doing right now is creating a simple vector index over all the articles, saving it to disk, then just using that index to query with the user question. I haven't tested it with all 4000 articles yet though since I am afraid to mindlessly just index so much data without knowing the best way to go about it.

Do you have any suggestions or ideas on how to go about it? Is there any guidance on how big the chunk size should be, which index to use, etc.?

Additionally, one problem I encountered when saving the index to disk is that the encoding seems wrong. Is there a way to force it to utf-8? My index looks something like this:

@Logan M
[Attachment: image.png]
One way to force the file encoding is to use index.save_to_dict() and then use the json library to write it in the encoding of your choice.
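Something like this (a minimal sketch, assuming save_to_dict() returns a plain dict):

Plain Text
import json

index_dict = index.save_to_dict()
with open('index.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False writes non-ASCII characters as utf-8 instead of \uXXXX escapes
    json.dump(index_dict, f, ensure_ascii=False)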

The index structure will largely depend on what your articles look like. Are they long and structured (i.e. headings and whatnot), is each one on a specific topic, is there an easy way to get a summary or defining details of each article, etc.

I would randomly sample ~400 articles (10%) to build a test with. If you stick to things like vector indexes, the cost to build the index should be very low (embeddings are very cheap). So yea, starting with a single vector index is a good place to start
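Roughly like this (a sketch; the 'articles' folder name is a placeholder, and chunk_size_limit as a constructor argument is an assumption, so check your gpt_index version):

Plain Text
import random
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# load everything, then index a ~10% random sample as a cheap test
documents = SimpleDirectoryReader('articles').load_data()
sample = random.sample(documents, k=max(1, len(documents) // 10))

# chunk_size_limit caps how large each indexed chunk is
# (assumption: the argument name may vary by version)
index = GPTSimpleVectorIndex(sample, chunk_size_limit=512)
index.save_to_disk('test_index.json')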
Thank you so much for the answer!
@Erik were you able to get gpt-3.5-turbo working with the context of previous messages, like "role": "system" and "role": "user"?
No, I switched to just using the davinci model for the llama-index part and using their langchain integration to let langchain do the chat part with chatgpt

Like shown in here https://github.com/jerryjliu/gpt_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb
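The gist of it is wrapping the index query as a langchain Tool (a sketch from memory of that demo; the tool name, description, and OpenAIChat usage are my own placeholders):

Plain Text
from langchain.agents import Tool, initialize_agent
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.llms import OpenAIChat

tools = [
    Tool(
        name="GPT Index",
        func=lambda q: str(index.query(q)),
        description="Useful for answering questions about the indexed articles.",
    )
]

memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAIChat(temperature=0)  # chatgpt handles the conversational part
agent = initialize_agent(
    tools, llm, agent="conversational-react-description", memory=memory
)
agent.run("How to plan a wedding?")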
@itsgeorgep Hey, just an update that I am using gpt-3.5-turbo right now. Here is how I create a custom Q/A template:

Plain Text
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
)
from gpt_index.prompts.prompts import QuestionAnswerPrompt

QA_TURBO_TEMPLATE_MSG = [
    SystemMessagePromptTemplate.from_template("System Message + {context_str}"),
    HumanMessagePromptTemplate.from_template("Human Message Example"),
    AIMessagePromptTemplate.from_template("AI Message Example"),
    HumanMessagePromptTemplate.from_template("{query_str}"),
]

QA_TURBO_TEMPLATE_LC = ChatPromptTemplate.from_messages(QA_TURBO_TEMPLATE_MSG)
QA_PROMPT = QuestionAnswerPrompt.from_langchain_prompt(QA_TURBO_TEMPLATE_LC)

index.query(
    q,
    llm_predictor=LLM_PREDICTOR,
    llama_logger=llama_logger,
    text_qa_template=QA_PROMPT,
    similarity_top_k=SIMILARITY_TOP_K,
    response_mode="compact",
)

Hopefully this helps
@Sahil @Rashika