----------------------
Yes, LlamaIndex provides a way to generate questions from multiple documents, including PDFs, using the DatasetGenerator class. This class uses a large language model (LLM) to generate questions from a given set of documents.
Here is an example of how you can use it:
from llama_index.core.evaluation import DatasetGenerator
from llama_index.llms.openai import OpenAI
# Initialize the LLM
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.3)
# train_documents and test_documents are lists of Document objects
# loaded from your own files (for example, your PDFs).
# QUESTION_GEN_PROMPT is a prompt string; an example is given below.
# Instantiate a DatasetGenerator for the train set and one for the test set
train_dataset_generator = DatasetGenerator.from_documents(
    train_documents,
    question_gen_query=QUESTION_GEN_PROMPT,
    llm=llm,
    show_progress=True,
    num_questions_per_chunk=25,
)
test_dataset_generator = DatasetGenerator.from_documents(
    test_documents,
    question_gen_query=QUESTION_GEN_PROMPT,
    llm=llm,
    show_progress=True,
    num_questions_per_chunk=25,
)
In this example, QUESTION_GEN_PROMPT is a string that instructs the LLM on how to generate the questions. For instance:
QUESTION_GEN_PROMPT = (
    "You are a Teacher/ Professor. Your task is to setup "
    "a quiz/examination. Using the provided context, formulate "
    "a single question that captures an important fact from the "
    "context. Restrict the question to the context information provided."
)
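The example assumes that train_documents and test_documents already exist. As a rough sketch of one way to produce them from a folder of PDFs: the helper names split_documents and load_pdf_documents, the 80/20 split, the seed, and the "./pdfs" folder are all illustrative assumptions, not part of LlamaIndex. Loading PDFs with SimpleDirectoryReader typically also needs a PDF parser such as pypdf installed.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_documents(
    docs: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of docs and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(docs)
    # A seeded Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one item when there is more than one document.
    n_test = max(1, round(len(shuffled) * test_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_test:], shuffled[:n_test]


def load_pdf_documents(pdf_dir: str):
    """Load every PDF under pdf_dir as LlamaIndex Document objects."""
    # Imported lazily so split_documents works without llama_index installed.
    from llama_index.core import SimpleDirectoryReader

    return SimpleDirectoryReader(input_dir=pdf_dir, required_exts=[".pdf"]).load_data()


# Hypothetical usage ("./pdfs" is a placeholder folder):
# documents = load_pdf_documents("./pdfs")
# train_documents, test_documents = split_documents(documents)
```

The split is done per loaded Document, so if your reader yields several Documents per PDF, pages from one file may land in both sets. Group by file first if that matters for your evaluation.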
You can find more details in the LlamaIndex documentation.
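Building the generators does not produce any questions by itself. A minimal sketch of the remaining step, assuming you want the questions as a plain text file with one question per line: clean_questions and save_questions are hypothetical helpers written for this answer, and the num=200 value and file name in the commented usage are placeholders.

```python
from pathlib import Path
from typing import Iterable, List


def clean_questions(questions: Iterable[str]) -> List[str]:
    """Normalize whitespace, drop empty entries, and remove duplicates in order."""
    seen = set()
    cleaned = []
    for question in questions:
        # Collapse internal newlines and repeated spaces so each question fits on one line.
        question = " ".join(question.split())
        if question and question not in seen:
            seen.add(question)
            cleaned.append(question)
    return cleaned


def save_questions(questions: Iterable[str], path: str) -> int:
    """Write one cleaned question per line and return how many were written."""
    cleaned = clean_questions(questions)
    Path(path).write_text("".join(q + "\n" for q in cleaned), encoding="utf-8")
    return len(cleaned)


# Hypothetical usage with the generators from the example (this makes LLM calls):
# train_questions = train_dataset_generator.generate_questions_from_nodes(num=200)
# save_questions(train_questions, "train_questions.txt")
```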
----------------------