
Updated 6 months ago

Is there anything in LlamaIndex that I could use to generate questions from multiple PDFs?

@kapa.ai Is there anything in LlamaIndex that I could use to generate questions from multiple PDFs?
Please stand by @Anuj as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
Yes. LlamaIndex can generate questions from multiple documents, including PDFs, with the DatasetGenerator class. This class uses a large language model (LLM) to generate questions from a given set of documents.

Here is an example of how you can use it:

Plain Text
from llama_index.core.evaluation import DatasetGenerator
from llama_index.llms.openai import OpenAI

# Initialize the LLM
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.3)

# train_documents and test_documents are lists of Document objects that you
# must load yourself, for example from your PDF files.
# QUESTION_GEN_PROMPT (shown further down) must also be defined before this runs.

# Instantiate a DatasetGenerator for each of the train and test sets
train_dataset_generator = DatasetGenerator.from_documents(
    train_documents,
    question_gen_query=QUESTION_GEN_PROMPT,
    llm=llm,
    show_progress=True,
    num_questions_per_chunk=25,
)

test_dataset_generator = DatasetGenerator.from_documents(
    test_documents,
    question_gen_query=QUESTION_GEN_PROMPT,
    llm=llm,
    show_progress=True,
    num_questions_per_chunk=25,
)
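
The example above leaves train_documents and test_documents undefined. Here is a minimal sketch of one way to build them from a folder of PDFs. It assumes the files sit in a hypothetical ./pdfs directory, that a PDF parser such as pypdf is installed so SimpleDirectoryReader can read them, and that splitting by file is acceptable. The helper names split_pdf_paths and load_pdf_documents are made up for illustration and are not part of LlamaIndex.

```python
from pathlib import Path


def split_pdf_paths(paths, train_fraction=0.8):
    """Split PDF paths into (train, test) lists by file, in sorted order."""
    ordered = sorted(str(p) for p in paths)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def load_pdf_documents(pdf_paths):
    """Load the given PDF files into LlamaIndex Document objects."""
    pdf_paths = list(pdf_paths)
    if not pdf_paths:
        # Nothing to read; avoids asking the reader to open an empty file list.
        return []
    # Imported lazily so split_pdf_paths works without llama_index installed.
    from llama_index.core import SimpleDirectoryReader

    return SimpleDirectoryReader(input_files=pdf_paths).load_data()


if __name__ == "__main__":
    pdf_dir = Path("./pdfs")  # hypothetical folder; point this at your own PDFs
    train_files, test_files = split_pdf_paths(pdf_dir.glob("*.pdf"))
    train_documents = load_pdf_documents(train_files)
    test_documents = load_pdf_documents(test_files)
    print(len(train_documents), "train documents,", len(test_documents), "test documents")
```

If you do not need separate train and test sets, passing all PDF paths to load_pdf_documents and building a single DatasetGenerator from the result should work the same way.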


In this example, QUESTION_GEN_PROMPT is a string that instructs the LLM on how to generate the questions. For instance:

Plain Text
QUESTION_GEN_PROMPT = (
    "You are a Teacher/Professor. Your task is to set up "
    "a quiz/examination. Using the provided context, formulate "
    "a single question that captures an important fact from the "
    "context. Restrict the question to the context information provided."
)
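
Constructing a DatasetGenerator does not produce any questions by itself. A rough sketch of the remaining step follows, assuming the generator exposes generate_questions_from_nodes(num=...) as in the llama_index.core.evaluation version imported above, and that an OpenAI API key is configured. The helper names save_questions and generate_and_save, and the output file name, are hypothetical.

```python
def save_questions(questions, path):
    """Write one non-empty question per line and return how many were written."""
    cleaned = [q.strip() for q in questions if q and q.strip()]
    with open(path, "w", encoding="utf-8") as f:
        for q in cleaned:
            f.write(q + "\n")
    return len(cleaned)


def generate_and_save(dataset_generator, path, num=None):
    """Run a DatasetGenerator and save the questions it returns to a text file."""
    # This call sends requests to the LLM, so it needs network access and an API key.
    questions = dataset_generator.generate_questions_from_nodes(num=num)
    return save_questions(questions, path)
```

For example, generate_and_save(train_dataset_generator, "train_questions.txt", num=40) would cap the output at 40 questions, while leaving num as None lets the generator decide how many to return.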


You can find more details in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------