
Updated 8 months ago

is there anything in LlamaIndex where I could generate questions from multiple PDFs?

At a glance

A community member asked whether LlamaIndex can generate questions from multiple PDF documents. Another community member confirmed that it can, using the DatasetGenerator class, which drives a large language model (LLM) to generate questions from a set of documents. The response included an example code snippet that builds generators for train and test documents (which could be replaced with PDF documents), explained that the QUESTION_GEN_PROMPT variable instructs the LLM on how to phrase the questions, and gave an example prompt. More details are available in the LlamaIndex documentation.

@kapa.ai is there anything in LlamaIndex where I could generate questions from multiple PDFs?
Please stand by @Anuj as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
Yes, LlamaIndex provides a way to generate questions from multiple documents, including PDFs, using the DatasetGenerator class. This class uses a large language model (LLM) to generate questions from a given set of documents.

Here is an example of how you can use it:

Python
from llama_index.core.evaluation import DatasetGenerator
from llama_index.llms.openai import OpenAI

# Initialize the LLM
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.3)

# Define your documents (in this case, they are train_documents and test_documents)
# You would replace these with your PDF documents

# Instantiate a DatasetGenerator for each of the train and test sets
train_dataset_generator = DatasetGenerator.from_documents(
    train_documents,
    question_gen_query=QUESTION_GEN_PROMPT,
    llm=llm,
    show_progress=True,
    num_questions_per_chunk=25,
)

test_dataset_generator = DatasetGenerator.from_documents(
    test_documents,
    question_gen_query=QUESTION_GEN_PROMPT,
    llm=llm,
    show_progress=True,
    num_questions_per_chunk=25,
)


In this example, QUESTION_GEN_PROMPT is a string that instructs the LLM on how to generate the questions. For instance:

Python
QUESTION_GEN_PROMPT = (
    "You are a Teacher/ Professor. Your task is to setup "
    "a quiz/examination. Using the provided context, formulate "
    "a single question that captures an important fact from the "
    "context. Restrict the question to the context information provided."
)


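Note that the snippet above leaves train_documents and test_documents undefined. A minimal sketch of one way to produce them is below; the "data/" directory name and the 80/20 positional split are illustrative choices, not part of the original answer (SimpleDirectoryReader is LlamaIndex's standard loader and handles PDFs):

```python
# Loading the PDFs (requires llama_index to be installed):
#   from llama_index.core import SimpleDirectoryReader
#   documents = SimpleDirectoryReader("data/").load_data()

def train_test_split(documents, train_fraction=0.8):
    """Split a document list into train/test portions by position."""
    cutoff = int(len(documents) * train_fraction)
    return documents[:cutoff], documents[cutoff:]

# Stand-in objects so the sketch runs without llama_index installed:
documents = [f"doc_{i}" for i in range(10)]
train_documents, test_documents = train_test_split(documents)
print(len(train_documents), len(test_documents))  # → 8 2
```

A random shuffle before splitting may be preferable when the document order is not arbitrary.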
You can find more details in the LlamaIndex documentation.
----------------------