
At a glance

The community member is trying to create a chatbot using LlamaIndex and Python Streamlit to work with multiple PDF documents. They are having trouble getting the chatbot to provide information about the titles of abstracts in the PDFs, as it keeps responding that the "abstract titles are not provided in the given context information".

One community member suggests that the chatbot may not have enough context. The original poster shares their code and asks whether they are providing too little data or whether the PDF parsing is failing to index the documents correctly.

The comments suggest that the community member should try setting a system prompt, as the "condense question" chat mode may be an issue. Another comment mentions that it can be difficult to grab titles using semantic search. A final comment asks what the community member would get if they used the "SummaryIndex" instead of the "VectorStoreIndex".

There is no explicitly marked answer in the comments.

Hi, I am new to LlamaIndex. I am trying to make a chatbot with Python Streamlit (based on this: https://blog.streamlit.io/build-a-chatbot-with-custom-data-sources-powered-by-llamaindex/)


That worked fine, but I wanted to make an app where I could make a chatbot with several PDFs: https://esmo2022-abstracts-gastro.streamlit.app/

Whenever I ask it certain general questions about titles of abstracts in my PDFs, like "Tell me some titles of abstracts", I get the response "Abstract titles are not provided in the given context information." The phrase "not provided in the given context" is a repeated pattern whenever I don't ask very specific questions about the abstracts in my PDF documents.

Am I not providing it enough data? Or is the PDF parsing not indexing the data correctly?

Here is my code:


import streamlit as st
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI

def load_data():
    with st.spinner(text="Loading and indexing abstracts! This should take 1-2 minutes."):
        reader = SimpleDirectoryReader(input_dir="./ESMO_abstracts", recursive=True)
        docs = reader.load_data()
        service_context = ServiceContext.from_defaults(
            llm=OpenAI(
                model="gpt-3.5-turbo",
                temperature=0.5,
                system_prompt=(
                    "You are an expert on ESMO 24th World Congress on "
                    "Gastrointestinal Cancer 2022 abstracts and your job is to "
                    "answer technical questions. Assume that all questions are "
                    "related to ESMO 24th World Congress on Gastrointestinal "
                    "Cancer 2022 abstracts. Keep your answers technical and "
                    "based on facts – do not hallucinate features."
                ),
            )
        )
        index = VectorStoreIndex.from_documents(docs, service_context=service_context)
        return index

index = load_data()
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
5 comments
The chat engine doesn't have enough context for the info it has access to, and the questions are too vague

try setting a system prompt
Oh wait, it's condense question
Even worse to fix πŸ˜… I'll have to read some source code
Sounds like a limitation of semantic search, it's pretty hard to grab titles
what do you get with:
summary_index = SummaryIndex.from_documents(documents, service_context=service_context)
?
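The distinction behind that suggestion: a VectorStoreIndex only passes the top-k most similar chunks to the LLM, while a SummaryIndex visits every node, so title lines that score poorly against a vague query never reach the model in the vector case. A toy sketch of that effect in plain Python (not LlamaIndex; the chunks, the word-overlap "similarity", and the function names are all invented for illustration):

```python
# Toy chunks standing in for nodes parsed out of one abstract PDF.
chunks = [
    "Title: FOLFOX vs FOLFIRI in metastatic colorectal cancer",  # title chunk
    "Methods: 240 patients were randomized across 12 centers",
    "Results: median PFS was 9.2 months in the FOLFOX arm",
    "Conclusion: both regimens showed comparable efficacy",
]

def score(query, chunk):
    """Crude stand-in for embedding similarity: shared-word count."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def topk_retrieve(query, chunks, k=2):
    """Vector-store-style retrieval: only the k best-scoring chunks."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def summary_pass(chunks):
    """Summary-index-style response synthesis: every chunk is visited."""
    return list(chunks)

query = "what were the results for patients"
retrieved = topk_retrieve(query, chunks)
# The title chunk shares no words with the query, scores 0, and is never
# shown to the LLM; a summary-style pass still sees it.
print(any(c.startswith("Title:") for c in retrieved))          # False
print(any(c.startswith("Title:") for c in summary_pass(chunks)))  # True
```

The same intuition explains the "abstract titles are not provided in the given context" replies: the retrieved context genuinely does not contain the title chunks, so the LLM is answering honestly. The trade-off is cost: a SummaryIndex sends every node through the LLM, which is slower and more expensive on large PDF collections.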