Find answers from the community

ejmiddle
Joined September 25, 2024
Hey there, as far as I can see, the right way to read pptx files now is by importing

from llama_index.readers.file import PptxReader

However, when calling this I get

ImportError: Please install extra dependencies that are required for the PptxReader: pip install torch transformers python-pptx Pillow

This happens because images may be read via a transformers-based model inside the reader. However, I do not necessarily want to include the transformers package in an OpenAI-based RAG app. The way to circumvent this before was via the LlamaHub pptx reader, but that no longer works after the update. What's the current best practice for reading pptx files without installing the transformers package?
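(For context on why a workaround is even possible: a .pptx file is just a zip archive of XML parts, and slide text lives in `<a:t>` runs under `ppt/slides/slideN.xml`. Below is a minimal, dependency-free sketch of a text-only extractor. It is an illustrative workaround, not the official LlamaIndex reader -- images, notes, and layout are ignored.)

```python
# Minimal text-only .pptx extractor using only the standard library.
# Sketch / workaround only: ignores images, so no torch/transformers needed.
import re
import xml.etree.ElementTree as ET
import zipfile

# DrawingML namespace that <a:t> text runs live in
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def extract_pptx_text(path: str) -> list[str]:
    """Return the concatenated text runs of each slide, in slide order."""
    texts = []
    with zipfile.ZipFile(path) as z:
        # slide parts are named ppt/slides/slide1.xml, slide2.xml, ...
        slide_names = sorted(
            (n for n in z.namelist() if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)),
            key=lambda n: int(re.search(r"slide(\d+)\.xml", n).group(1)),
        )
        for name in slide_names:
            root = ET.fromstring(z.read(name))
            runs = [t.text for t in root.iter(f"{A_NS}t") if t.text]
            texts.append("\n".join(runs))
    return texts
```

Each slide's string could then be wrapped in a `Document` (e.g. `from llama_index.core import Document`) and indexed as usual, so neither torch nor transformers is ever imported.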
6 comments
Hi there, I am struggling to get full control over which exact components of the service_context are actually used in a RetrieverQueryEngine.
Components are passed via ServiceContext.from_defaults during
  • creation of the index
  • adding of nodes
  • and then creating the query engine itself.
But what is used when is not obvious. In particular, when I pass a specific llm in the last step, it seems it may not actually be used.

So my questions are:
  • Is there a way to get adequate logging output so I can be sure which exact LLMs/APIs are used?
  • What are the design principles behind the service context? Always use the latest component passed anywhere, and fall back to defaults when a component is missing? Or something like that? Maybe you have a link on that?
  • Is there a more appropriate way to get this kind of control, since maybe the service context is intended for fast prototyping only?
Thank you so much
Andi
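(Regarding the logging bullet above: one common pattern, following the LlamaIndex debugging docs at the time, is to enable DEBUG-level logging on the root logger before running the query engine, so outgoing LLM and embedding API calls become visible on stdout. A minimal sketch using only the standard library -- exact logger names and message formats vary by library version:)

```python
# Sketch: attach a verbose handler to the root logger so that DEBUG-level
# records from llama_index / API client libraries show up, letting you see
# which endpoints and models are actually being called.
import logging
import sys


def enable_verbose_logging(level: int = logging.DEBUG, stream=sys.stdout) -> None:
    """Attach a stream handler to the root logger and lower its level."""
    root = logging.getLogger()
    root.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
    root.addHandler(handler)


# enable_verbose_logging()  # call once, then run the query engine and watch the output
```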
5 comments