Hi, I'm not sure I understand how QP performance works.

Indeed:
First program: llm1 takes 1 min.
Second program: llm1 takes 3 min.
Python
    def get_query_pipeline(self):
        """Create & Return the Query Pipeline of database generation"""

        qp = QP(
            modules={
                "input": InputComponent(),
                "process_retriever": self.process_retriever_component,
                "table_creation_prompt": self.table_creation_prompt,
                "llm1": self.llm1,
                "python_output_parser": self.python_parser_component,
            },
            verbose=True,
        )

        qp.add_link("input", "process_retriever")
        qp.add_link("input", "table_creation_prompt", dest_key="query_str")
        qp.add_link(
            "process_retriever", "table_creation_prompt", dest_key="retrieved_nodes"
        )

        qp.add_chain(["table_creation_prompt", "llm1", "python_output_parser"])

        return qp

VS
Python
    def get_query_pipeline(self):
        """Create & Return the Query Pipeline of database generation"""

        qp = QP(
            modules={
                "input": InputComponent(),
                "process_retriever": self.process_retriever_component,
                "table_creation_prompt": self.table_creation_prompt,
                "llm1": self.llm1,
                "python_output_parser": self.python_parser_component,
                "table_insert_prompt": self.table_insert_prompt,
                "llm2": self.llm1,
                "python_output_parser1": self.python_parser_component,
            },
            verbose=True,
        )

        qp.add_link("input", "process_retriever")
        qp.add_link("input", "table_creation_prompt", dest_key="query_str")
        qp.add_link(
            "process_retriever", "table_creation_prompt", dest_key="retrieved_nodes"
        )

        qp.add_chain(["table_creation_prompt", "llm1", "python_output_parser"])
        
        ...
        return qp
8 comments
How are you measuring the runtime of llm1?
What LLM are you using?
I'm using Ollama to serve the LLM.

And in the ollama serve output, you have access to logs where the runtime of each API call to the LLM is displayed.

-> I'm using Llama3.
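Besides reading the ollama serve logs, the calls can also be timed client-side, which makes it easier to compare the two pipeline variants directly. A minimal sketch; the `timed` helper and the commented usage below are illustrative, not part of the original pipeline:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn with the given arguments, print how long it took,
    and return both the result and the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

# Hypothetical usage against a query pipeline object:
# result, secs = timed("pipeline run", qp.run, input="some query")
```

Timing the same `qp.run` call several times in a row also helps separate a one-off cost (like a model reload) from a consistent slowdown.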
Ah, with Ollama, it could be a few things:
  • cache hits
  • model reloading due to inactivity
  • other processes using compute and slowing down LLM calls
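On the model-reloading point: by default, Ollama unloads a model after a few minutes of inactivity, so the next call pays the reload cost. The `/api/generate` endpoint accepts a `keep_alive` field to keep the model in memory longer. A sketch of such a request payload; the model name and duration here are just examples:

```python
import json

# keep_alive tells the Ollama server how long to keep the model
# loaded after this request ("30m" here; -1 keeps it indefinitely,
# 0 unloads it immediately).
payload = {
    "model": "llama3",
    "prompt": "Say hello.",
    "keep_alive": "30m",
}
body = json.dumps(payload)

# This body would be POSTed to http://localhost:11434/api/generate,
# e.g. with requests.post(url, data=body).
```

If reload cost turns out to be the culprit, a longer `keep_alive` should make the repeated-run timings converge.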
Thanks for your answer :D, I will try once again, both with and without the extra modules (several times).

I'm also curious how it works behind the scenes...

  • Cache hits: Is the cache hit rate lower with more modules?
  • Model reloading: Having more modules shouldn't affect the performance of the first LLM much in this case, should it?
  • Other processes: Is there some kind of pre-loading of each module?
Alright, I think I figured out what is happening. It's probably just cache hits, as you said: with more modules, the temp cache used for the first LLM gets cleared.

Thank you so much x)
Great! πŸ’ͺ
Yea ollama does a lot of caching and optimizations