timothybeamish
Offline, last seen 3 months ago
Joined September 25, 2024
How are folks sending Python objects to an LLM as context for summarization?

Are you transforming them to JSON first (and then to a string)? Do you have to augment your prompt to explain to the LLM what each field means/does? Do some LLMs happily accept serialized objects? Does pydantic help? Is there a library/service that can do this for you?
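
For what it's worth, here is a minimal sketch of one common approach, assuming a pydantic v2 model: serialize the object with model_dump_json() and lean on the field descriptions so the prompt explains what each field means. The Order model and the prompt wording are illustrative, not any particular library's API.

```python
import json

from pydantic import BaseModel, Field

class Order(BaseModel):
    order_id: str = Field(description="Internal order identifier")
    total_usd: float = Field(description="Order total in US dollars")
    status: str = Field(description="Current fulfillment status")

order = Order(order_id="A-123", total_usd=42.50, status="shipped")

# Pull the field descriptions out of the model so the LLM knows what each key means.
field_meanings = {name: f.description for name, f in Order.model_fields.items()}

prompt = (
    "Summarize the following order for a support agent.\n"
    f"Field meanings: {json.dumps(field_meanings)}\n"
    f"Order: {order.model_dump_json()}"
)
print(prompt)
```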
1 comment
We believe we have found a circular dependency between two packages: llama-index-agent-openai and llama-index-llms-openai.

The llama-index-agent-openai package depends on llama-index-llms-openai = "^0.2.0", as seen here.

The llama-index-llms-openai package depends on llama-index-agent-openai = "^0.3.1", as seen here.

Our Bazel Dependency Graph detected this when attempting to update LlamaIndex packages to their latest versions.
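
Not a fix, but a quick way to double-check what the dependency graph reported against whatever is installed locally (a sketch; it assumes both packages are installed and uses packaging only to parse requirement names):

```python
from importlib.metadata import requires

from packaging.requirements import Requirement

def direct_deps(pkg: str) -> set[str]:
    """Names of the direct dependencies declared by an installed package."""
    return {Requirement(r).name for r in (requires(pkg) or [])}

a, b = "llama-index-agent-openai", "llama-index-llms-openai"
if b in direct_deps(a) and a in direct_deps(b):
    print(f"Circular dependency confirmed: {a} <-> {b}")
else:
    print("No direct cycle between the installed versions.")
```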
10 comments
I see QueryPipelines are being phased out (in favor of Workflows, I assume). Workflows are nice, but I also know the DSPy folks have created a LlamaIndexModule to wrap a QueryPipeline for prompt optimization. Are there any plans between LlamaIndex and DSPy to optimize prompts within Workflows, given that this appears to be the new direction?
7 comments
When we instrument LlamaIndex for observability, we start to see the output of @dispatcher.span decorators that are already in the framework code. Most of the time, this is great. Is it possible to filter which of these we actually want in our final OpenTel logs?

In our specific case, the BaseRetriever class has a @dispatcher.span decorator. Meanwhile, we have one on each of our child retrievers. The result is an OpenTel trace that shows:
-- BaseRetriever
---- CustomRetriever

where we don't really need the top BaseRetriever log, so I'm looking for a way to filter it out.
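
I don't believe there's built-in filtering for this, but one sketch is to subclass the span handler you register and drop spans by their id_ prefix. SimpleSpanHandler below is only a stand-in for whatever handler your OpenTelemetry integration installs, and the exact new_span signature is an assumption to verify against your installed llama-index-core:

```python
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.span_handlers import SimpleSpanHandler

SKIP_PREFIXES = ("BaseRetriever.",)  # span ids typically look like "ClassName.method-<uuid>"

class FilteringSpanHandler(SimpleSpanHandler):
    def new_span(self, id_, bound_args, instance=None, parent_span_id=None, tags=None, **kwargs):
        if id_.startswith(SKIP_PREFIXES):
            return None  # don't open a span for the base-class wrapper
        return super().new_span(
            id_, bound_args, instance=instance, parent_span_id=parent_span_id, tags=tags, **kwargs
        )

dispatcher = get_dispatcher()  # root dispatcher
dispatcher.add_span_handler(FilteringSpanHandler())
```

One thing to watch: if the parent span is dropped, child spans will still carry a parent id that was never opened, so check how your exporter handles that.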
2 comments
I'm looking for an example of Instrumentation with a custom Span. I'd like to create a custom span instance with a few properties, enter that span at a specific point in our RAG pipeline where I can assign values to those properties, and then exit it while updating some of those properties. I've read https://docs.llamaindex.ai/en/stable/module_guides/observability/instrumentation and looked at the linked notebooks, but it's unclear to me how we're meant to have our specific custom span used at a specific point in time.

For example, when calling dispatcher.span_enter(...), I don't understand what values I'm meant to provide for id_ or bounded_args, and I'm only guessing that instance is meant to be the instance of the span I want to use. Furthermore, I can't clearly see how my custom span becomes the one used when I apply the @dispatcher.span decorator. I have placed breakpoints inside the new_span, prepare_to_exit_span, and prepare_to_drop_span functions inside my custom span, but it doesn't look like they are called. An example would be helpful. Thanks.
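
For what it's worth, here is how I read the docs after digging in (a rough sketch; field names and signatures are assumptions to verify against your installed llama-index-core): you don't hand the dispatcher a span instance yourself. You register a span handler, and its new_span hook constructs your custom span every time a @dispatcher.span-decorated function runs; id_ and bound_args are supplied by the dispatcher, and instance is the object whose method is being wrapped, not the span. The new_span/prepare_to_exit_span/prepare_to_drop_span hooks also live on the handler rather than the span itself, which would explain breakpoints on the span never firing.

```python
import inspect
from typing import Any, Optional

import llama_index.core.instrumentation as instrument
from llama_index.core.instrumentation.span import BaseSpan
from llama_index.core.instrumentation.span_handlers import BaseSpanHandler

dispatcher = instrument.get_dispatcher(__name__)

class RagSpan(BaseSpan):
    # Custom properties we want to record on the span.
    num_nodes: int = 0

class RagSpanHandler(BaseSpanHandler[RagSpan]):
    def new_span(self, id_: str, bound_args: inspect.BoundArguments,
                 instance: Optional[Any] = None, parent_span_id: Optional[str] = None,
                 tags: Optional[dict] = None, **kwargs: Any) -> Optional[RagSpan]:
        # This is where *your* span type gets created for every decorated call.
        return RagSpan(id_=id_, parent_id=parent_span_id)

    def prepare_to_exit_span(self, id_: str, bound_args: inspect.BoundArguments,
                             instance: Optional[Any] = None, result: Optional[Any] = None,
                             **kwargs: Any) -> Optional[RagSpan]:
        span = self.open_spans.get(id_)
        if span is not None and result is not None:
            span.num_nodes = len(result)  # e.g. record how many nodes were retrieved
        return span

    def prepare_to_drop_span(self, id_: str, bound_args: inspect.BoundArguments,
                             instance: Optional[Any] = None, err: Optional[BaseException] = None,
                             **kwargs: Any) -> Optional[RagSpan]:
        return self.open_spans.get(id_)

dispatcher.add_span_handler(RagSpanHandler())

@dispatcher.span
def retrieve_step(query: str) -> list:
    # Any function decorated this way gets a RagSpan via the handler above.
    return ["node-1", "node-2"]
```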
12 comments
Pydantic question!

Llama Index uses a try/except strategy when importing pydantic.v1 vs pydantic, as seen in
https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/bridge/pydantic.py

This makes pyright's type checking unhappy. I presume it's unable to statically determine which pydantic is used (because the import is essentially dynamic) and complains, forcing us to add the dreaded # type:ignore comment to anything that touches this pydantic bridge.

Has anyone found an elegant solution for this?
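
One pattern I've seen for making this kind of dual import pyright-friendly (a sketch of the general technique, not an official LlamaIndex recommendation, and it only helps where you control the bridge-style module): give the type checker a single static branch via TYPE_CHECKING while the runtime keeps the try/except fallback.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # pyright only ever analyzes this branch, so annotations resolve against one pydantic.
    from pydantic import BaseModel, Field
else:
    try:
        from pydantic.v1 import BaseModel, Field
    except ImportError:
        from pydantic import BaseModel, Field
```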
3 comments
It's unfortunate that we have to call time.sleep(0.01) in StreamingAgentChatResponse.
In JavaScript, there's a trick of scheduling upcoming work with a 0 ms timeout; it achieves the same goal of letting other tasks run ahead of our work while still allowing our work to proceed as quickly as possible. Can we do that in Python too? What happens if we call time.sleep(0) instead?
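
For reference, the closest Python analogue to JS's setTimeout(0) is await asyncio.sleep(0) in async code: it yields control to the event loop and resumes as soon as other ready tasks have run. In threaded code, time.sleep(0) similarly hints the scheduler to give other threads a turn, though with the GIL that's a weaker guarantee. A small self-contained illustration:

```python
import asyncio

async def producer(queue: asyncio.Queue) -> None:
    for i in range(5):
        await queue.put(i)
        await asyncio.sleep(0)  # yield so the consumer can run, without a fixed delay

async def consumer(queue: asyncio.Queue) -> None:
    for _ in range(5):
        print(await queue.get())

async def main() -> None:
    q: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(producer(q), consumer(q))

asyncio.run(main())
```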
27 comments
Is anyone using real-time evaluation (of the retrieved nodes and/or the response) to dictate the business logic for the rest of the RAG pipeline? For example, we could evaluate the retrieved nodes against the user query, decide the nodes are of poor quality, tell the user "We can't answer your question", and end the RAG pipeline there (saving us the cost of synthesis from an LLM; see the sketch below). Or we could evaluate the response, decide it wasn't good enough, and ask an Agent to break the initial query into sub-queries and try RAG again on those sub-queries (being careful not to go into endless recursion).

Is RAG evaluation meant more as a method of observability, QA, and monitoring, or can it actually influence business logic?
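
A sketch of the gating described above, with a toy keyword-overlap scorer standing in for a real evaluator (an LLM judge, a reranker score, etc.); the function names are placeholders, not a specific LlamaIndex API:

```python
def score_nodes(query: str, node_texts: list[str]) -> float:
    """Toy relevance score: fraction of query words that appear in any retrieved node."""
    words = set(query.lower().split())
    if not words:
        return 0.0
    hits = {w for w in words if any(w in t.lower() for t in node_texts)}
    return len(hits) / len(words)

def answer(query: str, node_texts: list[str], synthesize, threshold: float = 0.5) -> str:
    if score_nodes(query, node_texts) < threshold:
        return "We can't answer your question."  # bail out before the synthesis LLM call
    return synthesize(query, node_texts)

# Usage: answer("what is our refund policy", retrieved_texts, my_synthesizer)
```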
7 comments
I'm looking for a callback handler that will stream back Llama Index events in real time. I'd then like to combine that stream with the streamed response you typically get from a chat engine. The result would let a consumer of Llama Index iterate over a generator/queue that yields either a Llama Index event (a CBEventType) or the delta of a streamed chat response.

I'm willing to build this callback handler but want to check if anyone has seen one. I would think it's a fairly common feature to want to stream back both the CBEventType events and the generated response in one iterator.
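
I haven't seen an existing handler for this either, but the merge itself is straightforward with an asyncio.Queue: pump both sources into one queue and yield items in arrival order. The fake_events/fake_deltas generators below are stand-ins for a custom callback handler's event feed and the chat engine's token stream; this is a sketch, not an existing Llama Index component.

```python
import asyncio
from typing import Any, AsyncIterator

async def merge(*sources: AsyncIterator[Any]) -> AsyncIterator[Any]:
    """Interleave items from several async iterators, in arrival order."""
    queue: asyncio.Queue = asyncio.Queue()
    done = object()

    async def pump(src: AsyncIterator[Any]) -> None:
        try:
            async for item in src:
                await queue.put(item)
        finally:
            await queue.put(done)  # signal this source is finished

    tasks = [asyncio.create_task(pump(s)) for s in sources]
    try:
        remaining = len(tasks)
        while remaining:
            item = await queue.get()
            if item is done:
                remaining -= 1
            else:
                yield item
    finally:
        for t in tasks:
            t.cancel()

# Tiny demo standing in for the two real streams:
async def fake_events() -> AsyncIterator[str]:
    for name in ("retrieve.start", "retrieve.end", "synthesize.start"):
        yield f"event:{name}"
        await asyncio.sleep(0.01)

async def fake_deltas() -> AsyncIterator[str]:
    for tok in ("Hel", "lo", "!"):
        yield f"delta:{tok}"
        await asyncio.sleep(0.015)

async def main() -> None:
    async for item in merge(fake_events(), fake_deltas()):
        print(item)

asyncio.run(main())
```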
77 comments