Yeah, I know, that's what I have been using, but I would like the integration to be through LlamaIndex, specifically for function calling. No one seems to integrate this.
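To make it concrete, this is roughly the call pattern I'm after (OpenAI here is only a stand-in for the integration I actually want, and I'm not sure predict_and_call is the right entry point):
```python
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI  # stand-in for the LLM I actually want

def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

# Wrap a plain Python function as a tool the LLM can call
tool = FunctionTool.from_defaults(fn=multiply)
llm = OpenAI(model="gpt-4o-mini")

# Have the LLM pick the tool and execute it in one step
response = llm.predict_and_call([tool], "What is 6 times 7?")
print(str(response))
```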
Is there any kind of prompt caching in place? How can I intercept LLM calls to put a cache layer in front of them? Is there any mechanism in place for this, instead of implementing it by hand?
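For reference, this is the kind of thing I'd otherwise hand-roll (the dict cache and cached_complete helper are mine, not a llama_index API):
```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")
_cache: dict[str, str] = {}

def cached_complete(prompt: str) -> str:
    """Return a cached completion if we've already seen this exact prompt."""
    if prompt not in _cache:
        _cache[prompt] = str(llm.complete(prompt))
    return _cache[prompt]
```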
In query pipelines I want to have an LLMMultiSelector that selects tools, and then ask a question independently to each selected tool. How can I stream the multiple values of the LLMMultiSelector's output each to its own tool, and then summarize? I don't see a way to express this with pipelines.
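To spell out the behaviour I'm after, here it is written imperatively outside of pipelines (I'm not certain about the exact select() signature or the selection field names):
```python
from typing import List
from llama_index.core.selectors import LLMMultiSelector
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.tools import QueryEngineTool

tools: List[QueryEngineTool] = ...  # my existing query-engine tools, built elsewhere
question = "my question"

# Ask the selector which tools are relevant
selector = LLMMultiSelector.from_defaults()
result = selector.select([t.metadata for t in tools], question)

# Fan out: run the same question against every selected tool
answers = [
    str(tools[sel.index].query_engine.query(question))
    for sel in result.selections
]

# Then collapse the per-tool answers into one summary
summary = TreeSummarize().get_response(question, answers)
```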
Hey guys, I have looked around the documentation and found nothing about getting the index at which text was split, when using TokenTextSplitter for example.
For now it returns the usual metadata (page number and document name), but I would also like to know the index at which it was split.
I think LangChain has an option for this, is there something similar here? Maybe a callback?
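This is what I'm doing by hand for now, in case I'm missing something: locating each chunk back in the source text and stashing the offsets in metadata (split_start_idx/split_end_idx are just my own metadata keys):
```python
from llama_index.core import Document
from llama_index.core.node_parser import TokenTextSplitter

splitter = TokenTextSplitter(chunk_size=256, chunk_overlap=20)
documents = [Document(text="long document text here ...")]

for doc in documents:
    for node in splitter.get_nodes_from_documents([doc]):
        # Naive: take the chunk's first occurrence in the original text
        start = doc.text.find(node.text)
        node.metadata["split_start_idx"] = start
        node.metadata["split_end_idx"] = start + len(node.text)
```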