Find answers from the community

I am having a problem where the LLM returns "I am handing off to AgentXYZ" as a message instead of actually sending empty "content" with tool calls. Is there a way to instruct the LLM to send the tool call instead of the message? I am using a multi-agent workflow.
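
In case it helps, what usually makes a difference is using a model with native function calling and telling the agent explicitly in its system prompt to call the handoff tool rather than describe the handoff. A minimal sketch, assuming a recent llama-index version with AgentWorkflow-style agents and a hypothetical agent name:
Plain Text
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

# Hypothetical triage agent; the key points are a function-calling model and a
# system prompt that forbids announcing the handoff in plain text.
triage_agent = FunctionAgent(
    name="TriageAgent",
    description="Routes questions to the right specialist",
    system_prompt=(
        "You are a triage agent. To delegate, call the handoff tool with the "
        "target agent's name. Never announce a handoff in plain text."
    ),
    llm=OpenAI(model="gpt-4o"),  # a model with native tool calling
    can_handoff_to=["AgentXYZ"],
)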
1 comment
mxtt v4

Crypto

You guys aren't planning on making a coin on Solana?
2 comments
Uh, it's not just me, right? LlamaCloud is down (or LlamaParse at least).
1 comment
Hello there, I am looking for a way to query a multimodal LLM continuously while retaining memory, much like simulating a user giving ChatGPT an image with a first prompt and then follow-up prompts asking questions about the image. Is there a cookbook for this in LlamaIndex?
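
Not sure about a dedicated cookbook, but here is a minimal sketch of the pattern, assuming a recent llama-index version where ChatMessage supports content blocks and an OpenAI multimodal model (the image path is a placeholder):
Plain Text
from llama_index.core.llms import ChatMessage, ImageBlock, TextBlock
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")  # any multimodal chat model

# Keep the full history yourself so follow-up questions retain the image context
history = [
    ChatMessage(
        role="user",
        blocks=[
            ImageBlock(path="chart.png"),  # hypothetical local image
            TextBlock(text="What does this chart show?"),
        ],
    )
]
response = llm.chat(history)
history.append(response.message)

# Follow-up question about the same image
history.append(ChatMessage(role="user", content="Which month had the highest value?"))
response = llm.chat(history)
print(response.message.content)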
4 comments
Ariel

Checkpoint

I'm currently developing an agent workflow with human-in-the-loop interaction and function calling. The workflow works great if the user stays in the session to complete it. I've tried both context serialization and checkpoints in order to persist context state, with no success. I save the context/checkpoint after each iteration and load it back when starting the workflow, as suggested in the documentation. I think the problem is with tool calling: right after loading the checkpoint and adding the new user input, the agent gets stuck "thinking", as if it didn't know what step comes next.
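
For reference, the pattern I'd expect to work is serializing the Context between sessions and passing it back in on resume; a minimal sketch, assuming an AgentWorkflow-style workflow and the JsonSerializer (swap in JsonPickleSerializer if your state isn't plain-JSON serializable):
Plain Text
from llama_index.core.workflow import Context, JsonSerializer

# After the run pauses for human input, persist the context
ctx_dict = handler.ctx.to_dict(serializer=JsonSerializer())
# ... store ctx_dict (e.g. as JSON in a database) between sessions ...

# Later, restore it and resume with the new user input
restored_ctx = Context.from_dict(workflow, ctx_dict, serializer=JsonSerializer())
handler = workflow.run(user_msg=new_user_input, ctx=restored_ctx)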
4 comments
vLLM Structured Outputs.

Hi, I'm trying to do the same thing as this person (Issue #17677 on GitHub) but I'm running into errors. If I do sllm.complete(prompt) or sllm.chat(ChatMessage[]), I get 'tool_choice must either be a named tool, "auto" or "none".'

If I put tool_choice = auto or none, I get "Expected at least one tool call, but got 0 tool calls."

I copied the code from the documentation as well as the version recommended in the GitHub issue. What could be the problem?

I also tried it with is_function_calling_model=True and False.
5 comments
Hello there!
I'm experiencing an issue: I want to retrieve chunks in the form of ChatResponse objects from an agent.
I did the following:
Plain Text
response_generator = self.agent.stream_chat(message=messages[-1].content, chat_history=messages[:-1]).chat_stream
for token in response_generator:
    yield token

But I'm getting:
Plain Text
ValueError: generator already executing

When using response_gen instead of chat_stream it works flawlessly. However, I really need those ChatResponse objects.
9 comments
I tried importing the QdrantVectorStore like this: from llama_index.vector_stores.qdrant import QdrantVectorStore

but it says it can't be resolved. It's written exactly like that in the docs.
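
That import usually fails when the integration package isn't installed; since v0.10 the vector store integrations live in separate packages. If that's the case here, installing it should resolve it:
Plain Text
pip install llama-index-vector-stores-qdrant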
4 comments
Hi, how can I increase the speed of VectorStoreIndex.from_documents()? I have a single JSON file (~140 MB) and it had spent 50 minutes generating the index on Google Colab before I interrupted it. Now I'm trying to run it locally and it has already been executing for 10 minutes. How long does it usually take to generate an index?
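
The time depends almost entirely on how many chunks get embedded, so a 140 MB JSON can legitimately take a long time. Two things that usually help are batching the embedding calls and running them asynchronously; a minimal sketch, assuming OpenAI embeddings (any embedding model with an embed_batch_size works similarly, and `documents` is whatever your reader returned):
Plain Text
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# Larger batches cut the number of embedding round trips
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    embed_batch_size=100,
)

index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,
    use_async=True,  # embed chunks concurrently
)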
65 comments
Hi guys, my handoff is not working in my multi-agent workflow; it's not able to respond. The root agent works and hands off to another agent, but I never get a response from it. Any suggestions or help on how to fix it? There are no errors, but it never works. https://docs.llamaindex.ai/en/stable/examples/agent/agent_workflow_multi/
3 comments
@Icksir, I frequently use the LlamaParse() method. When you receive a response from that method, it contains result (which has the text response), input_tokens, and output_tokens, which indicate how many tokens were used for the job ID. I hope this helps.
gamecode8

Tools

Hello, I'm trying out the AgentWorkflow feature and have noticed that the tool outputs aren't being captured.

To start, I've made a simple single-agent AgentWorkflow. While the responses are generated, the response.tool_calls list is empty, and when listening to the stream of events, I never see a ToolCallResult being emitted.

My goal is to get the source nodes used by the query engine tool. I'm not sure if it's an issue or if I have misunderstood something. I'm following https://docs.llamaindex.ai/en/stable/understanding/agent/multi_agents/

See basic example below.

topic_a_agent = FunctionAgent(
    name="topic_a_expert",
    description="Answers questions about topic A",
    system_prompt="You are a retrieval assistant.",
    tools=[QueryEngineTool(...)],  # tool arguments elided
    llm=OpenAI(model="gpt-4"),
)


workflow = AgentWorkflow(
    agents=[topic_a_agent],
    root_agent="topic_a_expert",
)

response = await workflow.run(user_msg="......")
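
For what it's worth, the tool results are usually visible on the event stream from the handler rather than only on the final response; a minimal sketch of listening for them, assuming the same workflow as above:
Plain Text
from llama_index.core.agent.workflow import ToolCallResult

handler = workflow.run(user_msg="...")
async for event in handler.stream_events():
    if isinstance(event, ToolCallResult):
        # tool_output is a ToolOutput; for a QueryEngineTool its raw_output
        # carries the query response, including source_nodes
        print(event.tool_name, event.tool_output)
response = await handler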
7 comments
OUTYUA

Changelog

Where can I find the release notes?
1 comment
Willi

Agent

Hey LlamaIndex community! 👋 I'm an experienced dev looking to build a project to learn the framework in-depth.

I want to create an agent that helps create and refine business offers using natural language. Here's what I want it to do:

  • Take an initial offer description and repeatedly prompt for missing info, if any (customer name, products, quantities)
  • Validate inputs (e.g., no negative quantities, verify products/customers exist)
  • Generate the offer in LaTeX format
  • Allow questions about the offer
  • Allow refinements (change or swap products, ...)
  • Support offer finalization as an end state
I have some architecture/design questions I'd love input on:

  1. What's the best way to handle conditional LaTeX output vs Q&A responses? (Evaluation?)
  2. For offer creation/updates - should these be handled via response generation or function calls?
  3. How to properly manage state transitions (no offer → draft → finalized)?
  4. Should validation be its own LLM workflow step?
  5. Best practice for product/customer data - prompt injection, vector store, or filtered function calls?
Any guidance would be much appreciated! 🙏
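
Not an answer to all five questions, but for the state-transition part (question 3) a custom Workflow with typed events is one way to make the no-offer → draft → finalized flow explicit; a minimal sketch with hypothetical event and step names:
Plain Text
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step

class DraftReady(Event):
    offer_text: str

class OfferWorkflow(Workflow):
    @step
    async def collect_and_validate(self, ev: StartEvent) -> DraftReady:
        # hypothetical: prompt for missing fields and validate quantities/customers here
        return DraftReady(offer_text="validated draft goes here")

    @step
    async def render_latex(self, ev: DraftReady) -> StopEvent:
        # hypothetical: turn the validated draft into LaTeX
        return StopEvent(result=ev.offer_text)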
1 comment
Hi! I am using LlamaIndex (Python) with MongoDB Atlas as my persistent storage. I had an initial successful implementation using MongoDBAtlasVectorSearch as my vector store, and RAG is working 🙂

But now I am exploring the document summary index, and I'm struggling to understand the concepts of the docstore and index store.
  1. Am I able to create a document summary index from my existing vector store?
  2. Does anyone have a copy of the data in their docstore, vector store, and index store? I would like to see the data to understand how they all relate to each other.
Any help pointing me in the right direction is greatly appreciated! T.T
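
On question 1: as far as I know, a DocumentSummaryIndex is built from documents/nodes (it generates a summary per document), not from an existing vector store, so re-ingesting is the straightforward path. A minimal sketch that also makes it easy to inspect the docstore and index store on disk (`documents` is whatever your reader returned):
Plain Text
from llama_index.core import DocumentSummaryIndex, StorageContext

storage_context = StorageContext.from_defaults()  # in-memory docstore + index store
doc_summary_index = DocumentSummaryIndex.from_documents(
    documents,
    storage_context=storage_context,
)

# Writes docstore.json, index_store.json, etc., so you can see how they relate
storage_context.persist(persist_dir="./storage")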
8 comments
Hello @Logan M.
I hope you're doing well.

I have a question about using the parsing_instruction parameter in the LlamaParse method. Is it possible to reuse the same job_id to perform multiple parsing iterations by sending different parsing_instruction values? Or does each modification require creating a new job?

If this functionality is not currently supported, is there an alternative approach you would recommend to achieve similar behavior?

I appreciate your time and any insights you can provide.

Thanks in advance! Cheers! 😁
1 comment
Hi there! I observed that ChatSummaryMemoryBuffer crashes for Anthropic because it sends a single message with role system and no message with role user. Reading the code here: https://github.com/run-llama/llama_index/blob/90761a9f789bb7628d4faf40ae900d93f16065b7/llama-index-core/llama_index/core/memory/chat_summary_memory_buffer.py#L272 I'm seeing that it sends the context with role system but doesn't send the system prompt instruction to summarize the context. In the attached image I fixed it and it works perfectly for me. Is the current implementation bugged?
3 comments
Hey, I'm running a Chroma vector store and a very basic StorageContext and docstore setup. For some reason, whenever I try to peek into my ChromaDB after indexing a handful of sample documents, it returns an empty dict as follows: {'ids': [], 'embeddings': array([], dtype=float64), 'documents': [], 'uris': None, 'data': None, 'metadatas': [], 'included': [<IncludeEnum.embeddings: 'embeddings'>, <IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}. When I print my docstore from index.docstore.docs, it states that I do have documents. I've been debugging it for a bit, playing around with persist paths and other configs, but I can't seem to find where the problem resides.
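
In case it helps narrow it down, the pattern I'd expect to work is pointing both indexing and the later peek at the same persistent collection; a minimal sketch (collection name and path are placeholders, `documents` comes from your reader):
Plain Text
import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent client so the same collection is read back after indexing
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(collection.count())  # should be > 0 once the nodes are embedded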
21 comments
Hey! I am currently facing issues with the use of memory inside a workflow, and I don't know where else to ask. I am creating a chatbot to chat with multiple documents, and my workflow now looks like the image attached to this message. The "ingest" path just creates the top agent to retrieve the documents, and the "ask" path is meant to query the LLM with the indexes.

My ask step looks like this, but the chat store just overwrites itself after the top agent call. It doesn't remember the chat history, and I don't know if I am doing something wrong or if I simply shouldn't use SimpleChatStore (I just wanted to do a proof of concept).

Any advice is welcome.

Plain Text
@step
async def ask(self, ev: StartEvent) -> StopEvent | None:

    obj_index = ev.get("obj_index")
    query = ev.get("query")
    chat_store = ev.get("chat_store")
    user = ev.get("user")
    if not obj_index or not query:
        return None
    
    user_file = f"./conversations/{user}.json"

    if not os.path.exists(user_file):
        chat_store = SimpleChatStore()
    else:
        chat_store = SimpleChatStore.from_persist_path(persist_path=user_file)
    
    chat_memory = ChatMemoryBuffer.from_defaults(
        token_limit=3000,
        chat_store=chat_store,
        chat_store_key=user,
    )

    top_agent = OpenAIAgent.from_tools(
        tool_retriever=obj_index.as_retriever(similarity_top_k=3),
        system_prompt=PROMPT,
        memory=chat_memory,
        verbose=True,
    )
    
    response = top_agent.query(query)
    chat_store.persist(persist_path=user_file)

    return StopEvent(result={"response": response, "source_nodes": response.source_nodes})
13 comments
Is it possible to use metadata filters to filter by dates? Looking at the schema, there doesn't seem to be a way to easily match years in a timestamp.
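
As far as I can tell there is no dedicated date type, but if the dates were stored as sortable strings or numbers at ingestion time, range filters can do it (support for GTE/LT depends on the vector store backend). A minimal sketch with an assumed "date" metadata key:
Plain Text
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

# Assumes nodes were ingested with metadata like {"date": "2024-06-01"}
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="date", operator=FilterOperator.GTE, value="2024-01-01"),
        MetadataFilter(key="date", operator=FilterOperator.LT, value="2025-01-01"),
    ]
)
retriever = index.as_retriever(filters=filters)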
6 comments
Hello everyone! I’m working on improving my RAG pipeline by extracting images from my PDF files. While I haven’t encountered significant challenges in the ingestion and indexing phases, I’m a bit uncertain when it comes to retrieval.

Currently, retrieval is handled through tool calls, allowing the model to determine when additional information is needed to answer a user’s query. I’m using GPT-4o via OpenAI’s API, but since the output can only contain text and not images, I’m facing a limitation. My goal is to pass images—if present in the retrieved chunks—to enhance the quality of responses.

What would be the best way to overcome OpenAI’s API constraints? Has anyone else faced a similar issue? If so, how did you resolve it?

I've also attached an example of an API call I attempted, but it didn’t work as expected.
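
One approach is to take over the final response step yourself: retrieve via the tool as usual, then build a chat message whose content blocks include any images referenced by the retrieved chunks. A minimal sketch, assuming a recent llama-index version with content blocks and an "image_path" metadata key set at ingestion (both are assumptions; `query`, `retrieved_nodes`, and `llm` come from your existing pipeline):
Plain Text
from llama_index.core.llms import ChatMessage, ImageBlock, TextBlock

blocks = [TextBlock(text=query)]
for node in retrieved_nodes:  # nodes returned by the retriever / tool call
    image_path = node.metadata.get("image_path")  # assumed metadata key
    if image_path:
        blocks.append(ImageBlock(path=image_path))

response = llm.chat([ChatMessage(role="user", blocks=blocks)])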
6 comments
Hello!
I have a very quick question about retriever query engines, since I didn't find anything about this in the documentation.
Simply put, I have a list of files and multiple questions I want answered, considering these files as context. I know that I can use as_query_engine after indexing my content; however, I can only query one question at a time. Do you know if there is any built-in library support for parallel querying on the same vector store? The alternative would be to parallelize using Python processes, but it would be nice if something similar is already implemented in llama-index.
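
There is no batch query call that I know of, but the async methods let you fan out multiple questions over the same index without extra processes; a minimal sketch:
Plain Text
import asyncio

query_engine = index.as_query_engine()

async def answer_all(questions):
    # aquery runs the questions concurrently against the same vector store
    return await asyncio.gather(*(query_engine.aquery(q) for q in questions))

answers = asyncio.run(answer_all(["question 1", "question 2", "question 3"]))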
1 comment
Hi, I am using AzStorageBlobReader, and when reinitializing my RAG pipeline it ingests duplicate document chunks. I think it's because the files are put in a temporary directory: the doc_hash is not changing, whereas the doc_id seems to change. Any suggestions?
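
One way to make re-runs idempotent is to give the documents stable ids and let an IngestionPipeline with a docstore de-duplicate via upserts; a minimal sketch (the file_name metadata key and `vector_store` are assumptions about your setup):
Plain Text
from llama_index.core.ingestion import DocstoreStrategy, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore

# Stable ids (e.g. the blob name) so re-runs are recognised as the same document
for doc in documents:
    doc.id_ = doc.metadata.get("file_name", doc.id_)  # assumed metadata key

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter()],
    docstore=SimpleDocumentStore(),
    docstore_strategy=DocstoreStrategy.UPSERTS,
    vector_store=vector_store,  # your existing vector store
)
pipeline.run(documents=documents)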
10 comments
Hello People,
I need your guidance.

Plain Text
from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="model",
    api_key="Key",
    api_base="OpenAI Compatible endpoint",
    context_window=16000,
    is_chat_model=True,
    is_function_calling_model=False,
)
Settings.embed_model = llm

# Create index
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,
)

I'm facing an error with this code. Am I doing something wrong?
Plain Text
AssertionError                            Traceback (most recent call last)
Cell In[22], line 31
     22 documents = SimpleDirectoryReader("../data", required_exts=[".txt"]).load_data()
     23 #embed_model = llm
     24 
     25 
   (...)
     29 #     api_base="http://tentris-ml.cs.upb.de:8502/v1"
     30 # )
---> 31 Settings.embed_model = llm
     33 # Create index
     34 index = VectorStoreIndex.from_documents(
     35     documents, 
     36     show_progress=True)

File c:\Users\KUNJAN SHAH\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_index\core\settings.py:74, in _Settings.embed_model(self, embed_model)
     71 @embed_model.setter
     72 def embed_model(self, embed_model: EmbedType) -> None:
     73     """Set the embedding model."""
---> 74     self._embed_model = resolve_embed_model(embed_model)

File c:\Users\KUNJAN SHAH\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_index\core\embeddings\utils.py:136, in resolve_embed_model(embed_model, callback_manager)
    133     print("Embeddings have been explicitly disabled. Using MockEmbedding.")
    134     embed_model = MockEmbedding(embed_dim=1)
--> 136 assert isinstance(embed_model, BaseEmbedding)
    138 embed_model.callback_manager = callback_manager or Settings.callback_manager
    140 return embed_model

I'd appreciate a little of your time. Please help!
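
For what it's worth, the assertion fires because Settings.embed_model expects an embedding model (a BaseEmbedding), and an OpenAILike LLM is not one. A minimal sketch of the usual fix, assuming a local HuggingFace embedding model since the OpenAI-compatible endpoint may not expose an embeddings route:
Plain Text
# pip install llama-index-embeddings-huggingface
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = llm  # the OpenAILike instance from above
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

index = VectorStoreIndex.from_documents(documents, show_progress=True)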
30 comments
cmosguy

O3

@Logan M how do I use o1 or o3 as an agent in the agent workflow system? Do you guys have an example?
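
Not official guidance, but in principle any function-calling-capable model string that your installed OpenAI integration accepts (o3-mini, for example) can drive a FunctionAgent; a minimal sketch with a hypothetical tool:
Plain Text
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

agent = FunctionAgent(
    tools=[my_tool],  # hypothetical tool defined elsewhere
    llm=OpenAI(model="o3-mini"),
    system_prompt="You are a helpful assistant.",
)
response = await agent.run("...")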
8 comments