
Updated last year

Agents

At a glance

The community member is watching a video from the Discover LlamaIndex series on YouTube and has questions about the Data Agent. They are unsure where the vector store is specified, and wonder whether the data is being saved as a vector index in memory rather than in a vector DB. The community member also questions whether the practice of embedding Wikipedia data and dumping it into the vector store is normal, as it seems slow and heavy.

The comments indicate that the Load and Search tool creates a vector index in memory with the retrieved data, which happens automatically. Embeddings are described as fast and cheap, and a good mechanism when a tool loads a lot of data like an entire Wikipedia page. The comments also suggest that you can't feed an entire Wikipedia page as input to a Language Model, so the vector search narrows it down.

In response to the community member's additional questions, a comment suggests that there is an index_kwargs dict that can be used to pass in a service context with the desired embeddings. Regarding maintaining memory in a web API setting, the comment notes that the memory would be cleared once the tool is destroyed or garbage collected; depending on the API setup, this may happen on every API call, or the tool may be kept in memory between calls.

I'm watching the Discover LlamaIndex series on YouTube and I have some questions regarding the Data Agent:
https://youtu.be/GkIEEdIErm8?feature=shared&t=269

At 4:29, the presenter says Wikipedia data is queried and then dumped into a vector store, and the second tool then queries that vector store.

  1. From the screenshot, I can't tell where he specified the vector store. I would imagine something like Pinecone is configured somewhere. Is this just being saved as a vector index in memory (not using a vector DB), while still allowing follow-up questions?
  2. In each API call, you must embed whatever you retrieved from Wikipedia and then dump it into the vector store. Is this kind of practice normal? It sounds really slow/heavy, but it's something I can see a use case for, e.g. when I want to narrow a diary down to a specific date range with SQL and then do a refined vector search within it.
Attachment: Screenshot_2023-09-16_at_18.46.20.png
6 comments
  1. Yeah, the load and search tool creates a vector index in memory with the retrieved data. This happens automatically under the hood (sketched below).
  2. Embeddings are pretty fast and cheap. It's a good mechanism when a tool loads a lot of data (like an entire Wikipedia page).
You can't feed the entire Wikipedia page as input to an LLM, so the vector search narrows it down.
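
To make that concrete, here is a minimal sketch of wrapping the Wikipedia search tool in LoadAndSearchToolSpec, roughly in the spirit of the notebook shown in the video. It assumes the 2023-era packages (llama_index, plus llama_hub for the Wikipedia tool spec, the wikipedia package, and an OpenAI API key); exact import paths have moved between releases, so treat them as assumptions.

Python
from llama_hub.tools.wikipedia.base import WikipediaToolSpec
from llama_index.agent import OpenAIAgent
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# The Wikipedia tool spec exposes load_data and search_data tools.
wiki_spec = WikipediaToolSpec()
search_tool = wiki_spec.to_tool_list()[1]  # the search_data tool

# LoadAndSearchToolSpec wraps it into a "load" tool and a "read" tool:
# the load tool dumps whatever the wrapped tool returns into an in-memory
# VectorStoreIndex, and the read tool runs a vector search over that index.
tools = LoadAndSearchToolSpec.from_defaults(search_tool).to_tool_list()

agent = OpenAIAgent.from_tools(tools, verbose=True)
print(agent.chat("Tell me about the arts and culture of Berlin"))
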
Thank you!
  1. Is it possible to pass in my own embedding model? I use Azure OpenAI for embeddings. Would that be passed into the tool (like the wiki tool) first, rather than into the LoadAndSearchToolSpec function?
  2. Regarding 1., how would you maintain this memory in a web API setting? Would it make sense to clear the vector index periodically to prevent the server from running out of memory as users make more and more different kinds of searches?
Yeah, there's an index_kwargs dict that you can pass in. I would throw in a service context that has the embeddings you want to use:

Python
LoadAndSearchToolSpec.from_defaults(..., index_kwargs={"service_context": ctx})
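
For the Azure OpenAI case specifically, here is a minimal sketch of what that service context could look like. It assumes a llama_index release that ships AzureOpenAIEmbedding (older releases configured OpenAIEmbedding with Azure parameters instead), and the model, deployment name, key, endpoint, and API version below are all placeholders.

Python
from llama_index import ServiceContext
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

# Embedding model backed by an Azure OpenAI deployment; every value here is a placeholder.
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-embedding-deployment",
    api_key="<azure-api-key>",
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_version="2023-07-01-preview",
)
ctx = ServiceContext.from_defaults(embed_model=embed_model)

# The in-memory index that the load tool builds will then use the Azure embeddings.
tools = LoadAndSearchToolSpec.from_defaults(
    search_tool,  # e.g. the Wikipedia search tool from the earlier sketch
    index_kwargs={"service_context": ctx},
).to_tool_list()
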


In an API setting, I'm not 100% sure. The memory would be cleared once the tool is destroyed/garbage collected. Depending on your API setup, this might happen on every API call, or you might keep the tool in memory between calls πŸ€”
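
As one possible pattern (a sketch only; FastAPI is an assumption, since the thread doesn't say which framework is used): building the tools and agent inside the request handler means the in-memory vector index is garbage collected when the request finishes, so memory can't pile up across users, whereas hoisting the agent to module level keeps the index and its memory alive between calls.

Python
from fastapi import FastAPI
from llama_hub.tools.wikipedia.base import WikipediaToolSpec
from llama_index.agent import OpenAIAgent
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

app = FastAPI()

@app.post("/ask")
def ask(question: str):
    # Everything is built per request: the vector index created by the load tool
    # lives only for the duration of this call and is garbage collected afterwards.
    search_tool = WikipediaToolSpec().to_tool_list()[1]
    tools = LoadAndSearchToolSpec.from_defaults(search_tool).to_tool_list()
    agent = OpenAIAgent.from_tools(tools)
    return {"answer": str(agent.chat(question))}
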

Source code:
https://github.com/jerryjliu/llama_index/blob/main/llama_index/tools/tool_spec/load_and_search/base.py
Great, thanks!