Hello, is there a way I can have my RAG system say where in the document it retrieved its answer from? The end goal would be to highlight exactly where in the document the answer was retrieved, so I'm trying to understand how this could be accomplished.
Thank you, I'll check this out
The notebook has a good example of what it does

Basically, using fuzzy matching, it attaches metadata to the response pointing to the pieces of text that led to the result
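For flavor, here's a minimal sketch of that fuzzy-matching idea using only Python's standard library (the function name and sample strings are illustrative, not the notebook's actual API):

Python

from difflib import SequenceMatcher

def locate_source_span(answer_text: str, document_text: str) -> tuple[int, int]:
    """Return (start, end) offsets of the document span that best
    overlaps the answer text, suitable for highlighting in a UI."""
    matcher = SequenceMatcher(None, document_text, answer_text, autojunk=False)
    match = matcher.find_longest_match(0, len(document_text), 0, len(answer_text))
    return match.a, match.a + match.size

doc = "Total assets include Land valued at $1.2M and Buildings at $3.4M."
start, end = locate_source_span("Land valued at $1.2M", doc)
print(doc[start:end])  # -> "Land valued at $1.2M"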
https://www.youtube.com/watch?v=xT6JpDELKPg In this tutorial, the RAG system creates summaries for tables - does the retriever only look at these summaries when retrieving, or at the raw table values too? I'm trying to extract "Land" from an Assets table, but I'm not sure whether the RAG system would know that Land is in the Assets table, since the summary that Unstructured comes up with for the Assets table is vague: "Summary of Assets, Liabilities, and Stockholder's Equity"
Depending on the implementation, it uses the summaries so that either the LLM or the embeddings can decide whether that "path" should be followed further for retrieval
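Concretely, the tutorial's pattern looks roughly like this - the summaries become the indexed nodes, with a mapping back to the raw tables (a sketch assuming llama-index >= 0.10 import paths; `documents` is loaded elsewhere):

Python

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import UnstructuredElementNodeParser
from llama_index.core.retrievers import RecursiveRetriever

node_parser = UnstructuredElementNodeParser()
raw_nodes = node_parser.get_nodes_from_documents(documents)

# Base nodes hold the table summaries; node_mappings points each summary
# back to the full table node it describes.
base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(raw_nodes)

vector_index = VectorStoreIndex(base_nodes)
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_index.as_retriever(similarity_top_k=2)},
    node_dict=node_mappings,
)
# Retrieval matches against the summaries first, then follows the mapping
# to hand the raw table values to the LLM for synthesis.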
Is there a way to augment the summaries that are generated for tables?
To make them more specific and related to the table content
definitely -- I think you basically have full control over it. For example, generating the summaries is just an LLM prompt, and you can change that prompt
or you can set up your own method to generate the table nodes πŸ€” But that gets a little more low-level
Wanted to chime in here because I've been trying to solve a similar problem. What's helped so far:
  1. Few-shot prompting, with examples distinguishing a good summary from a bad one.
  2. Playing around with the chunk sizes. Smaller is usually better, up to a certain limit (see the sketch below).
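A sketch of the chunk-size experiment, using the standard SentenceSplitter (`documents` and the evaluation step are assumed):

Python

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Try a few chunk sizes and compare answer quality on a fixed question set
for chunk_size in (256, 512, 1024):
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=20)
    nodes = splitter.get_nodes_from_documents(documents)
    index = VectorStoreIndex(nodes)
    # ...run your eval questions against index.as_query_engine() here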
Does LlamaIndex provide a way to pass in a prompt where I can give more specific instructions on how to generate the summaries, like you both mentioned? Also, in the code in the tutorial I didn't see a line where we set chunk sizes; I thought that was automatic within the UnstructuredElementNodeParser
I guess I've just been having trouble figuring out how to write my own custom code and interface it with the functions LlamaIndex provides
Yes, LlamaIndex allows you to customize prompts, including providing specific instructions for generating summaries.
You can use the update_prompts function to customize the prompts.
The default chunk_size is 1024, but you can follow the instructions in the documentation to customize it. https://github.com/run-llama/llama_index/blob/main/docs/community/faq/documents_and_nodes.md
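For example (a sketch assuming llama-index >= 0.10 and an existing `index`; the prompt key below is the standard one for the default query engine, and the template wording is illustrative):

Python

from llama_index.core import Settings
from llama_index.core.prompts import PromptTemplate

Settings.chunk_size = 512  # global default; the linked docs cover per-component control

query_engine = index.as_query_engine()
print(query_engine.get_prompts().keys())  # inspect which prompts are customizable

query_engine.update_prompts(
    {
        "response_synthesizer:text_qa_template": PromptTemplate(
            "Context:\n{context_str}\n"
            "Answer the question using only the context above.\n"
            "Question: {query_str}\nAnswer: "
        )
    }
)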
Got it this is very helpful, thank you both
If I have questions later on about this topic, should I ask them in this thread? I'm new to Discord, so I wasn't sure whether it notifies people when comments are added within threads
It seems like UnstructuredElementNodeParser doesn't have a get_prompts() method
you can pass it into the constructor, like this (here's the default)
Python

# Import path for llama-index >= 0.10 (older versions: from llama_index.node_parser import ...)
from llama_index.core.node_parser import UnstructuredElementNodeParser

DEFAULT_SUMMARY_QUERY_STR = """\
What is this table about? Give a very concise summary (imagine you are adding a caption), \
and also output whether or not the table should be kept.\
"""

node_parser = UnstructuredElementNodeParser(summary_query_str=DEFAULT_SUMMARY_QUERY_STR)
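For the Assets-table case above, a more specific prompt might look like this (the wording is my own illustrative suggestion, not a library default):

Python

CUSTOM_SUMMARY_QUERY_STR = """\
What is this table about? Give a concise one-sentence summary that names \
the key line items in the table (e.g. Land, Cash, Total Liabilities), \
and also output whether or not the table should be kept.\
"""

node_parser = UnstructuredElementNodeParser(summary_query_str=CUSTOM_SUMMARY_QUERY_STR)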