The community member is seeking guidance on using Retrieval-Augmented Generation (RAG) with semi-structured data, meaning a combination of structured and unstructured data. In their use case, unstructured data is converted into a semi-structured representation: quantitative/categorical fields used for filtering, plus descriptive, textual fields used for RAG. The community member is looking for guides or resources, either internal or external to LlamaIndex, that could help with this approach.
In the comments, another community member suggests that the query engine described in the SQLJoinQueryEngine example from the GPT-Index documentation may be relevant to the original poster's use case.
Hi @Logan M @jerryjliu0 wanted to ask you if there are any guides or useful references for RAG on semi-structured data. Or rather, a combination of structured + unstructured data. I have a use case in which we're taking a lot of unstructured data and creating a semi-structured representation of it: some concrete quantitative/categorical fields that can be used for filtering in traditional database queries, and some more descriptive, textual fields. The way we're envisioning the end use is: based on user queries, do some initial database filtering, then do RAG on the descriptive, textual fields of the records that meet those filtering criteria. Do you think there are any guides or resources that could help with this, internal or external to LlamaIndex? Thanks so much!
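To make the filter-then-retrieve flow described above concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the `records` table, its `sector` and `year` columns, the sample rows, and the `filter_then_retrieve` helper are hypothetical, and a bag-of-words cosine similarity stands in for a real embedding model. It does not use any LlamaIndex API.

```python
import math
import re
import sqlite3
from collections import Counter

# Hypothetical records: structured fields for filtering plus a free-text description.
ROWS = [
    (1, "retail", 2021, "Customer complaints about delayed shipping and damaged packaging."),
    (2, "retail", 2023, "Positive feedback on the new loyalty program and faster checkout."),
    (3, "finance", 2023, "Concerns over rising loan default rates among small businesses."),
    (4, "retail", 2023, "Supply chain delays caused late shipping for holiday orders."),
]


def build_db():
    """Create an in-memory SQLite table holding the semi-structured records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records "
        "(id INTEGER PRIMARY KEY, sector TEXT, year INTEGER, description TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", ROWS)
    return conn


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filter_then_retrieve(conn, query, sector, min_year, top_k=2):
    """Step 1: SQL filter on structured fields. Step 2: rank surviving text fields."""
    rows = conn.execute(
        "SELECT id, description FROM records WHERE sector = ? AND year >= ?",
        (sector, min_year),
    ).fetchall()
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(desc))), rid, desc) for rid, desc in rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(rid, desc) for _, rid, desc in scored[:top_k]]


conn = build_db()
hits = filter_then_retrieve(conn, "late shipping delays", sector="retail", min_year=2023)
for rid, desc in hits:
    print(rid, desc)
```

In a real pipeline the top-ranked descriptions would be passed to an LLM as context, and the similarity step would use embeddings from a vector store; most vector stores also support metadata filters, which is another way to express the same structured pre-filtering.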