I have a question about the different index types, and which would be appropriate for my use case.
I have a dataset of 100 products. Each product is a food item made up of ingredients. I am trying to build an index that a LangChain agent can use to answer questions about those items (e.g. which products contain a particular ingredient, what an ingredient is, etc.).
Each product and ingredient has a description, and that's what I intend to use as the "page content".
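To make the data shape concrete, here is a minimal plain-Python sketch of the hierarchy and of the "which products contain ingredient X" lookup I'd like the agent to be able to answer. The product names, ingredient names, descriptions, and the helper names `page_content` and `products_containing` are all made-up placeholders for illustration, not my real data or code.

```python
# Hypothetical sample data; the real dataset has 100 products.
ingredient_descriptions = {
    "oats": "Rolled whole-grain oats.",
    "honey": "A natural sweetener.",
    "almonds": "Roasted tree nuts.",
}
product_ingredients = {
    "granola bar": ["oats", "honey", "almonds"],
    "honey oat cereal": ["oats", "honey"],
}


def page_content(name, description):
    """Build the text that would be used as a document's page content."""
    return f"{name}: {description}"


def products_containing(ingredient, mapping):
    """Return the sorted names of products whose ingredient list includes `ingredient`."""
    return sorted(
        product for product, ingredients in mapping.items()
        if ingredient in ingredients
    )


if __name__ == "__main__":
    print(page_content("oats", ingredient_descriptions["oats"]))
    print(products_containing("honey", product_ingredients))
```

A plain dictionary lookup like this only covers the "which products contain X" case; the descriptive questions ("what is X?") are where I expect to need the index.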
Since there is a natural hierarchy to the product/ingredient relationship (products contain ingredients), I was considering using GPTTreeIndex. The full code is too long to include here, so I've put it in this pastebin (https://pastebin.com/KSBH80QZ), but this is the general framework of my code:
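from collections import namedtuple
# Assumed import path for the legacy API; older installs use `gpt_index` instead
from llama_index import Document, GPTSimpleVectorIndex, GPTTreeIndex

# Record types, named differently from the lists that hold them
IngredientEntry = namedtuple("IngredientEntry", ["ingredient_name", "doc", "index"])
ProductEntry = namedtuple("ProductEntry", ["product_name", "index"])
ingredient_entries = []
product_entries = []

for ingredient in ingredients:
    # ingredient_name and ingredient_description are looked up for the
    # current ingredient (elided here; the full code is in the pastebin)
    # Build the doc and index for the current ingredient
    doc = Document(text=ingredient_name + ingredient_description, doc_id=ingredient_name)
    index = GPTSimpleVectorIndex([doc])
    # Append the ingredient name, doc, and index to the ingredient list
    ingredient_entries.append(IngredientEntry(ingredient, doc, index))

for product in products:
    # Get the unique ingredients for the current product
    product_ingredients = get_product_ingredients(product)
    # Create a GPTTreeIndex for the product using the ingredient indices
    product_index = GPTTreeIndex([x.index for x in ingredient_entries if x.ingredient_name in product_ingredients])
    # Append the product name and index to the product list
    product_entries.append(ProductEntry(product, product_index))

# Create a GPTTreeIndex for the portfolio using the product indices
portfolio_index = GPTTreeIndex([x.index for x in product_entries])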
I'm not getting the results I expect after running portfolio_index.query, and I'm not sure whether GPTTreeIndex is the right index type to use here.
Does anyone have advice or guidance on when to use each index type? I've gone through the docs already. TY