is there a way to speed up the process

is there a way to speed up the process or optimize? I did a listindex query to get a summarization but it took like 60 seconds even using gpt-3.5-turbo
List indexes are naturally a little slow, since they have to read all the data in the index.

You can try doing index.as_query_engine(response_mode='tree_summarize', use_async=True), which will hopefully be a little faster
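For intuition on why use_async helps: tree_summarize issues one LLM call per chunk, and the time savings come from running those network-bound calls concurrently instead of one after another. A toy asyncio sketch (with a hypothetical fake_llm_call standing in for the real LLM request) illustrates the effect:

```python
import asyncio
import time

async def fake_llm_call(chunk: str) -> str:
    # Stand-in for a network-bound LLM summarization call (hypothetical).
    await asyncio.sleep(0.1)
    return f"summary of {chunk}"

async def summarize_sequential(chunks):
    # One call at a time: total latency ~ 0.1s per chunk.
    return [await fake_llm_call(c) for c in chunks]

async def summarize_concurrent(chunks):
    # All calls in flight at once: total latency ~ 0.1s overall.
    return await asyncio.gather(*(fake_llm_call(c) for c in chunks))

chunks = [f"chunk{i}" for i in range(5)]

start = time.perf_counter()
seq = asyncio.run(summarize_sequential(chunks))
seq_time = time.perf_counter() - start

start = time.perf_counter()
conc = asyncio.run(summarize_concurrent(chunks))
conc_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```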
Hey @Logan M, so I'm struggling to set up the proper way to structure my data and also to choose the right index. I'm trying to build a Q&A chatbot for an ecommerce site that sells supplements. Each product/supplement has a description/list of purposes and a list of ingredients. So users on the site can ask "I need something for gaining mass" and it will find the product that has a purpose of "gaining mass". It can also list out the ingredients in the product, so users can also ask something like "Which products have this ingredient?"

So at the moment, I created a JSON file that contains all the products, their ingredients, and their purposes.
{
    "product_name": "Product 1",
    "ingredients": ["ingredient1", "ingredient2"],
    "purpose": ["weight gain", "gaining mass", "gym", "workout"]
}
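As a quick sanity check, a record in that shape parses with Python's standard json module (note the commas between fields and the closing quotes):

```python
import json

# One product record in the shape described above
raw = """
{
    "product_name": "Product 1",
    "ingredients": ["ingredient1", "ingredient2"],
    "purpose": ["weight gain", "gaining mass", "gym", "workout"]
}
"""

product = json.loads(raw)
print(product["product_name"])        # Product 1
print(", ".join(product["purpose"]))  # weight gain, gaining mass, gym, workout
```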

For my code, I created a Document for each product with text=purpose as the parameter, and added the product_name and ingredients as its metadata.
I used a ListIndex to store these documents.

However, when I query "Give me a product for acne or clearer skin", it's still struggling to find that product even though it's listed in the "purpose" of that product.

Maybe I am using the wrong index for this? And should I not use this nested JSON? Should I create a separate PDF for each product and put all the ingredients and description there? Or should I change the way I structure my data before it's loaded? What do you think?
Hmm, yea I would not use a list index for this. A list index will read every piece of data in the index, which is maybe not what you wanted.

Here's how I would rewrite this approach:

Python
from llama_index import VectorStoreIndex, ServiceContext, Document
from llama_index.llms import OpenAI
import json

# define LLM
llm = OpenAI(temperature=0.0, model="gpt-3.5-turbo", max_tokens=500)
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=1024)


# Open the JSON file
with open('./data/blends.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

documents = []

# Iterate over products
for product in data['products']:
    product_name = product['product_name']
    ingredients = ', '.join(product['ingredients'])
    purpose = ', '.join(product['purpose'])
    
    document_text = (
      f"Product Name: {product_name}\n"
      f"Ingredients: {ingredients}\n"
      f"Purpose: {purpose}\n\n"
    )
    
    document = Document(
        text=document_text
    )
    
    documents.append(document)
    
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine(
    similarity_top_k=10,  # since the documents are quite short, we can increase this from the default of 2
)
response = query_engine.query("I need something for anti inflammation")
print(response) 
I think that might work well? 🀞
will try and test it out, thanks!