
Updated 2 years ago

When I add a document to the index, does querying it become more expensive as I add more docs?

@kapa.ai when I add a document to the index, does that mean that querying it is going to be more expensive as I add more docs to it?
Plain Text
index.insert(document)
Please stand by @Rouzbeh as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
Yes, the cost of querying an index can increase as you add more documents to it, depending on the index structure you are using. Here's a brief overview of how the cost can be affected for different index structures:

  • GPTListIndex: By default, querying a list index requires N LLM calls, where N is the number of nodes. As you add more documents, the number of nodes increases, and so does the cost of querying.
  • GPTTreeIndex: By default, querying a tree index requires log(N) LLM calls, where N is the number of leaf nodes. As you add more documents, the number of leaf nodes increases, but the cost of querying grows logarithmically, which is slower than linear growth.
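To make the difference in growth concrete, here is a rough model of LLM calls per query. It assumes a list index makes one call per node and a tree index makes one call per level it descends, with a hypothetical branching factor of 10. This is a simplification for illustration, not LlamaIndex's actual query logic.

```python
def list_index_calls(num_nodes: int) -> int:
    """Simplified model: a list index visits every node, one LLM call each."""
    return num_nodes


def tree_index_calls(num_leaves: int, branching_factor: int = 10) -> int:
    """Simplified model: a tree index makes one LLM call per level it descends.

    The depth is the smallest d with branching_factor ** d >= num_leaves,
    i.e. ceil(log_b(num_leaves)), computed with integers to avoid float error.
    """
    depth = 0
    capacity = 1
    while capacity < num_leaves:
        capacity *= branching_factor
        depth += 1
    return max(depth, 1)  # assume at least one call even for a single leaf


for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} nodes: list={list_index_calls(n):>6} calls, tree={tree_index_calls(n)} calls")
```

Under this model, going from 1,000 to 10,000 nodes multiplies the list index's calls by 10 but adds only one call to the tree index.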
You can find more information about the cost tradeoffs between index building and querying in the Cost Analysis (https://gpt-index.readthedocs.io/en/latest/how_to/analysis/cost_analysis.html) section of the documentation.

When you insert a document using index.insert(document), the cost of querying the index may increase depending on the index structure you are using.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
With a vector index, the LLM cost of a query only increases if you increase top_k 🙂
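A toy model of why this holds: a vector index embeds the question, ranks the stored chunks by similarity, and passes only the top k matches to the LLM. The sketch below uses random vectors and a brute-force cosine-similarity scan as stand-ins for real embeddings and a real vector store (both assumptions for illustration). The similarity search itself does more work as the store grows, but that is local compute, not LLM tokens.

```python
import math
import random


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], store: list[list[float]], top_k: int) -> list[int]:
    """Return the indices of the top_k most similar stored vectors (brute-force scan)."""
    ranked = sorted(range(len(store)), key=lambda i: cosine(query, store[i]), reverse=True)
    return ranked[:top_k]


random.seed(0)
dim = 8
query = [random.random() for _ in range(dim)]
small_store = [[random.random() for _ in range(dim)] for _ in range(10)]
large_store = [[random.random() for _ in range(dim)] for _ in range(10_000)]

# The number of chunks handed to the LLM depends on top_k, not on the store size.
print(len(retrieve(query, small_store, top_k=2)))  # 2
print(len(retrieve(query, large_store, top_k=2)))  # 2
```

In LlamaIndex the corresponding knob is `similarity_top_k`, passed for example as `index.as_query_engine(similarity_top_k=3)`.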
@Logan M Thanks. Assume I am dealing with a case like the one below.
I have an index built like this:
Plain Text
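from llama_index import GPTVectorStoreIndex

# `document` is a list of Document objects; `service_context` is configured elsewhere
index = GPTVectorStoreIndex.from_documents(
    document, service_context=service_context
)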

I can run a query on this index; let's call it query1:
Plain Text
query_engine = index.as_query_engine()
query1 = "Question to be answered"
results = query_engine.query(query1)  # pass query1, the variable defined above
print(results)


Now, let's assume I add another document via:
Plain Text
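from llama_index import StorageContext, load_index_from_storage

# Reload the persisted index (assumes it was saved to the default ./storage directory)
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(
    storage_context, service_context=service_context
)

index.insert(document[0])        # embeds the new document's chunks
index.storage_context.persist()  # writes the updated index back to disk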


And run another query; let's call it query2:
Plain Text
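# Recreate the engine: the old one is bound to the index object built before the reload
query_engine = index.as_query_engine()
query2 = "2nd Question to be answered"
results = query_engine.query(query2)
print(results)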


Is the second query more expensive? I didn't change top_k; it's still at its default value.
Nope! The query should cost about the same. 👌
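A back-of-the-envelope version of this answer, assuming the retrieved chunks, the question, and a fixed prompt template all fit into a single LLM call: the prompt size is the template plus the question plus top_k chunks. The token counts below are hypothetical placeholders, and the answer (completion) length will still vary from question to question.

```python
def estimate_prompt_tokens(
    question_tokens: int,
    top_k: int,
    chunk_tokens: int,
    template_tokens: int,
) -> int:
    """Rough prompt size for one vector-index query (hypothetical model).

    The total number of documents in the index does not appear anywhere
    in this formula; only top_k and the chunk size do.
    """
    return template_tokens + question_tokens + top_k * chunk_tokens


# Same top_k and chunk size before and after index.insert(...); only the question differs.
before = estimate_prompt_tokens(question_tokens=20, top_k=2, chunk_tokens=1024, template_tokens=100)
after = estimate_prompt_tokens(question_tokens=25, top_k=2, chunk_tokens=1024, template_tokens=100)
print(before, after)  # 2168 2173
```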
Does adding to the index incur any cost?
Just the cost of embedding the document you inserted.
(Thankfully, embeddings are very cheap.)
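The one-off insertion cost scales with the size of the new document, not with the size of the index. A minimal sketch, assuming a flat per-token embedding price; the price used here is a placeholder, not a real quote, and the model ignores chunk overlap, which adds a few repeated tokens per chunk.

```python
def embedding_cost(doc_tokens: int, price_per_1k_tokens: float) -> float:
    """One-off cost of embedding a newly inserted document.

    price_per_1k_tokens is whatever your embedding provider charges;
    the value used below is a hypothetical placeholder.
    """
    return doc_tokens / 1000 * price_per_1k_tokens


# Hypothetical: a 5,000-token document at a placeholder price of 0.0001 per 1K tokens.
print(f"{embedding_cost(5_000, 0.0001):.6f}")  # 0.000500
```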
Got it, thank you @Logan M