When printing the trace when using query engine

At a glance

The community member asks what the "chunking" step shown in the query engine trace actually does, and whether it consumes prompt tokens. The first comment explains that chunking just compacts all the retrieved nodes to minimize LLM calls, and that it does not use tokens. The second comment simply says "Great", indicating the explanation was helpful.

When printing the trace when using query engine I always see,
SYNTHESIZE
CHUNKING
CHUNKING
LLM

Chunking has this info:
{
  "__computed__": {
    "latency_ms": 1.436,
    "error_count": 0,
    "cumulative_token_count": {
      "total": 0,
      "prompt": 0,
      "completion": 0
    },
    "cumulative_error_count": 0
  }
}


What is this chunking actually doing? Does it use prompt tokens?
2 comments
It's just compacting all the nodes to minimize LLM calls; it's not using tokens.
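A minimal sketch of what this compacting step amounts to (this is an illustration, not LlamaIndex's actual implementation): the retrieved node texts are greedily repacked into as few context-window-sized chunks as possible, so the synthesizer needs fewer LLM calls. It is plain string manipulation with no model involved, which is why the trace shows a token count of 0. The `compact_nodes` helper and the character budget are hypothetical stand-ins for the library's token-aware packing.

```python
# Hypothetical sketch of the "compacting" behind the CHUNKING trace events:
# repack node texts into as few budget-sized chunks as possible. No LLM is
# called here, so prompt/completion token counts stay at 0.


def compact_nodes(node_texts, max_chars=1000):
    """Greedily pack node texts into chunks no longer than max_chars."""
    chunks, current = [], ""
    for text in node_texts:
        candidate = (current + "\n\n" + text) if current else text
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep packing into this chunk
        else:
            if current:
                chunks.append(current)  # chunk is full; start a new one
            current = text
    if current:
        chunks.append(current)
    return chunks


nodes = ["node one " * 30, "node two " * 30, "node three " * 30]
packed = compact_nodes(nodes, max_chars=700)
# Three ~300-char nodes pack into two chunks under the 700-char budget,
# i.e. one LLM call per chunk instead of one per node.
print(len(packed))
```

The real synthesizer measures the budget in tokens against the model's context window rather than characters, but the effect is the same: fewer, fuller prompts sent to the LLM.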