1. It hashes the combination of inputs + transformation at each step, so adding a document changes the input and causes a cache miss. If you instead added a transformation to the end of the pipeline, the earlier steps would hit the cache and only the new transformation would run.
2. It still inserts nodes on every run. If you plan to rerun the pipeline on the same data and want deduplication, check out the page I linked that introduces the docstore to the pipeline.
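The two points above can be illustrated with a toy model. This is a hypothetical, stdlib-only sketch and not LlamaIndex's actual implementation. `ToyPipeline`, `executed`, `store`, and the `dedup` flag are invented names. The cache key is assumed to be a SHA-256 of the step's full input plus the transformation's name, a set of document hashes stands in for the docstore, and a plain list stands in for the vector store.

```python
import hashlib
import json
from typing import Callable, Dict, List, Set, Tuple

# A transformation is modeled as a (name, function) pair mapping nodes to nodes.
Transform = Tuple[str, Callable[[List[str]], List[str]]]


class ToyPipeline:
    """Hypothetical model of step-level caching; not LlamaIndex's real code."""

    def __init__(self, transformations: List[Transform], dedup: bool = False):
        self.transformations = list(transformations)
        self.cache: Dict[str, List[str]] = {}
        self.dedup = dedup
        self.seen: Set[str] = set()     # stands in for a docstore of document hashes
        self.store: List[str] = []      # stands in for the vector store
        self.executed: List[str] = []   # names of transforms that actually ran

    @staticmethod
    def _key(nodes: List[str], name: str) -> str:
        # Cache key = hash of this step's full input plus the transformation.
        payload = json.dumps({"nodes": nodes, "transform": name})
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, documents: List[str]) -> List[str]:
        if self.dedup:
            # Drop documents whose content hash was already seen.
            fresh = []
            for doc in documents:
                digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
                if digest not in self.seen:
                    self.seen.add(digest)
                    fresh.append(doc)
            documents = fresh
        nodes = documents
        for name, fn in self.transformations:
            key = self._key(nodes, name)
            if key in self.cache:
                nodes = self.cache[key]      # hit: reuse the cached output
            else:
                self.executed.append(name)   # miss: run the transformation
                nodes = fn(nodes)
                self.cache[key] = nodes
        self.store.extend(nodes)  # every run inserts its output nodes
        return nodes


split = ("split", lambda ns: [w for n in ns for w in n.split()])
upper = ("upper", lambda ns: [n.upper() for n in ns])

p = ToyPipeline([split])
p.run(["a b", "c"])          # "split" runs
p.run(["a b", "c"])          # same input: cache hit, nothing runs, but nodes are inserted again
p.run(["a b", "c", "d"])     # a new document changes the hash: "split" runs again
p.transformations.append(upper)
p.run(["a b", "c", "d"])     # "split" hits the cache; only "upper" runs
print(p.executed)            # ['split', 'split', 'upper']

q = ToyPipeline([split], dedup=True)
q.run(["a b"])
print(q.run(["a b", "c"]))   # ['c'] -- the already-seen document is skipped
print(q.store)               # ['a', 'b', 'c']
```

In this model the cache only skips recomputation; without the docstore stand-in, rerunning identical input still appends duplicate nodes to `store`.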