@Logan M I am looking at:
https://gpt-index.readthedocs.io/en/stable/examples/node_postprocessor/MetadataReplacementDemo.html#metadata-replacement-node-sentence-window

I see that you are passing the node_parser into the ServiceContext:
Plain Text
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import ServiceContext

# llm and node_parser are defined earlier in the demo notebook
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    ),
    node_parser=node_parser,
)


Is this node_parser what takes the documents and "breaks them up", as opposed to calculating embeddings for the entire thing and shoving the whole document into the index?
Right, the node_parser breaks documents into nodes according to the chunk size. It gets used any time you call from_documents(documents, ...) or insert(document).

The default chunk size is 1024 tokens.
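
For anyone reading this later, here is a minimal sketch (same ServiceContext-era API as above) that makes the chunking step visible; the ./data path, the chunk_overlap value, and the choice of SimpleNodeParser are illustrative, not prescribed by the thread:
Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser

documents = SimpleDirectoryReader("./data").load_data()

# Explicit chunk size; 1024 tokens is the default anyway
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)

# The chunking is observable directly: a few documents in, many nodes out
nodes = node_parser.get_nodes_from_documents(documents)
print(f"{len(documents)} documents -> {len(nodes)} nodes")

# from_documents() runs the same parser internally, so each node
# (not each whole document) gets its own embedding in the index
ctx = ServiceContext.from_defaults(node_parser=node_parser)
index = VectorStoreIndex.from_documents(documents, service_context=ctx)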