How do I see what chunks of text were created and the associated vectors?
To see the nodes, you can explicitly parse Documents into Node objects before building an index: https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html#parse-the-documents-into-nodes. This gives you control over each node.

The embedding storage is a bit different, since it depends on whether you're using a vector store, but I'd take a look at the `get` method in the VectorStore (see `gpt_index/vector_stores/types.py`).
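To make the idea of a `get`-style lookup concrete, here is a minimal, purely illustrative in-memory store. `TinyVectorStore` and its methods are hypothetical names, not the actual gpt_index API, whose `VectorStore` interface varies by backend:

```python
# Toy sketch of what a vector store holds: node IDs mapped to
# (text, embedding) pairs, with a get-style lookup.
# Hypothetical class; NOT the real gpt_index VectorStore interface.
class TinyVectorStore:
    def __init__(self):
        self._data = {}  # node_id -> (text, embedding)

    def add(self, node_id, text, embedding):
        self._data[node_id] = (text, embedding)

    def get(self, node_id):
        # Return the stored text and embedding for one node
        return self._data[node_id]

store = TinyVectorStore()
store.add("node-0", "Hello world", [0.1, 0.2, 0.3])
text, emb = store.get("node-0")
print(text, emb)  # Hello world [0.1, 0.2, 0.3]
```

The real implementations differ mainly in where `_data` lives (in memory, on disk, or in an external vector database), but the lookup concept is the same.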
@jerryjliu0 thanks, but how do I do this with the MarkdownReader from LlamaHub?
@Greg Tanaka As Jerry mentioned, seeing the embeddings is a bit trickier. But for the text in each node, you can do something like this:
```python
from pathlib import Path

from llama_index.node_parser import SimpleNodeParser

# `loader` is the MarkdownReader instance obtained from LlamaHub
documents = loader.load_data(file=Path('./README.md'))

parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
```
Thanks. @Logan M how do we control the chunk size?
When using the node parser?

```python
from llama_index import ServiceContext, GPTListIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter

# Split text into 512-token chunks when parsing documents into nodes
splitter = TokenTextSplitter(chunk_size=512)
parser = SimpleNodeParser(text_splitter=splitter)
nodes = parser.get_nodes_from_documents(documents)

# Keep the service context's chunk size consistent with the splitter
index = GPTListIndex(nodes, service_context=ServiceContext.from_defaults(chunk_size_limit=512))
```


Without the node parser, just set `chunk_size_limit` in the service context alone 👍
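For intuition about what `chunk_size=512` does, here is a rough, hypothetical sketch of token-window chunking. Whitespace-separated words stand in for real tokenizer tokens (an actual splitter like `TokenTextSplitter` counts model tokens, not words):

```python
def chunk_by_tokens(text, chunk_size=512, overlap=0):
    # Naive whitespace "tokenization"; real splitters use a model tokenizer.
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        # Each chunk holds up to chunk_size consecutive tokens
        chunks.append(" ".join(tokens[start:start + chunk_size]))
    return chunks

text = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_by_tokens(text, chunk_size=512)
print(len(chunks))  # 3 chunks: 512 + 512 + 176 tokens
```

Smaller chunks give more precise retrieval but less context per chunk; the 512 here matches the `chunk_size_limit` used above.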

You can also use any text splitter you want (from LangChain, or llama_index also has a sentence-based splitter).
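The difference with a sentence-based splitter is that chunks break on sentence boundaries rather than at an arbitrary token offset. A toy, hypothetical sketch of the idea (real splitters handle abbreviations, quotes, etc. far more carefully):

```python
import re

def chunk_by_sentences(text, max_words=50):
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        words = len(sent.split())
        # Start a new chunk if adding this sentence would exceed the limit
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = "First sentence here. " * 10
print(len(chunk_by_sentences(doc, max_words=12)))  # 3 chunks
```

Keeping sentences intact tends to produce more coherent chunks for embedding, at the cost of slightly uneven chunk sizes.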