Spencer Braun
Joined September 25, 2024
I've noticed that when I access a node's start_char_idx in version 0.8.29, it is always None, which makes it impossible to map the node text back to a location in the original document. With version 0.7.13 it works fine and I get a character index. Is this known behavior, and is there another way to map the node text back to its location in the document? For example, when I run this code in both versions:
from llama_index import (
    Document,
    OpenAIEmbedding,
    ServiceContext,
    VectorStoreIndex,
)
from llama_index.llms import OpenAI
from llama_index.node_parser import SimpleNodeParser


with open("test.txt", "r") as f:
    text = f.read()

documents = [Document(text=text, metadata={"doc_id": "1234"})]
node_parser = SimpleNodeParser.from_defaults(chunk_size=250, chunk_overlap=20)
service_context = ServiceContext.from_defaults(
    embed_model=OpenAIEmbedding(),
    node_parser=node_parser,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.7, max_tokens=500),
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query = "what are the liability limits?"
retriever = index.as_retriever()
results = retriever.retrieve(query)

results[0].node.__dict__
I find that results[0].node.start_char_idx is None in 0.8.29 and equal to 11058 in version 0.7.13.
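As a possible stopgap while start_char_idx is unavailable, the offsets could be recovered by searching for the node text in the original document string. The sketch below is plain Python and does not use any llama_index API; `locate_chunk` is a hypothetical helper, and it assumes the chunk text is a verbatim substring of the document (apart from leading or trailing whitespace), which may not hold for every node parser.

```python
from typing import Optional, Tuple


def locate_chunk(
    document_text: str, chunk_text: str, search_from: int = 0
) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of chunk_text in document_text.

    Hypothetical helper: assumes the chunk is a verbatim substring of the
    document. Returns None if the chunk cannot be found.
    """
    # Splitters often trim whitespace at chunk boundaries, so trim here too.
    needle = chunk_text.strip()
    if not needle:
        return None
    # str.find returns the first match at or after search_from, or -1.
    start = document_text.find(needle, search_from)
    if start == -1:
        return None
    return start, start + len(needle)


# Stand-in data for illustration; in the question, document_text would be
# the contents of test.txt and chunk_text the retrieved node's text.
document_text = (
    "Section 1. Scope.\n"
    "Section 2. The liability limit is $1,000,000 per claim.\n"
)
chunk_text = "The liability limit is $1,000,000 per claim."
span = locate_chunk(document_text, chunk_text)
```

Because `str.find` returns the first match, a chunk whose text repeats elsewhere in the document would map to the earliest occurrence; passing `search_from` (for example, the start offset of the previous chunk) is one way to narrow that down.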