
Trying to test the SentenceWindowNodeParser

Trying to test the SentenceWindowNodeParser for a RAG use case, but it seems some of my chunks are exceeding the context length of 8191. How do I control that? The documentation says it bypasses chunking; is there any way I could work around this?
4 comments
hmm, I think you'd need to decrease the sentence window size a bit πŸ€” What window value did you set?
Is it possible to share something to replicate the issue? Or even the full traceback?
Sure. Below is the code, and the log is attached:

Plain Text
from llama_index import (Document, ServiceContext, SimpleDirectoryReader,
                         VectorStoreIndex, download_loader)
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.llms import AzureOpenAI

# Load the HTML files with the UnstructuredReader
UnstructuredReader = download_loader("UnstructuredReader", refresh_cache=True)
dir_reader = SimpleDirectoryReader(
    "../data/",
    filename_as_id=True,
    required_exts=[".html"],
    file_extractor={".html": UnstructuredReader()},
)
documents = dir_reader.load_data()

# Combine everything into a single Document
document = Document(text="\n\n".join(doc.text for doc in documents))

from llama_index.node_parser import SentenceWindowNodeParser

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text")
    
llm = AzureOpenAI(
  ...
)

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    ...
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, node_parser=node_parser)
sentence_index = VectorStoreIndex.from_documents([document], service_context=service_context)
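Since the traceback isn't included in the thread, here is a minimal diagnostic sketch (not from the thread) that counts the tokens each node would actually send to the embedding model, so you can see which chunks blow past the 8191-token limit. It assumes the same legacy (pre-0.10) llama_index APIs as the snippet above; the name MAX_TOKENS is an illustrative constant, not a library value.

Plain Text
# Diagnostic sketch (not from the thread): measure the token count of the text
# each node would send to the embedding model.
import tiktoken
from llama_index.schema import MetadataMode

MAX_TOKENS = 8191  # text-embedding-ada-002 limit mentioned in the question
enc = tiktoken.encoding_for_model("text-embedding-ada-002")

nodes = node_parser.get_nodes_from_documents([document])
oversized = []
for node in nodes:
    # MetadataMode.EMBED approximates the text that gets embedded: the node
    # content plus any metadata not listed in excluded_embed_metadata_keys.
    n_tokens = len(enc.encode(node.get_content(metadata_mode=MetadataMode.EMBED)))
    if n_tokens > MAX_TOKENS:
        oversized.append((node.node_id, n_tokens))

print(f"{len(oversized)} of {len(nodes)} nodes exceed the embedding limit")

If oversized nodes show up even though the window metadata is excluded from the embedding text, a single very long "sentence" (common with HTML extraction, where boundaries can be lost) is a likely culprit, and splitting the source into smaller Documents before parsing is one possible workaround.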