
Trying to test the SentenceWindowNodeParser

Trying to test the SentenceWindowNodeParser for a RAG use case, but it seems some of my chunks are exceeding the context length of 8191. How do I control that? The documentation says it bypasses chunking; is there any way I could work around this?
4 comments
hmm, I think you'd need to decrease the sentence window size a bit πŸ€” What window value did you set?
Is it possible to share something to replicate the issue? Or even the full traceback?
Sure. Below is the code, and the log is attached:

Plain Text
from llama_index import (Document, ServiceContext, SimpleDirectoryReader,
                         VectorStoreIndex, download_loader)
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.llms import AzureOpenAI

# Load the HTML files with the UnstructuredReader
UnstructuredReader = download_loader("UnstructuredReader", refresh_cache=True)
dir_reader = SimpleDirectoryReader(
    "../data/",
    filename_as_id=True,
    required_exts=[".html"],
    file_extractor={".html": UnstructuredReader()},
)
documents = dir_reader.load_data()

# Combine everything into a single Document
document = Document(text="\n\n".join(doc.text for doc in documents))

from llama_index.node_parser import SentenceWindowNodeParser

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text")
    
llm = AzureOpenAI(
  ...
)

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    ...
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, node_parser=node_parser)
sentence_index = VectorStoreIndex.from_documents([document], service_context=service_context)
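Since the traceback isn't included in the thread, here is a minimal diagnostic sketch (not from the thread) that counts the tokens each node would actually send to the embedding model, so you can see which chunks blow past the 8191-token limit. It assumes the same legacy (pre-0.10) llama_index APIs as the snippet above; the name MAX_TOKENS is an illustrative constant, not a library value.

Plain Text
# Diagnostic sketch (not from the thread): measure the token count of the text
# each node would send to the embedding model.
import tiktoken
from llama_index.schema import MetadataMode

MAX_TOKENS = 8191  # text-embedding-ada-002 limit mentioned in the question
enc = tiktoken.encoding_for_model("text-embedding-ada-002")

nodes = node_parser.get_nodes_from_documents([document])
oversized = []
for node in nodes:
    # MetadataMode.EMBED approximates the text that gets embedded: the node
    # content plus any metadata not listed in excluded_embed_metadata_keys.
    n_tokens = len(enc.encode(node.get_content(metadata_mode=MetadataMode.EMBED)))
    if n_tokens > MAX_TOKENS:
        oversized.append((node.node_id, n_tokens))

print(f"{len(oversized)} of {len(nodes)} nodes exceed the embedding limit")

If oversized nodes show up even though the window metadata is excluded from the embedding text, a single very long "sentence" (common with HTML extraction, where boundaries can be lost) is a likely culprit, and splitting the source into smaller Documents before parsing is one possible workaround.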