
Updated 2 months ago

Hi, I'm using 'CodeHierarchyNodeParser'

Hi, I'm using 'CodeHierarchyNodeParser' and it's been great. I was testing my code on several Python files, and one of them gives me this error: 'string index out of range'. Following the traceback and looking at the file I'm trying to parse, I think it's because one variable definition has a really long string. My code is this:
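from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import CodeSplitter
from llama_index.packs.code_hierarchy import CodeHierarchyNodeParser

# Excerpt from a method of my class: self.language is e.g. "python",
# path / self.file_path point at the .py file being parsed.
documents = SimpleDirectoryReader(
    input_files=[path],
    file_metadata=lambda x: {"filepath": x},
).load_data()

code = CodeHierarchyNodeParser(
    language=self.language,
    chunk_min_characters=0,
    code_splitter=CodeSplitter(language=self.language, max_chars=10000, chunk_lines=10),
)
no_extension_path = self.file_path.replace(".py", "")

split_nodes = code.get_nodes_from_documents(documents)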


How can I fix this?
The traceback takes me to this line in the _chunk_node method:
while start_byte > 0 and text[start_byte - 1] in (" ", "\t"):
    start_byte -= 1

My text has a length of 11276, but start_byte is 11278, so text[start_byte - 1] is past the end of the string.
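To see where two extra positions can come from, here is a minimal sketch using a hypothetical one-line stand-in for the file (not the poster's actual code). tree-sitter, which CodeHierarchyNodeParser uses for parsing, reports positions such as start_byte as byte offsets into the UTF-8 encoding, while Python str indexing counts characters. A single character that takes three bytes in UTF-8 therefore pushes every later byte offset two positions past its character index, which is consistent with the gap between 11278 and 11276.

```python
# Hypothetical stand-in for the parsed file: plain code plus one typographic quote.
text = 'x = "it‘s a long string"\n'

char_len = len(text)                  # characters: what str indexing uses
byte_len = len(text.encode("utf-8"))  # bytes: what tree-sitter offsets count
print(char_len, byte_len)             # 25 27

# A byte offset at the end of the file points past the end of the str.
start_byte = byte_len
error_message = ""
try:
    text[start_byte - 1]
except IndexError as err:
    error_message = str(err)
print(error_message)                  # string index out of range
```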
The problem seems to be the ‘ character. I still don't know whether it's my operating system's fault or the character itself is ambiguous.
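A typographic quote such as ‘ (U+2018) is a valid Unicode character, so this is probably not an operating-system issue: it simply occupies three bytes in UTF-8 but counts as one character in a Python str. One way around the mismatch is to do the offset arithmetic on the encoded bytes and decode only the final slice. The helpers below (backtrack_indent, slice_by_bytes) are hypothetical names, not part of LlamaIndex, and this is only a sketch of the idea, not the library's actual code or fix.

```python
def backtrack_indent(text: str, start_byte: int) -> int:
    """Step start_byte back over spaces and tabs, indexing UTF-8 bytes, not characters."""
    data = text.encode("utf-8")
    # One-byte slices keep the comparison in bytes (data[i] alone would be an int).
    while start_byte > 0 and data[start_byte - 1 : start_byte] in (b" ", b"\t"):
        start_byte -= 1
    return start_byte


def slice_by_bytes(text: str, start_byte: int, end_byte: int) -> str:
    """Return the text between two byte offsets, decoding only after slicing."""
    return text.encode("utf-8")[start_byte:end_byte].decode("utf-8")


# Usage: "y" sits after one 3-byte quote and four spaces of indentation.
sample = 'x = "it‘s"\n    y = 1\n'
y_byte = sample.encode("utf-8").index(b"y")  # byte offset, as tree-sitter would report it
start = backtrack_indent(sample, y_byte)
print(y_byte, start)  # 17 13
print(repr(slice_by_bytes(sample, start, len(sample.encode("utf-8")))))  # '    y = 1\n'
```

Note that the str-indexed loop does not always crash: run on this sample with the same byte offset, sample[16] is a space and sample[15] is "y", so it would stop at 16 instead of 13 and silently misplace the chunk start. Decoding the byte slice should be safe because tree-sitter node boundaries fall on character boundaries and the loop only steps over single-byte ASCII whitespace (UTF-8 continuation bytes are never 0x20 or 0x09).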