
Updated 3 weeks ago

I was doing a: `nodes = md_node_parser.get_nodes_from_documents(oa_md_docs)`

@kapa.ai I was doing a :

nodes = md_node_parser.get_nodes_from_documents(oa_md_docs)


but all of a sudden I got this error when processing an item:

90 if kwds["dtype_backend"] == "pyarrow":
91 # Fail here loudly instead of in cython after reading
92 import_optional_dependency("pyarrow")
---> 93 self._reader = parsers.TextReader(src, **kwds)
95 self.unnamed_cols = self._reader.unnamed_cols
97 # error: Cannot determine type of 'names'

File parsers.pyx:574, in pandas._libs.parsers.TextReader.cinit()

File parsers.pyx:663, in pandas._libs.parsers.TextReader._get_header()

File parsers.pyx:874, in pandas._libs.parsers.TextReader._tokenize_rows()

File parsers.pyx:891, in pandas._libs.parsers.TextReader._check_tokenize_status()

File parsers.pyx:2061, in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: EOF inside string starting at row 0
Please stand by @cmosguy as I search the knowledge base. I normally take about 15 seconds or so. :wave:
----------------------
Based on the error message you've provided, it appears you're encountering an issue related to parsing data, which is not directly related to LlamaIndex's get_nodes_from_documents function. The error suggests a problem with pandas' data parsing, specifically a ParserError indicating an unexpected end of file (EOF) inside a string.

This error typically occurs when there's malformed data in the file being read, such as an unclosed quote or an unexpected line break within a quoted field. It's not a LlamaIndex-specific error, but rather an issue with the data format or content you're trying to process.
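As a rough illustration of that failure mode, the sketch below feeds pandas a small CSV string with an unclosed quote in its first row. The sample strings and the `fails_to_parse` helper are made up for this example; they are not taken from the original data or from LlamaIndex.

```python
import io

import pandas as pd


def fails_to_parse(csv_text: str) -> bool:
    """Return True if pandas raises ParserError while reading csv_text."""
    try:
        pd.read_csv(io.StringIO(csv_text))
    except pd.errors.ParserError:
        return True
    return False


# Hypothetical sample data: the second string opens a quote in the
# header row and never closes it, so the tokenizer reaches end of
# input while still inside a quoted field.
well_formed = 'name,note\nwidget,"fits 5 inch pipe"\n'
malformed = 'name,"note\nwidget,fits 5 inch pipe\n'

print(fails_to_parse(well_formed))
print(fails_to_parse(malformed))
```

A stray `"` in the first row of a table-like block is enough to trigger this, which matches the "starting at row 0" part of the message.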

To resolve this issue, you should:

  1. Check the markdown file(s) you're trying to process for any malformed content, especially looking for unclosed quotes or unexpected line breaks within quoted text.
  2. Ensure that the file is properly formatted and doesn't contain any corrupted data.
  3. If possible, try opening and resaving the file in a text editor to ensure it's properly encoded and doesn't contain any hidden characters that might be causing the parsing error.
  4. If the issue persists, you might want to try reading the file using a different method before passing it to the node parser, or consider preprocessing the file to handle any potential formatting issues.

Unfortunately, the knowledge sources provided don't contain specific information about this particular error or how to handle it within the context of LlamaIndex. If you continue to face issues after checking your data, you may want to seek assistance from the pandas community for the data parsing error, or the LlamaIndex community if you believe it's related to how LlamaIndex is handling the file reading process.
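The checks above could be sketched roughly as follows, using only the standard library. Both helper names are hypothetical. The first flags lines with an odd number of double quotes, which is only a heuristic for "unclosed quote". The second runs a caller-supplied parse function on one item at a time so the failing item can be identified.

```python
from typing import Any, Callable, Iterable, List, Tuple


def find_unbalanced_quote_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs with an odd count of double quotes."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count('"') % 2 == 1
    ]


def find_failing_items(
    items: Iterable[Any], parse: Callable[[Any], Any]
) -> List[Tuple[int, Exception]]:
    """Call parse(item) for each item and collect (index, exception) on failure."""
    failures = []
    for index, item in enumerate(items):
        try:
            parse(item)
        except Exception as exc:  # broad on purpose: we only want to locate the item
            failures.append((index, exc))
    return failures


# Hypothetical markdown snippet with a stray quote on line 3.
sample = '| name | note |\n|---|---|\n| widget | fits 5" pipe |\n'
print(find_unbalanced_quote_lines(sample))
```

With the objects from the question, `parse` might be something like `lambda doc: md_node_parser.get_nodes_from_documents([doc])`. Calling the parser on a one-element list this way is an assumption, not something confirmed in this thread.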
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Is there a way to fix the markdown during the markdown node parser's processing?
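One possible approach is to clean the markdown text before it reaches the parser rather than inside it. The sketch below is a minimal example under two assumptions that this thread does not confirm: that the failure comes from table rows, and that swapping a stray double quote for a single quote is acceptable for the data. `neutralize_unbalanced_quotes` is a hypothetical helper, not a LlamaIndex function.

```python
def neutralize_unbalanced_quotes(markdown: str) -> str:
    """Replace double quotes with single quotes in table rows that have an odd number of them.

    Only lines that start with "|" (after leading whitespace) are
    treated as table rows; everything else is left untouched.
    """
    fixed_lines = []
    for line in markdown.splitlines():
        is_table_row = line.lstrip().startswith("|")
        if is_table_row and line.count('"') % 2 == 1:
            line = line.replace('"', "'")
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


# Hypothetical input: the third line is a table row with a stray quote,
# the last line is ordinary prose and is not modified.
raw = '| name | note |\n|---|---|\n| widget | fits 5" pipe |\nplain "text'
cleaned = neutralize_unbalanced_quotes(raw)
print(cleaned)
```

The cleaned text would then replace the document's original text before `get_nodes_from_documents` is called. How to set that text depends on the document class in use.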