
Updated 10 months ago

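```py
from pathlib import Path

# Import paths as used in this thread (llama-index 0.9.x layout)
from llama_index.node_parser import UnstructuredElementNodeParser
from llama_index.readers.file.flat_reader import FlatReader

# FlatReader loads the HTML file as-is into a Document (no parsing)
reader = FlatReader()
# UnstructuredElementNodeParser splits the document into elements via the `unstructured` package
node_parser = UnstructuredElementNodeParser()
docs = reader.load_data(Path("your_path.html"))
raw_nodes = node_parser.get_nodes_from_documents(docs)
```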

Why do I get the "BadZipFile: File is not a zip file" error? Do you know how to solve this? Thank you :D


WhiteFang_Jr

Where did you get this reader, reader = FlatReader()?

Logan M

It's actually slightly hidden in llama-index; it just reads files as-is with zero processing.
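To make "zero processing" concrete, here is a minimal stand-in written with only the standard library. The function name `flat_read` and the `filename`/`extension` metadata keys are illustrative assumptions; this is not llama-index's own code, and the real reader returns `Document` objects rather than dicts.

```python
from pathlib import Path
from typing import Dict, List


def flat_read(path: Path) -> List[Dict[str, object]]:
    """Hypothetical stand-in for FlatReader: return the file's text untouched.

    No HTML parsing, tag stripping, or chunking happens here; the markup
    stays in the text exactly as it appears on disk.
    """
    text = path.read_text(encoding="utf-8")
    metadata = {"filename": path.name, "extension": path.suffix}
    return [{"text": text, "metadata": metadata}]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "page.html"
        sample.write_text("<html><body><p>Hello</p></body></html>", encoding="utf-8")
        docs = flat_read(sample)
        print(docs[0]["text"])  # the raw markup, tags included
```

All the structural work (splitting the HTML into text and table elements) is left to the node parser that runs afterwards.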

Logan M

@Nyse are you running this reader on zip files?
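The message itself comes from Python's standard `zipfile` module, which raises `zipfile.BadZipFile` whenever it is asked to open something that is not a valid ZIP archive. A quick, library-independent check is `zipfile.is_zipfile`. The sketch below (the helper names `describe_input` and `try_open_as_zip` are hypothetical) shows the check and how an ordinary HTML file triggers the same error text.

```python
import tempfile
import zipfile
from pathlib import Path


def describe_input(path: Path) -> str:
    """Report whether `path` is a ZIP archive, without raising."""
    return "zip archive" if zipfile.is_zipfile(path) else "not a zip archive"


def try_open_as_zip(path: Path) -> str:
    """Open `path` with zipfile and return a summary or the error message."""
    try:
        with zipfile.ZipFile(path) as archive:
            return f"ok: {len(archive.namelist())} members"
    except zipfile.BadZipFile as exc:
        return f"BadZipFile: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    html = Path(tmp) / "page.html"
    html.write_text("<html><body>hi</body></html>", encoding="utf-8")
    print(describe_input(html))   # not a zip archive
    print(try_open_as_zip(html))  # BadZipFile: File is not a zip file
```

Because the HTML input is not a ZIP file, seeing this error usually means some other file on the code path is being read as an archive.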

Nyse

Nope.

Nyse

Wait, is it alright to use Python 3.11?

Nyse

Because I used to run it on Colab, and it worked.

Nyse

(Colab's Python version is 3.10.)

Nyse

from llama_index.readers.file.flat_reader import FlatReader

Nyse

I got it from this one.

Nyse

After I changed my Python to 3.10, it's still the same.

Nyse

Can you try it for me? This one was my HTML. It's a public HTML page.

Nyse

@WhiteFang_Jr @Logan M sorry to interrupt 😄

Nyse

Is it because I'm using VS Code on a MacBook?

Nyse

Finally, I got it working.

WhiteFang_Jr

Let me try with a sample HTML file. Can you share the code you are trying?

Nyse

Thank you.

WhiteFang_Jr

lol 😅

Nyse

No, it was because I forgot to download the NLTK perceptron tagger.

Nyse

Thank you very much.

WhiteFang_Jr

Ah okay, it's great that you solved it on your own.

Nyse

But I'm still confused.

Nyse

Why do we need to download the NLTK perceptron tagger?

Nyse

And what is its relation to the BadZipFile error?

WhiteFang_Jr

Even I don't know that. But if you are using a local embedding model, those use NLTK internally.
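One plausible chain behind the question left open here (an assumption, not confirmed in the thread): UnstructuredElementNodeParser hands the HTML to the `unstructured` package, which relies on NLTK's tokenizer and perceptron tagger, and NLTK keeps its data packages as .zip archives. If a package is missing or only partly downloaded, loading it can fail with `zipfile.BadZipFile` instead of a clearer "resource not found" message. The sketch below (the function name `find_corrupt_zips` is hypothetical) scans a data directory for .zip files that cannot be read as archives.

```python
import zipfile
import zlib
from pathlib import Path
from typing import List


def find_corrupt_zips(root: Path) -> List[Path]:
    """Return every *.zip under `root` that zipfile cannot read as an archive."""
    bad = []
    for candidate in sorted(root.rglob("*.zip")):
        if not zipfile.is_zipfile(candidate):
            bad.append(candidate)
            continue
        try:
            with zipfile.ZipFile(candidate) as archive:
                # testzip() returns the first member that fails its check, or None.
                if archive.testzip() is not None:
                    bad.append(candidate)
        except (zipfile.BadZipFile, zlib.error, EOFError):
            bad.append(candidate)
    return bad


if __name__ == "__main__":
    # ~/nltk_data is one of the directories NLTK searches; yours may differ.
    data_dir = Path.home() / "nltk_data"
    if data_dir.exists():
        for path in find_corrupt_zips(data_dir):
            print("corrupt:", path)
```

If a file is reported, deleting it and downloading the package again, for example with `nltk.download("averaged_perceptron_tagger")`, should restore it. That is the older resource name; recent NLTK releases use `averaged_perceptron_tagger_eng`, so which one your `unstructured` version expects is worth checking.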
