HTML Files for menus

At a glance

A community member gets "None" as a result and suspects the cause is not using an unstructured reader; it turns out their documents list is empty. Their college publishes its menus as a separate, messy HTML page per week. They propose loading all of the menu URLs with the simple web reader from LlamaHub. Another member agrees this should work and shares sample code that uses a CodeSplitter to split the text into nodes. The original poster plans to try it.

LLawWing

Any help?

16 comments

bbmax

do you have any error?

LLawWing

It just gives me "None"

LLawWing

I think it's because I may not be using an unstructured reader?

bbmax

Are you seeing documents in the documents list?

LLawWing

I'm not. Hold on, I may have found a solution: just have LlamaIndex use a web reader to fetch all of the menu websites, because my college set up their menus like this

LLawWing

Attachment

LLawWing

with horrible HTMLs per week and it's stupid

bbmax

lol

LLawWing

However, what if I pass all of the URLs into something like this

LLawWing

https://llamahub.ai/l/web-simple_web

bbmax

your solution should work

bbmax

but you'll probably want to use a CodeSplitter for the text splitting on the nodes

LLawWing

Attachment

bbmax

Plain Text

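# Import paths below match llama_index 0.8.x; newer releases moved them.
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser
from llama_index.text_splitter import CodeSplitter

# Split on HTML syntax boundaries instead of raw character counts.
text_splitter = CodeSplitter(
    language="html",
    chunk_lines=120,
    chunk_lines_overlap=10,
    max_chars=1000,
)

# `html` and `url` are assumed to hold one fetched page and its address.
html_documents = [Document(text=html, metadata={"url": url})]
node_parser = SimpleNodeParser.from_defaults(text_splitter=text_splitter)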

LLawWing

I'll make sure to try that. I'm interested to see if the simple web loader works, because I'm lazy!
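
Editor's note: the "None" result earlier in the thread came with an empty documents list, so it may help to confirm that each menu page actually yields text before indexing it. The following is a minimal, standard-library-only sketch of that check. It does not use LlamaIndex; `html_to_text`, `build_documents`, and the example.edu URLs are hypothetical names made up for illustration, and the plain dicts only stand in for whatever document objects the real reader produces.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_documents(pages):
    """Turn {url: html} into a list of {"text", "metadata"} dicts.

    Pages with no visible text are skipped, so they cannot become empty documents.
    """
    documents = []
    for url, html in pages.items():
        text = html_to_text(html)
        if text:
            documents.append({"text": text, "metadata": {"url": url}})
    return documents


# Hypothetical sample data standing in for two weekly menu pages.
pages = {
    "https://example.edu/menu/week1": "<html><body><h1>Monday</h1><p>Pasta</p></body></html>",
    "https://example.edu/menu/week2": "<html><head><style>p{}</style></head><body></body></html>",
}
documents = build_documents(pages)
if not documents:
    raise SystemExit("No documents loaded; check the URLs or the reader.")
print(len(documents), documents[0]["metadata"]["url"])
```

Keeping the URL in each document's metadata, as in bbmax's snippet, makes it possible to trace an answer back to the weekly page it came from.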
