Updated 6 months ago

HTML Files for menus

At a glance

Community members are troubleshooting a query that returns "None". The original poster suspects the cause is not using an unstructured reader, and confirms that no documents are showing up in the documents list. Their college publishes its menus as a separate, messy HTML page for each week, so they plan to load all of the menu URLs with a web reader (the "simple web loader" from LlamaHub, presumably SimpleWebPageReader). Another member says that approach should work, recommends a CodeSplitter configured for HTML to split the text into nodes, and shares sample code. The original poster intends to try it.

Any help?
16 comments
do you have any error?
It just gives me "None"
I think it's because I may not be using an unstructured reader?
Are you seeing documents in the documents list?
I'm not. Hold on, I may have found a solution: just have LlamaIndex use a web reader to get all of the menu websites, because my college set up their menus like this
with a horrible HTML page per week, and it's stupid
However, if I pass all of the URLs in like this
your solution should work
but you'll probably want to use a CodeSplitter for the text splitting on the nodes
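# Import paths below match older llama_index releases and may differ in yours.
from llama_index import Document
from llama_index.node_parser import SimpleNodeParser
from llama_index.text_splitter import CodeSplitter

# Split the HTML along its syntax tree instead of treating it as prose.
text_splitter = CodeSplitter(
    language="html",
    chunk_lines=120,
    chunk_lines_overlap=10,
    max_chars=1000,
)

# `html` is the raw page source and `url` is the address it was fetched from.
html_documents = [Document(text=html, metadata={"url": url})]
node_parser = SimpleNodeParser.from_defaults(text_splitter=text_splitter)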
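For a rough feel of what chunk_lines and chunk_lines_overlap are meant to control, here is a plain-Python sketch of line-window chunking with overlap. It is only an illustration, not how CodeSplitter is implemented (CodeSplitter parses the input with tree-sitter), and the function name chunk_lines_with_overlap is made up for this example.

```python
def chunk_lines_with_overlap(text, chunk_lines=120, overlap=10):
    """Split text into chunks of at most `chunk_lines` lines.

    Each chunk after the first starts with the last `overlap` lines of the
    previous chunk. Hypothetical helper for illustration only.
    """
    if chunk_lines <= 0:
        raise ValueError("chunk_lines must be positive")
    if not 0 <= overlap < chunk_lines:
        raise ValueError("overlap must be >= 0 and smaller than chunk_lines")

    lines = text.splitlines()
    step = chunk_lines - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + chunk_lines]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_lines >= len(lines):
            break
    return chunks
```

With the values from the snippet above (120 lines per chunk, 10 lines of overlap), a 250-line page would come out as three chunks, and the first ten lines of each later chunk repeat the last ten of the one before it.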
I'll make sure to try that. I'm interested to see if the simple web loader works, because I'm lazy!
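Since the underlying symptom was an empty documents list, it may help to check that each menu page actually yields some text before building an index. The sketch below uses only the Python standard library, not LlamaIndex or any LlamaHub loader. The helper names html_to_text and find_empty_pages, and the example.edu URLs in the usage note, are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text run per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def find_empty_pages(pages):
    """Given {url: raw_html}, return the URLs that yield no visible text."""
    return [url for url, html in pages.items() if not html_to_text(html)]
```

For example, with pages = {"https://example.edu/menu/week1": "<h1>Monday</h1><p>Pasta</p>", "https://example.edu/menu/week2": "<body></body>"}, find_empty_pages(pages) flags only the week2 URL, which points at a page worth inspecting by hand.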