Find answers from the community

jeffreyip
Joined September 25, 2024
Hey, I have a quick question about the callback manager and the query engine. We'd like to let users optionally supply a custom trace_id when calling query(trace_id="...") here: https://github.com/run-llama/llama_index/blob/879d4dfdf0f02391d634088f7b51f031b95c5bc6/llama-index-core/llama_index/core/base/base_query_engine.py#L46. Would you accept this change if we opened a PR?
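To make the proposal concrete, here is a minimal, stdlib-only sketch of the idea: query() takes an optional trace_id and falls back to a fixed default when none is given, so existing callers behave exactly as before. SketchCallbackManager, SketchQueryEngine, and DEFAULT_TRACE_ID are hypothetical stand-ins written for illustration, not LlamaIndex classes. The default value "query" is an assumption about what the linked line currently uses.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional

# Assumption: the fixed trace id the query path opens today (hypothetical here).
DEFAULT_TRACE_ID = "query"


class SketchCallbackManager:
    """Hypothetical stand-in that records which trace ids were opened and closed."""

    def __init__(self) -> None:
        self.started: List[str] = []
        self.ended: List[str] = []

    @contextmanager
    def as_trace(self, trace_id: str) -> Iterator[None]:
        self.started.append(trace_id)
        try:
            yield
        finally:
            # Always close the trace, even if the query raises.
            self.ended.append(trace_id)


class SketchQueryEngine:
    """Hypothetical query engine showing only the trace_id plumbing."""

    def __init__(self, callback_manager: SketchCallbackManager) -> None:
        self.callback_manager = callback_manager

    def _query(self, query_str: str) -> str:
        # Placeholder for the real retrieval and synthesis work.
        return f"answer to {query_str!r}"

    def query(self, query_str: str, trace_id: Optional[str] = None) -> str:
        # Use the caller's id if given, otherwise keep today's default.
        effective_id = trace_id if trace_id is not None else DEFAULT_TRACE_ID
        with self.callback_manager.as_trace(effective_id):
            return self._query(query_str)
```

Because the new parameter is keyword-only in practice and defaults to None, a call like engine.query("hi") still opens a trace with the default id, while engine.query("hi", trace_id="user-123") lets downstream handlers group events under the caller's id.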
Hey llama team, do I need permission to add a one-click observability integration, like the ones listed here?
https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html

HTML parsing

Hello, I have a question about loading HTML files. I'm following the tutorial here (https://github.com/jerryjliu/llama_index/blob/main/examples/chatbot/Chatbot_SEC.ipynb), but with my own HTML file, and for some HTML files I get this error:

INFO:unstructured:Reading document from string ...
INFO:unstructured:Reading document ...
Traceback (most recent call last):
  File "/Users/user/crawl/index.py", line 14, in <module>
    html = loader.load_data(file=Path(f'./output1.html'))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/crawl/venv/lib/python3.11/site-packages/llama_index/readers/llamahub_modules/file/unstructured/base.py", line 36, in load_data
    elements = partition(str(file))
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/crawl/venv/lib/python3.11/site-packages/unstructured/partition/auto.py", line 86, in partition
    elements = partition_html(
               ^^^^^^^^^^^^^^^
  File "/Users/user/crawl/venv/lib/python3.11/site-packages/unstructured/partition/html.py", line 85, in partition_html
    layout_elements = document_to_element_list(document, include_page_breaks=include_page_breaks)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/crawl/venv/lib/python3.11/site-packages/unstructured/partition/common.py", line 71, in document_to_element_list
    num_pages = len(document.pages)
                    ^^^^^^^^^^^^^^
  File "/Users/user/crawl/venv/lib/python3.11/site-packages/unstructured/documents/xml.py", line 52, in pages
    self._pages = self._read()
                  ^^^^^^^^^^^^
  File "/Users/user/crawl/venv/lib/python3.11/site-packages/unstructured/documents/html.py", line 101, in _read
    etree.strip_elements(self.document_tree, ["script"])
  File "src/lxml/cleanup.pxi", line 100, in lxml.etree.strip_elements
  File "src/lxml/apihelpers.pxi", line 41, in lxml.etree._documentOrRaise
TypeError: Invalid input object: NoneType
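The last frames show lxml's strip_elements receiving None as the document tree, which suggests the HTML parser produced no tree for the failing files. One plausible (unconfirmed) cause is a file that is empty or contains no markup elements. Under that assumption, a defensive sketch is to screen files with the standard-library html.parser before handing them to the loader. has_html_elements and loadable_html_files are hypothetical helpers, not part of llama_index or unstructured, and the check is deliberately conservative: it may also skip plain-text files that lxml would have accepted.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Iterable, List, Union


class _TagCounter(HTMLParser):
    """Counts element start tags; comments, doctypes and bare text are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.tag_count = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <br/> also reach this method via the
        # default handle_startendtag implementation.
        self.tag_count += 1


def has_html_elements(text: str) -> bool:
    """Return True if `text` contains at least one HTML element tag."""
    counter = _TagCounter()
    counter.feed(text)
    counter.close()
    return counter.tag_count > 0


def loadable_html_files(paths: Iterable[Union[str, Path]]) -> List[Path]:
    """Hypothetical helper: keep only files that contain real markup."""
    keep: List[Path] = []
    for path in paths:
        path = Path(path)
        text = path.read_text(encoding="utf-8", errors="replace")
        if has_html_elements(text):
            keep.append(path)
        else:
            print(f"Skipping {path}: no HTML elements found")
    return keep
```

In the script from the traceback, loader.load_data(file=...) could then be called only for the paths returned by loadable_html_files, so a single bad file is reported and skipped instead of aborting the whole run.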