
Updated 4 months ago

TrafilaturaWebReader

At a glance

A community member is experiencing an error when trying to use the Trafilatura Website Loader, where the loader class name is not found in the library. The issue is caused by a bug in the library.json file, which does not include the TrafilaturaWebReader loader. Another community member raised a pull request to fix the issue, which was merged. However, the updated loader is not immediately visible on the Llama Hub. After some time, the community member confirms that the issue is now resolved and the loader is working as expected.

When I try to use Trafilatura Website Loader, I get this error:
/usr/local/lib/python3.10/dist-packages/llama_index/readers/download.py in download_loader(loader_class, loader_hub_url, refresh_cache, use_gpt_index_import, custom_path)
    138         library = json.loads(library_raw_content)
    139         if loader_class not in library:
--> 140             raise ValueError("Loader class name not found in library")
    141 
    142         loader_id = library[loader_class]["id"]

ValueError: Loader class name not found in library

Python code:
from llama_index import download_loader

TrafilaturaWebReader = download_loader("TrafilaturaWebReader")

loader = TrafilaturaWebReader()
documents = loader.load_data(urls=['https://google.com'])
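The traceback shows that download_loader parses library.json and raises as soon as the requested class name is missing from it. As a rough sketch of that check, the helper below repeats the same lookup on a JSON string and adds name suggestions to the error. The function name find_loader_id and the sample entries and ids are hypothetical, made up for illustration; they are not part of llama_index or the real library.json.

```python
import difflib
import json


def find_loader_id(library_raw_content: str, loader_class: str) -> str:
    """Return the "id" of loader_class from a library.json string.

    Mirrors the lookup visible in the traceback: parse the JSON, check
    that the class name is a key, then read its "id" field.
    """
    library = json.loads(library_raw_content)
    if loader_class not in library:
        # Suggest similarly named entries, if any, to help spot typos.
        close = difflib.get_close_matches(loader_class, list(library), n=3)
        hint = f" Did you mean: {', '.join(close)}?" if close else ""
        raise ValueError(
            f"Loader class name not found in library: {loader_class!r}.{hint}"
        )
    return library[loader_class]["id"]


# Hypothetical library.json content, for illustration only.
SAMPLE_LIBRARY = json.dumps(
    {
        "SimpleWebPageReader": {"id": "web/simple_web"},
        "BeautifulSoupWebReader": {"id": "web/beautiful_soup_web"},
    }
)

if __name__ == "__main__":
    print(find_loader_id(SAMPLE_LIBRARY, "SimpleWebPageReader"))
    try:
        # Missing on purpose, like the entry that was absent in this thread.
        find_loader_id(SAMPLE_LIBRARY, "TrafilaturaWebReader")
    except ValueError as err:
        print(err)
```

If the class name is missing from the file, as it was here, no change on the caller's side helps; the entry has to be added to library.json upstream.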
11 comments
and it's causing the issue.
Bug in library.json for TrafilaturaWebReader. Should I make an issue?
oh I just raised a PR
Thank you. Any idea when this will be available?
It's merged. Should be available now.
TrafilaturaWebReader is not visible on Llama Hub.
I just checked and it's working now.