Updated 3 months ago

404

I am using the code below. Does anyone know how to skip a URL if it returns a 404 or a similar HTTP error?
Plain Text
from llama_index.readers import SimpleWebPageReader

loader = SimpleWebPageReader()
documents = loader.load_data(urls=urls)
4 comments
That would need a change in the source code to detect when a request returns status code 404 and then skip it. PRs are welcome 🙏
If I get the time, I don't mind contributing. For now I've just opted for my own workaround:
Plain Text
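import requests

def is_url_reachable(url, timeout=5):
    """Return True if a HEAD request to url ends in HTTP 200."""
    try:
        # HEAD skips the body; requests.head does not follow redirects by default
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        return response.status_code == 200
    except requests.RequestException:
        return False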
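To connect the check to the loader from the question, the URL list can be filtered before it is passed to `load_data`. This is a minimal sketch: `split_by_reachability` is a hypothetical helper name, not part of llama_index, and it takes the check function as a parameter so it works with `is_url_reachable` above or any other predicate.

```python
def split_by_reachability(urls, is_reachable):
    """Partition urls into (reachable, skipped), preserving input order.

    is_reachable is any callable that takes a URL and returns True or False.
    """
    reachable, skipped = [], []
    for url in urls:
        # Append to whichever list matches the result of the check.
        (reachable if is_reachable(url) else skipped).append(url)
    return reachable, skipped

# Hypothetical usage with the loader from the question:
# good_urls, bad_urls = split_by_reachability(urls, is_url_reachable)
# documents = SimpleWebPageReader().load_data(urls=good_urls)
```

Keeping the skipped URLs around makes it easy to log which pages were dropped.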
The only downside is more latency, since you'll need to access every URL twice. But if it works for you, nice one 💪
requests.head avoids fetching the content, so it's more like 1.1x longer.
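If the extra request matters, another option is to fetch each page once and skip the ones that fail. The sketch below is not how `SimpleWebPageReader` works internally. It uses the standard library's `urllib` instead of `requests`, and the names `fetch_text` and `fetch_skipping_errors` are made up for illustration. Turning the returned HTML into llama_index documents is left out.

```python
from urllib.error import URLError
from urllib.request import urlopen

def fetch_text(url, timeout=10.0):
    """Return the decoded page body; urlopen raises HTTPError on 4xx/5xx."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

def fetch_skipping_errors(urls, fetch=fetch_text):
    """Fetch each URL once; return (pages, skipped).

    pages maps url -> body text, skipped lists (url, exception) pairs.
    """
    pages, skipped = {}, []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except (URLError, ValueError, OSError) as exc:
            # HTTPError (e.g. 404) is a URLError; ValueError covers malformed
            # URLs; OSError covers timeouts and connection failures.
            skipped.append((url, exc))
    return pages, skipped
```

The `fetch` parameter is there so the fetching step can be swapped out, for example with a `requests`-based function or a stub in tests.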