LlamaIndex

Updated 4 months ago

404

I am using the code below. Does anyone know how to skip a URL when it returns a 404 or a similar HTTP error?

Plain Text

from llama_index.readers import SimpleWebPageReader

loader = SimpleWebPageReader()
documents = loader.load_data(urls=urls)

4 comments

Emanuel Ferreira

This would need a change to the source code to detect when a request returns status code 404 and skip it. PRs are welcome 🙏

If I get the time, I don't mind contributing. For now I've opted for my own workaround:

Plain Text

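import requests

def is_url_reachable(url):
    try:
        # HEAD requests headers only; follow redirects so a 301/302 can still resolve to 200
        response = requests.head(url, allow_redirects=True, timeout=10)
        return response.status_code == 200
    except requests.RequestException:
        # Connection errors, timeouts, and malformed URLs all count as unreachable
        return False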
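
For anyone wiring this workaround into the original snippet, here is a minimal sketch of the filtering step. The helper name `split_by_reachability` is hypothetical (it is not part of LlamaIndex or requests); it takes any check function, such as `is_url_reachable` above, and separates the URLs to load from the ones to skip.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_reachability(
    urls: Iterable[str], check: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Partition urls into (reachable, skipped) using the supplied check.

    `check` is any callable returning True for URLs that should be loaded.
    This helper is illustrative only and not part of any library.
    """
    reachable: List[str] = []
    skipped: List[str] = []
    for url in urls:
        # Route each URL to one of the two lists based on the check result
        (reachable if check(url) else skipped).append(url)
    return reachable, skipped


# Intended use with the objects from this thread (not executed here):
#   good, bad = split_by_reachability(urls, is_url_reachable)
#   documents = loader.load_data(urls=good)
```

Keeping the skipped list around makes it easy to log which URLs were dropped instead of losing them silently.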

Emanuel Ferreira

The only downside is more latency, since you'll need to access every URL twice. But if it works for you, nice one 💪

requests.head avoids fetching the content, so it's more like 1.1x longer.
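
On the latency point, one option is to run the checks concurrently so the extra round trips overlap. The sketch below is an assumption-laden illustration, not something from the thread: `head_ok` and `check_all` are hypothetical names, and the check uses the standard library's `urllib` rather than `requests` so the block has no third-party dependencies. Any other check function can be passed in instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import Request, urlopen


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url ends in HTTP 200 (stdlib-only check)."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # HTTPError/URLError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs such as a missing scheme
        return False


def check_all(
    urls: Iterable[str],
    check: Callable[[str], bool] = head_ok,
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Run `check` on every URL in a thread pool and map each URL to its result."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with url_list
        results = list(pool.map(check, url_list))
    return dict(zip(url_list, results))


# Intended use (not executed here):
#   status = check_all(urls)
#   documents = loader.load_data(urls=[u for u, ok in status.items() if ok])
```

Threads suit this job because the work is almost entirely waiting on the network; the value of `max_workers` here is an arbitrary starting point.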
