Best Practice to Avoid urllib.request.urlretrieve() Blocked for a Long Time and No Response – Python Tutorial

September 3, 2019

When we use Python urllib.request.urlretrieve() to download files, there can be a serious problem: the call may block for a long time and never return any response. In this tutorial, we will show you how to fix this problem.


Why does this problem occur?

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

Because urllib.request.urlretrieve() does not provide any way to set a timeout, it can hang indefinitely. However, urllib.request.urlretrieve() creates a socket to open and read a URL, so we can set a timeout on the socket instead.
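For example, a global socket timeout can be applied before calling urlretrieve(); here is a minimal sketch (the 30-second value and the url/filename parameters are arbitrary examples):

    import socket
    import urllib.request

    # urlretrieve() accepts no timeout argument, but it opens a socket
    # internally, so the global default socket timeout applies to it too.
    socket.setdefaulttimeout(30)  # 30 seconds is an arbitrary example value

    def download(url, filename):
        # the 30-second socket timeout now applies to this call
        return urllib.request.urlretrieve(url, filename)

Note that socket.setdefaulttimeout() affects every socket created afterward in the process, not just the one used by urlretrieve().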

Best Practice to Set Timeout for Python urllib.request.urlretrieve() – Python Web Crawler Tutorial

However, you may find that urllib.request.urlretrieve() still does not return any response for a long time, even after you have set a timeout for the socket.

Handle the socket.timeout exception

After you have set a timeout for the socket, you must handle the socket.timeout exception. Here is an example:

    import socket
    import urllib.request
    import urllib.error

    socket.setdefaulttimeout(30)  # apply a socket timeout first, in seconds

    url = "https://www.example.com/file.zip"  # example url
    local_filename = "file.zip"               # example local path

    def Schedule(block_num, block_size, total_size):
        # optional reporthook: called once per downloaded block
        pass

    try:
        local_file, response_headers = urllib.request.urlretrieve(url, local_filename, Schedule)
    except urllib.error.ContentTooShortError as shortError:
        print("content too short error")
    except urllib.error.HTTPError as e:
        print(e)
    except urllib.error.URLError as ue:  # such as a wrapped timeout
        print("fail to download!")
    except socket.timeout as se:  # very important
        print("socket timeout")
    except Exception as ee:
        print(ee)

Then you will find that urllib.request.urlretrieve() raises a socket.timeout exception when the timeout expires, instead of blocking forever.
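Putting both steps together, a common pattern is to retry the download when the socket times out. Here is a hedged sketch of such a wrapper; the 10-second timeout, the function name, and the retry count are all hypothetical choices, not part of urllib's API:

    import socket
    import urllib.request
    import urllib.error

    socket.setdefaulttimeout(10)  # hypothetical 10-second timeout

    def download_with_retry(url, filename, retries=3):
        # try the download up to `retries` times, retrying on socket timeout
        for attempt in range(1, retries + 1):
            try:
                return urllib.request.urlretrieve(url, filename)
            except socket.timeout:
                print("socket timeout on attempt %d of %d" % (attempt, retries))
            except urllib.error.URLError as e:
                # other url errors (DNS failure, connection refused, ...)
                # are not worth retrying here
                print("fail to download: %s" % e.reason)
                break
        return None

With this wrapper, a temporarily stalled connection gets a few more chances, while permanent errors such as a bad hostname fail fast.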

 
