Best Practice to Avoid urllib.request.urlretrieve() Blocked for a Long Time and No Response – Python Tutorial

September 3, 2019

When we use Python urllib.request.urlretrieve() to download files, there can be a serious problem: the call may block for a long time and never return any response. In this tutorial, we will show you how to fix this problem.


Why does this problem occur?

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

Because urllib.request.urlretrieve() does not provide any way to set a timeout, it can hang indefinitely. However, urllib.request.urlretrieve() creates a socket to open and read a URL, so we can set a timeout on the socket instead.
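For example, a global socket timeout can be applied before calling urlretrieve(); here is a minimal sketch (the 30-second value and the url/filename parameters are arbitrary examples):

    import socket
    import urllib.request

    # urlretrieve() accepts no timeout argument, but it opens a socket
    # internally, so the global default socket timeout applies to it too.
    socket.setdefaulttimeout(30)  # 30 seconds is an arbitrary example value

    def download(url, filename):
        # the 30-second socket timeout now applies to this call
        return urllib.request.urlretrieve(url, filename)

Note that socket.setdefaulttimeout() affects every socket created afterward in the process, not just the one used by urlretrieve().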

Best Practice to Set Timeout for Python urllib.request.urlretrieve() – Python Web Crawler Tutorial

However, you may find that urllib.request.urlretrieve() still does not return any response for a long time, even after you have set a timeout for the socket.

Handle the socket.timeout exception

After you have set a timeout for the socket, you must handle the socket.timeout exception. Here is an example:

    import socket
    import urllib.request
    import urllib.error

    socket.setdefaulttimeout(30)  # apply a socket timeout first, in seconds

    url = "https://www.example.com/file.zip"  # example url
    local_filename = "file.zip"               # example local path

    def Schedule(block_num, block_size, total_size):
        # optional reporthook: called once per downloaded block
        pass

    try:
        local_file, response_headers = urllib.request.urlretrieve(url, local_filename, Schedule)
    except urllib.error.ContentTooShortError as shortError:
        print("content too short error")
    except urllib.error.HTTPError as e:
        print(e)
    except urllib.error.URLError as ue:  # such as a wrapped timeout
        print("fail to download!")
    except socket.timeout as se:  # very important
        print("socket timeout")
    except Exception as ee:
        print(ee)

Then you will find that urllib.request.urlretrieve() raises a socket.timeout exception when the timeout expires, instead of blocking forever.
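Putting both steps together, a common pattern is to retry the download when the socket times out. Here is a hedged sketch of such a wrapper; the 10-second timeout, the function name, and the retry count are all hypothetical choices, not part of urllib's API:

    import socket
    import urllib.request
    import urllib.error

    socket.setdefaulttimeout(10)  # hypothetical 10-second timeout

    def download_with_retry(url, filename, retries=3):
        # try the download up to `retries` times, retrying on socket timeout
        for attempt in range(1, retries + 1):
            try:
                return urllib.request.urlretrieve(url, filename)
            except socket.timeout:
                print("socket timeout on attempt %d of %d" % (attempt, retries))
            except urllib.error.URLError as e:
                # other url errors (DNS failure, connection refused, ...)
                # are not worth retrying here
                print("fail to download: %s" % e.reason)
                break
        return None

With this wrapper, a temporarily stalled connection gets a few more chances, while permanent errors such as a bad hostname fail fast.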

 
