When we use Python urllib.request.urlretrieve() to download files, there is a potentially serious problem: urllib.request.urlretrieve() can block for a long time without returning any response. In this tutorial, we will show you how to fix this problem.
Why does this problem occur?
urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
Because urllib.request.urlretrieve() does not provide any parameter to set a timeout. However, we can set a default timeout for sockets, since urllib.request.urlretrieve() creates a socket internally to open and read a URL.
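For example, we can set a default timeout for all new sockets before calling urllib.request.urlretrieve(). Here is a minimal sketch; the URL, the filename, and the 30-second value are placeholders, not values from this tutorial:

import socket
import urllib.request

# Set a global default timeout (in seconds) for every socket created
# afterwards, including the one urlretrieve() opens internally.
socket.setdefaulttimeout(30)

url = "https://www.example.com/file.zip"  # placeholder URL
local_filename = "file.zip"               # placeholder filename
urllib.request.urlretrieve(url, local_filename)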
However, you may find that urllib.request.urlretrieve() still does not return any response for a long time, even after you have set a timeout for the socket.
Handle the socket.timeout exception
After you have set a timeout for the socket, you must handle the socket.timeout exception. Here is an example:
import socket
import urllib.error
import urllib.request

url = "https://www.example.com/file.zip"  # placeholder URL
local_filename = "file.zip"               # placeholder filename

def Schedule(block_num, block_size, total_size):
    # reporthook callback: called once per downloaded block
    pass

try:
    local_file, response_headers = urllib.request.urlretrieve(url, local_filename, Schedule)
except urllib.error.ContentTooShortError as shortError:
    print("content too short error")
except urllib.error.HTTPError as e:
    print(e)
except urllib.error.URLError as ue:  # such as timeout
    print("fail to download!")
except socket.timeout as se:  # very important
    print("socket timeout")
except Exception as ee:
    print(ee)
Then you will find that urllib.request.urlretrieve() raises a socket.timeout exception when the timeout expires, instead of blocking indefinitely.
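One design note: socket.setdefaulttimeout() is a process-wide setting, so it affects every socket created afterwards, not only the one used by urllib.request.urlretrieve(). If that is too broad for your program, you can restore the previous default (socket.getdefaulttimeout() returns it) after the download finishes.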