When we use Python's urllib.request.urlretrieve() to download files, we may hit a serious problem: urllib.request.urlretrieve() can block for a long time without returning any response. In this tutorial, we will show you how to fix this problem.
Why does this problem occur?
urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
Because urllib.request.urlretrieve() does not provide any parameter to set a timeout. However, urllib.request.urlretrieve() creates a socket to open and read a URL, so we can set a timeout on the socket instead.
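A socket-level timeout can be set globally with socket.setdefaulttimeout(), which applies to every socket created afterwards, including the one urllib.request.urlretrieve() opens internally. A minimal sketch (the 30-second value is only an example, pick what suits your network):

```python
import socket

# Set a default timeout (in seconds) for all sockets created from now on.
# urllib.request.urlretrieve() will use this timeout when it opens its socket.
socket.setdefaulttimeout(30.0)

print(socket.getdefaulttimeout())  # → 30.0
```

Note that this changes the default for the whole process, so set it once early in your program rather than before every download.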
However, you may find that urllib.request.urlretrieve() still does not return any response for a long time even if you have set a timeout for the socket.
Handle the socket.timeout exception
After you have set a timeout for the socket, you must handle socket.timeout. Here is an example:
import socket
import urllib.request
import urllib.error

try:
    # Schedule is a reporthook callback (defined elsewhere) that reports download progress.
    local_file, response_headers = urllib.request.urlretrieve(url, local_filename, Schedule)
except urllib.error.ContentTooShortError as shortError:
    print("content too short error")
except urllib.error.HTTPError as e:
    print(e)
except urllib.error.URLError as ue:  # such as a timeout while connecting
    print("fail to download!")
except socket.timeout as se:  # very important: timeout while reading data
    print("socket timeout")
except Exception as ee:
    print(ee)
Then you will find that urllib.request.urlretrieve() raises a socket.timeout exception when the timeout is reached, instead of blocking forever.
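Putting the two steps together, here is a self-contained sketch: a helper that relies on the default socket timeout, downloads a URL, and returns whether it succeeded. The function name download(), the 30-second value, and the return-True/False convention are our own choices for illustration, not part of urllib's API.

```python
import socket
import urllib.request
import urllib.error

# Apply the timeout once, before any download; 30 seconds is just an example.
socket.setdefaulttimeout(30.0)

def download(url, local_filename):
    """Download url to local_filename; return True on success, False on any error."""
    try:
        urllib.request.urlretrieve(url, local_filename)
        return True
    except urllib.error.ContentTooShortError:
        print("content too short error")
    except urllib.error.HTTPError as e:
        print(e)
    except urllib.error.URLError:
        print("fail to download!")  # e.g. a timeout during the connection phase
    except socket.timeout:
        print("socket timeout")     # a timeout while reading the response body
    except Exception as e:
        print(e)
    return False
```

With this helper, a call such as download("https://example.com/file.zip", "file.zip") returns False after at most the configured timeout when the server stops responding, rather than hanging indefinitely.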