Open the web browser.
Visit a URL.
Save/Download the URL to your
Move it to a subfolder so that
Downloads/doesn’t get cluttered.
Rename it to a time or a version number, e.g.
The problem with time
If you’re new to programming, chances are, you are new to the concept of how damn complex time can be. Not just the complicated ways in how humans struggle with time, e.g. “is 10:30PM too late to call someone on a weeknight”, but the ways that computers from different parts of the world (or solar system) need to coordinate with each other to the millisecond, including the hashing out of different time zones, daylight savings time, time formats (e.g. European vs. American) and leap years/seconds.
Check out Noah Sussman’s classic writeup, “Falsehoods programmers believe about time” for more complexities.
I can say with only slight exaggeration that learning how a language deals with time is almost always the most painful part of learning a new language (or framework) for me. I haven’t quite mastered the array of Python’s time libraries and objects, including time, datetime, calendar, tzinfo, timezone, timedelta. And judging by the equally numerous third-party packages created to make time-handling easier – pytz, arrow, dateutil, moment, delorean – I’m guessing many users also find it difficult.
So the more you can practice working with time, gradually, the less insane you’ll become when programming time functions on deadline.
Python libraries and functions
For this example, we’ll need the libraries and functions for downloading files, creating new directories and files, and keeping track of time.
Make a directory, including any intermediary subdirectories:
import os os.makedirs("/tmp/lah/dee/dah", exist_ok = True)
Given an array of strings, create a directory path. I know you’re thinking, duh, why do I need a function to do something as easy as
"some" + "/" + "path.html"…trust me, you want to leave the details up to the Python library:
import os os.path.join("hey", "you", "index.html") # 'hey/you/index.html' os.path.join("hey/", "you/", "index.html") # 'hey/you/index.html'
We just need this to parse a given URL to get its host/domain, i.e. its
from urllib.parse import urlparse u = urlparse("http://www.example.com/some/path.html?hello=world&a=apples") u.scheme # 'http' u.netloc # 'www.example.com' u.path # '/some/path.html' u.query # 'hello=world&a=apples'
import re re.sub("\W", '_', "http://www.example.com/some/path.html?hello=world&a=apples") # 'http___www_example_com_some_path_html_hello_world_a_apples'
The Requests package
This is the only external library we’ll use; Requests is so popular that it’s pretty much ubiquitous anyway.
import requests requests.get("http://www.example.com/some/path.html?hello=world&a=apples")
from datetime import datetime
OK, so if you read the list of
datetime class methods, you might be confused at the seemingly similar functions of
utcnow(). The differences essentially involve the concept of time zones, and whether or not you want a datetime object that is “aware” of its timezone.
Rather than get into that mess, let’s just stick to using utcnow(), which works like this:
d = datetime.utcnow() print(d) # 2015-07-11 19:04:56.039051 d # d is a datetime.datetime object: # datetime.datetime(2015, 7, 11, 19, 4, 56, 39051)
I ran the above function at around 12PM in California, i.e. Pacific Standard Time, which is 7 hours behind UTC, hence, why the hour part of the result is
19 instead of
In other words, if I had used the now() function, I would’ve gotten this:
datetime.now() # datetime.datetime(2015, 7, 11, 12, 4, 56, 39051)
Since I’m saving files to my own computer, wouldn’t it make more sense to use its timezone, so that when glancing at a file list, I don’t have to mentally offset the time to see when I actually downloaded the files?