Apr 2, 2020
We interact with files in Python by opening the file and assigning the result to a variable. This variable is called a file handle. The file handle is an object, and it has member functions such as .read() and .write() that allow us to either read the file contents or write to the file. After we are done working with the file, we must close the file handle.
We open a file and generate the associated handle with the open() function. We will call this function with two arguments:
the name of the file to open, given as a string such as "filename.txt".
the mode in which the file should be opened. Modes, with usage examples, include:
read-only ("r"). Use this mode to read a file, but not write anything to it.
handle = open("filename.txt", "r")
write-only ("w"). Use this mode to write to a new or empty file. If the file already exists, its previous contents are erased.
handle = open("filename.txt", "w")
append ("a"). Use this mode to add content to the end of an existing file.
handle = open("filename.txt", "a")
We will first work with a simple test file, called "testfile.txt". The file has the following contents:
This is line one of the file.
This is line two of the file.
This is line three of the file.
(You can download this file from the class website here: http://wilkelab.org/classes/SDS348/data_sets/testfile.txt)
We will open this file and read it in one go:
# The name of the file we want to open
filename = "testfile.txt"
file_handle = open(filename, "r") # Open file as read-only
file_contents = file_handle.read() # Read contents of file
file_handle.close() # Close file when finished (important!)
# Print out the contents of the file
print(file_contents)
Note that the .read() method returns the entire body of the file as a single string. Another convenient way to read a file is to retrieve it as a list of lines, so that we can easily loop over the file contents. We can do this with the .readlines() method:
file_handle = open(filename, "r") # Open file as read-only
file_lines = file_handle.readlines() # Read contents of file
file_handle.close() # Close file
print(file_lines)
We see that .readlines() gives us a list with three entries, where each entry is one line in the file. We can also see that all entries of this list end with the symbol \n. This symbol represents the new-line character. It determines where one line ends and the next one begins.
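The relationship between .read() and .readlines() can be checked directly: joining the list of lines back together should reproduce the single string. The following is a minimal sketch rather than part of the original worksheet. It assumes nothing about your working directory and instead writes its own three-line copy of the test file into a temporary directory; the name testfile_copy.txt is made up for illustration.

```python
import os
import shutil
import tempfile

# Assumption: we recreate the three-line test file ourselves in a temporary
# directory; "testfile_copy.txt" is a hypothetical name used only here.
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "testfile_copy.txt")

file_handle = open(filename, "w")  # writing is covered later in this worksheet
file_handle.write("This is line one of the file.\n")
file_handle.write("This is line two of the file.\n")
file_handle.write("This is line three of the file.\n")
file_handle.close()

file_handle = open(filename, "r")
file_contents = file_handle.read()     # whole file as one string
file_handle.close()

file_handle = open(filename, "r")
file_lines = file_handle.readlines()   # list with one string per line
file_handle.close()

# Joining the lines gives back the single string
same_text = "".join(file_lines) == file_contents
# Every entry still carries its trailing new-line character
all_end_in_newline = all(line.endswith("\n") for line in file_lines)
print(same_text, all_end_in_newline)

shutil.rmtree(tmpdir)  # remove the temporary directory again
```

Both methods therefore return the same characters. They differ only in whether the text arrives as one string or as a list of strings.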
Now that we have the file lines in a list, we can easily loop over them, and perform some calculations as needed:
# Loop over each line in file, and use a counter to keep track of lines
counter = 1
for line in file_lines:
    print(counter, line)
    counter += 1
You may notice that there is an empty line between each line of output. Can you guess why? See below in Problem 4 for an answer.
Opening a file in write mode will overwrite the file, but opening it in append mode will add to the bottom of an existing file. Note that if we open a file for writing (or appending) that doesn't already exist, then a new file will be created with the specified name. By contrast, if we attempt to open a nonexistent file for reading, Python raises an error (a FileNotFoundError).
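The difference between reading and writing a missing file can be seen in a short experiment. This is a sketch under stated assumptions: the file name does_not_exist.txt is hypothetical, and we use a fresh temporary directory so that we can be sure no file of that name is present. The try/except construct is used here only to catch the error so the program can continue.

```python
import os
import shutil
import tempfile

# Assumption: a fresh temporary directory guarantees that the hypothetical
# file "does_not_exist.txt" is not there yet.
tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, "does_not_exist.txt")

try:
    handle = open(missing, "r")   # reading a missing file fails
except FileNotFoundError:
    read_failed = True
    print("Cannot read: the file does not exist.")
else:
    read_failed = False
    handle.close()

handle = open(missing, "w")       # writing creates the file
handle.close()
created = os.path.exists(missing)
print("File exists after opening it for writing:", created)

shutil.rmtree(tmpdir)  # remove the temporary directory again
```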
To write to a file, we use the .write() method on the file handle:
# Define the name of the file we'll be working with
filename = "testfile2.txt"
# It is often good to name the file handles in
# such a way that we know whether we are reading from
# or writing to the file. Here we open the file
# for writing, so we call it `outfile`:
outfile = open(filename, "w")
# The write function doesn't add a newline automatically,
# so we do this explicitly:
outfile.write("I'm writing a sentence to this file.\n")
outfile.close()
# Verify that we wrote to the file correctly,
# by opening and reading it
infile = open(filename, "r") #note the "r" mode
contents = infile.read()
infile.close()
print(contents)
Note that the above code created a new file and wrote a single sentence to it. No matter how many times you execute this code, the file will have the same contents.
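One way to convince yourself of this is to run the write step several times in a row and count the lines afterwards. The sketch below does that in a temporary directory; the file name overwrite_demo.txt is made up for illustration and is not part of the worksheet.

```python
import os
import shutil
import tempfile

# Assumption: "overwrite_demo.txt" is a hypothetical file in a temporary
# directory, so that no existing file is touched.
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "overwrite_demo.txt")

# Write the same sentence three times, re-opening in "w" mode each time
for i in range(3):
    outfile = open(filename, "w")
    outfile.write("I'm writing a sentence to this file.\n")
    outfile.close()

infile = open(filename, "r")
lines = infile.readlines()
infile.close()

# Each open(..., "w") erased the previous contents, so one line remains
print(len(lines))

shutil.rmtree(tmpdir)  # remove the temporary directory again
```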
To add new contents to an existing file, open the file in append ("a") mode:
# Open a file for appending, and append text to it
for i in range(5): # do this five times
    outfile = open(filename, "a")
    outfile.write("I'm writing another line to this existing file.\n")
    outfile.close()
# Verify that we wrote to the file correctly, by opening and reading it
infile = open(filename, "r") #note the "r" mode
contents = infile.read()
infile.close()
print(contents)
The with statement
As you can see from the above code examples, when we are dealing with files we need to write many blocks of code of the form open(), code to interact with the file, .close(). This can become cumbersome, and in particular we may forget to close some of the files that we opened. Not closing files can cause all sorts of trouble. For example, other applications may not be able to interact with a file until your program has properly closed it. Or, if you write a loop that opens many files but never closes them, your program may hit the operating system's limit on open files and crash. Thus, since closing files is so critical, wouldn't it be nice if Python did this for us automatically? It will do so if we use the with statement.
With the with statement, instead of writing
file_handle = open(filename, mode)
we write
with open(filename, mode) as file_handle:
    ... # code block that operates on the file handle
The file will be closed automatically once we leave the block.
Thus, we could rewrite the last two examples using with in the following form:
filename = "testfile2.txt"
with open(filename, "w") as outfile: # open file for writing
    outfile.write("I'm writing a sentence to this file.\n")
for i in range(5): # do this five times
    with open(filename, "a") as appendfile: # open file for appending
        appendfile.write("I'm writing another line to this existing file.\n")
# Verify that we wrote to the file correctly, by opening and reading it
with open(filename, "r") as infile: # open file for reading
    contents = infile.read()
print(contents)
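To check that a with block really closes the file, we can inspect the .closed attribute of the file handle, which is True once the file has been closed. The following sketch, which is not part of the original worksheet, also simulates an error inside the block to show that the file is closed even then. The file name closed_demo.txt and the simulated ValueError are assumptions made for illustration.

```python
import os
import tempfile

# Assumption: "closed_demo.txt" is a hypothetical file in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "closed_demo.txt")

    with open(filename, "w") as outfile:
        outfile.write("Some text.\n")
        open_inside = not outfile.closed   # still open inside the block
    closed_after = outfile.closed          # closed once we leave the block

    # Simulate a problem while working with the file
    try:
        with open(filename, "r") as infile:
            raise ValueError("simulated problem while reading")
    except ValueError:
        closed_after_error = infile.closed  # closed despite the error

print(open_inside, closed_after, closed_after_error)
```

With explicit open() and .close() calls, an error raised between the two would skip the .close() call, which is one more reason to prefer with.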
Problem 1:
Download the file "road_not_taken.txt" from the class website: http://wilkelab.org/classes/SDS348/data_sets/road_not_taken.txt
(You may have to right-click the link and choose "save as". Make sure to save the file in the same location where your Jupyter notebook is.) This file contains the famous poem "The Road Not Taken" by Robert Frost.
(a) Write a program that reads the file in one go and prints out the file contents.
(b) Write a program that reads in the file line by line and counts the total number of lines.
(c) Write a program that counts the number of letters in the file. Use the function count_letters() we have discussed in a previous class. Then print out how often the different vowels (a, e, i, o, u) are used in this document.
# Problem 1a:
# Open the file for reading.
infile = open("road_not_taken.txt", "r")
contents = infile.read() # read the whole file at once
infile.close() # close file when we're done
print(contents)
# Problem 1b:
# Open the file for reading.
infile = open("road_not_taken.txt", "r")
lines = infile.readlines()
infile.close()
print('The file "road_not_taken.txt" contains', len(lines), 'lines.' )
# Problem 1c:
# count_letters function
def count_letters(s):
    counts = {} # empty dict
    for c in s:
        if c in counts: # does letter exist in dict?
            counts[c]+=1 # yes, increase count by 1
        else:
            counts[c]=1 # no, set count to 1
    return counts # return result
# Open the file for reading and count the letters
with open("road_not_taken.txt", "r") as infile:
    counts = count_letters(infile.read())
print('Vowel usage in the poem "The Road Not Taken":')
for c in ['a', 'e', 'i', 'o', 'u']:
    print(c, ":", counts[c])
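As an alternative to a hand-written counting function, Python's standard library provides collections.Counter, which builds the same kind of character-to-count mapping. The sketch below is a swapped-in technique, not the solution from class. It uses a short stand-in string instead of the poem file so that it runs on its own, and it converts the text to lower case first, which the solution above does not do.

```python
from collections import Counter

# Assumption: a short stand-in string replaces the contents of the poem file
text = "Two roads diverged in a yellow wood"

# Counter counts every character; lower() merges upper- and lower-case letters
counts = Counter(text.lower())

for c in "aeiou":
    # Unlike a plain dict, a Counter returns 0 for characters it never saw
    print(c, ":", counts[c])
```

Because a Counter returns 0 for missing keys, the loop does not fail when a vowel never occurs in the text.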
Problem 2:
Read in the file "road_not_taken.txt", loop over every line in the file, identify the lines that contain the string "road" (ignoring case), and write those lines into a new file called "extracted_lines.txt". Then, read the file "extracted_lines.txt" back in and print its contents, to verify that everything worked right.
# create file with extracted lines
infile = open("road_not_taken.txt", "r")
outfile = open("extracted_lines.txt", "w")
for line in infile: # we can iterate over the file directly
    if "road" in line.lower(): # ignore case by converting to lower case
        outfile.write(line)
infile.close()
outfile.close()
# read the generated file back in and print out its contents
infile = open("extracted_lines.txt", "r")
print(infile.read())
infile.close()
Problem 3:
Take the solution to one of your previous problems and rewrite it using a with statement. (Skip this problem if you have used with statements throughout.)
# Problem 1b rewritten using `with` statement:
with open("road_not_taken.txt", "r") as infile:
    lines = infile.readlines()
print('The file "road_not_taken.txt" contains', len(lines), 'lines.' )
Problem 4:
Using one with statement and one for loop, open the file "road_not_taken.txt" as input file, convert it to upper case, and write it to a new file called "road_not_taken_upper.txt". Then read the newly generated file back in and print out the first 10 lines (+ line numbers), to make sure everything looks right. Make sure you don't get any empty lines between the individual lines you print.
with open("road_not_taken.txt", "r") as infile, \
     open("road_not_taken_upper.txt", "w") as outfile:
    for line in infile:
        outfile.write(line.upper())
with open("road_not_taken_upper.txt", "r") as infile:
    i = 0
    for line in infile:
        # since each line ends in a newline, and the print() function
        # also adds a newline, we have to use the .rstrip() function
        # to remove the newline character at the end of each line.
        print(i+1, line.rstrip())
        i += 1
        if i>=10:
            break
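The manual counter and the break statement in the solution above can also be replaced by two standard tools: enumerate(), which numbers the items of a loop, and itertools.islice(), which stops after a given number of items. This is an alternative sketch, not the worksheet's solution. It assumes that an io.StringIO object with made-up contents is an acceptable stand-in for the open file, so the code runs without the poem.

```python
import io
from itertools import islice

# Assumption: io.StringIO with 20 made-up lines stands in for the open file
infile = io.StringIO("".join(f"LINE {n}\n" for n in range(1, 21)))

printed = []
# islice() yields only the first 10 lines; enumerate() numbers them from 1
for number, line in enumerate(islice(infile, 10), start=1):
    printed.append((number, line.rstrip()))
    print(number, line.rstrip())
```

With a real file, the loop would sit inside a with block, and infile would be the handle created by open().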
Problem 5:
Instead of reading the file "road_not_taken.txt" from your own computer, read it directly from the web. Hint: Use the function urlopen() from the urllib package. And pay attention to the type of data that you receive from the handle created by urlopen().
from urllib.request import urlopen
file_URL = "http://wilkelab.org/classes/SDS348/data_sets/road_not_taken.txt"
with urlopen(file_URL) as infile:
    i = 0
    for line in infile:
        # urlopen() gives us each line as encoded data (a bytes object),
        # and we need to use the `decode()` method to turn it
        # into a string we can print
        print(i+1, line.decode().rstrip())
        i += 1
        if i>=10:
            break
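The difference between bytes and strings can be explored without a network connection. In the sketch below, an io.BytesIO object stands in for the handle returned by urlopen(); like that handle, it yields one bytes object per line. The sample text is made up, and we assume it is encoded as UTF-8, which is also the default that .decode() uses when no encoding is given.

```python
import io

# Assumption: io.BytesIO stands in for the handle that urlopen() returns,
# and the made-up sample text is encoded as UTF-8.
raw_data = "Caf\u00e9 menu, line one\nLine two\n".encode("utf-8")
infile = io.BytesIO(raw_data)

decoded_lines = []
line_types = []
for line in infile:
    line_types.append(type(line).__name__)   # each raw line is a bytes object
    text = line.decode("utf-8").rstrip()     # name the encoding explicitly
    decoded_lines.append(text)
    print(type(line).__name__, "->", type(text).__name__, ":", text)
```

Naming the encoding explicitly makes the assumption visible. If a web server delivers a file in a different encoding, that name has to be changed accordingly.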