Shalitha Suranga Programmer | Author of Neutralino.js and Jerverless

Python file handling: A complete guide

Python is a popular interpreted and dynamically typed programming language for building web services, desktop apps, automation scripts, and machine learning projects. Programmers often have to access the operating system’s file system when they work with Python-based software projects.

For example, we use text files as inputs, write text files as outputs, and process binary files often. Like any other popular, general-purpose programming language, Python also offers cross-platform file handling features. Python provides file handling features via several inbuilt functions and standard modules.

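In this article, I will explain the essentials of Python file handling, including reading and writing files, reading file attributes, working with directories, searching for files, processing binary files, handling archives, copying and moving files, and following best practices.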

Prerequisites

Before getting started with the tutorial, make sure that you have the Python 3 interpreter installed. Otherwise, install the latest Python interpreter from the official releases. You can use this tutorial’s code snippets in your existing Python projects, too.

Reading files in Python

As the first activity, let’s write some code to read a text file. We need to create a file object first to read files.

Python offers the inbuilt open function to create a file object with several modes, such as read mode, write mode, etc. Create a text file named myFile.txt and input the following content.

Programming languages
C
C++
Python
JavaScript
Go

Now, create a new file named main.py and add the following code snippet.

myFile = open("myFile.txt", "r") # or open("myFile.txt")
print(myFile.read())
myFile.close()

The above code snippet’s first line creates the myFile file object with the given filename. The inbuilt open function creates a file handler using read mode because we have provided the r flag via the second parameter.

Make sure to call the close method to free up resources after using the file. The read method returns the file content, so you will see the content once you execute the above code, as shown below.

The content is displayed in read method

The read method reads the entire file at once. If you don’t want to read everything at once, you can pass a size argument to the read method. In text mode, the size is a number of characters; in binary mode, it is a number of bytes. For example, the following code snippet reads only the first 11 characters.

myFile = open("myFile.txt", "r")
print(myFile.read(11)) # Programming
myFile.close()

You will see the first word (“Programming”) as the output because the first word has 11 letters. (For plain ASCII text like this, one character also happens to be one byte, but that is not true for every character or encoding.) If you print the result of read(11) again, you will see the next 11 characters (“ languages\n”) because the file cursor moved 11 places with the previous read(11) call. You can reset the file cursor back to the beginning by using the seek method, as shown in the following example.

myFile = open("myFile.txt")

print(myFile.read(11)) # Programming
print(myFile.read(10)) # languages
myFile.seek(0) # Sets file cursor to the beginning
print(myFile.read(11)) # Programming
myFile.close() 
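The “one letter equals one byte” rule above only holds for ASCII text. As a minimal sketch of the difference, the following snippet writes a temporary file (its name and content are illustrative assumptions, created with the tempfile module) containing the non-ASCII character é, then reads the same number of units in text mode and in binary mode:

```python
import os
import tempfile

# Hypothetical sample file with one non-ASCII character ("é")
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("café au lait")

# Text mode: read(4) returns up to 4 characters
with open(path, "r", encoding="utf-8") as f:
    text_part = f.read(4)

# Binary mode: read(4) returns up to 4 bytes
with open(path, "rb") as f:
    byte_part = f.read(4)

print(repr(text_part))  # 'café'
print(byte_part)        # b'caf\xc3'
```

With UTF-8 encoding, é takes two bytes, so the binary read stops in the middle of that character.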

In most scenarios, it’s easier to process file content line by line. You don’t need to implement a line-oriented reading mechanism yourself, because Python provides inbuilt features for it. You can read a file line by line with a for-in loop and the readlines method, as shown below. Each line keeps its trailing newline character, so we pass end="" to print to avoid an extra blank line after every entry.

myFile = open("myFile.txt", "r")
for line in myFile.readlines():
    print(line, end="")  # line already ends with "\n"
myFile.close()
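For large files, readlines can be wasteful because it loads every line into a list first. A file object is itself iterable, so a loop can read one line at a time. The sketch below assumes a temporary sample file instead of myFile.txt so that it runs anywhere:

```python
import os
import tempfile

# Hypothetical sample file standing in for myFile.txt
path = os.path.join(tempfile.mkdtemp(), "languages.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Programming languages\nC\nC++\nPython\n")

line_count = 0
with open(path, encoding="utf-8") as f:
    # The file object yields one line per iteration,
    # so the whole file is never held in memory at once
    for line in f:
        clean = line.rstrip("\n")  # drop the trailing newline
        line_count += 1
        print(clean)
```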

You can also get the current line number with the inbuilt enumerate function, which pairs each element of the list returned by readlines with its index. Indexes start from 0 by default, so we pass start=1 to get human-friendly line numbers. The following code snippet prints each line’s content with its line number.

myFile = open("myFile.txt", "r")
for i, line in enumerate(myFile.readlines(), start=1):
    print(i, line, end="") # line number and content
myFile.close()

Writing files in Python

Earlier, we created file objects with the read mode by using the r flag. Writing files is not possible with the read mode, so we have to use the write mode (w) for writing files.

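It is also possible to enable both reading and writing together using the r+ or w+ flag. The r+ flag requires an existing file and keeps its content, while w+ creates the file if needed and truncates it first. We will use the w+ flag in upcoming examples.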
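To make the difference between r+ and w+ concrete, here is a small sketch against a temporary file (modes.txt is a hypothetical stand-in for myFile.txt):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # stand-in for myFile.txt
with open(path, "w") as f:
    f.write("Hello Python!")

# r+: keeps the existing content; writing starts at position 0
# and overwrites characters in place
with open(path, "r+") as f:
    f.write("J")
    f.seek(0)
    after_r_plus = f.read()

# w+: truncates the file to zero length as soon as it is opened
with open(path, "w+") as f:
    after_w_plus = f.read()

# r+ refuses to create a missing file
r_plus_missing_fails = False
try:
    open(os.path.join(os.path.dirname(path), "missing.txt"), "r+")
except FileNotFoundError:
    r_plus_missing_fails = True

print(repr(after_r_plus))    # 'Jello Python!'
print(repr(after_w_plus))    # ''
print(r_plus_missing_fails)  # True
```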

To get started with file writing, let’s replace the content of myFile.txt with the following text using Python code.

Programming languages
Rust
Ruby
TypeScript
Dart
Assembly

Use the following script to update myFile.txt with the above content.

myFile = open("myFile.txt", "w")

content = """Programming languages
Rust
Ruby
TypeScript
Dart
Assembly"""

myFile.write(content)
myFile.close()

Here, we defined the text file content using Python’s multiline string syntax and wrote it to the file using the write method. Note that the w mode truncates the file first, so the previous content is replaced. Make sure to open the file in a writable mode: if you call write on a file opened with the r flag, the operation fails with the io.UnsupportedOperation exception.

Often, we need to append new content to an existing file. In those scenarios, reading and rewriting the entire content is not a good approach because it consumes more resources. Instead, we can use the append mode (a).

Look at the following code. It will add a new programming language to the list in myFile.txt.

myFile = open("myFile.txt", "a")
myFile.write("\nBash")
myFile.close()

The above code snippet adds a new line character (\n) and a new word to the existing file without writing the entire file content. As a result, we will see a new entry in our programming languages list. Try adding more entries and see what happens!
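Append mode also creates the file if it doesn’t exist yet, which makes it convenient for log-style output. The following sketch, which uses a hypothetical log.txt inside a temporary directory, appends three entries in separate open calls and reads the result back:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical; doesn't exist yet

# "a" creates the file when missing and always writes at the end
for entry in ["Rust", "Ruby", "Bash"]:
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry + "\n")

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content)
```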

Reading file attributes in Python

Apart from the original file content, a file on the disk will contain some metadata, or file attributes, which include things like size, last modified time, last accessed time, etc.

Look at the following code, which displays the file size, the last accessed time, and the last modified time.

import os, time

stat = os.stat("myFile.txt")
print("Size: %s bytes" % stat.st_size)
print("Last accessed: %s" % time.ctime(stat.st_atime))
print("Last modified: %s" % time.ctime(stat.st_mtime))

The os.stat function returns an os.stat_result object with many file attribute details. Here we used st_size to get the file size, st_atime to get the last accessed timestamp, and st_mtime to get the last modified timestamp. The available attributes vary by operating system. For example, on Windows, you can retrieve Windows-specific file attributes via the st_file_attributes attribute.

If you need only the file size, you can use the os.path.getsize function without retrieving all metadata, as shown in the following code.

import os

size = os.path.getsize("myFile.txt")
print("Size: %s bytes" % size)

Creating new Python directories

Python offers the os.mkdir function to create a single directory. The following code snippet creates myFolder in the current working directory.

import os

os.mkdir("myFolder")

If you try to create nested directories with the above code, it will fail. For example, os.mkdir can’t create myFolder/abc when myFolder doesn’t exist yet, because that requires creating multiple directories; it raises FileNotFoundError instead. In those scenarios, the os.makedirs function will help us, as shown below.

import os

os.makedirs("myFolder/abc") # Creates both "myFolder" and "abc"
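If the directory tree might already exist, os.makedirs raises FileExistsError unless you pass exist_ok=True. A minimal sketch, using a temporary directory as the parent instead of the current working directory:

```python
import os
import tempfile

base = tempfile.mkdtemp()  # temporary parent directory for the demo
target = os.path.join(base, "myFolder", "abc")

os.makedirs(target)  # creates both "myFolder" and "abc"

try:
    os.makedirs(target)  # the tree already exists now
except FileExistsError:
    print("%s already exists" % target)

# exist_ok=True turns the same call into a no-op when the directory exists
os.makedirs(target, exist_ok=True)
print(os.path.isdir(target))  # True
```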

Reading Python directory contents

Python also provides an easy API to list directory contents via the os.listdir function. The following code snippet lists all files and directories in your current working directory.

import os

cur_dir = os.getcwd()
entries = os.listdir(cur_dir)

print("Found %s entries in %s" % (len(entries), cur_dir))
print('-' * 10)
for entry in entries:
    print(entry)

Once you execute the above script, it will show your current directory’s entries, as shown below.

The entries in your current directory

Try executing the script from a different directory. Then it will display entries of that specific directory because we use the os.getcwd function to get the current working directory.
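os.listdir returns bare names, so it can’t tell files and directories apart on its own. One alternative is os.scandir, which yields DirEntry objects with is_file and is_dir methods. The sketch below builds a small hypothetical directory (notes.txt and images) in a temporary location and sorts its entries:

```python
import os
import tempfile

# Hypothetical directory with one file and one subdirectory
base = tempfile.mkdtemp()
open(os.path.join(base, "notes.txt"), "w").close()
os.mkdir(os.path.join(base, "images"))

files, dirs = [], []
with os.scandir(base) as it:
    for entry in it:
        # DirEntry objects expose the name, the full path, and type checks
        if entry.is_file():
            files.append(entry.name)
        elif entry.is_dir():
            dirs.append(entry.name)

print("Files:", files)       # Files: ['notes.txt']
print("Directories:", dirs)  # Directories: ['images']
```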

Sometimes we need to list directory content recursively. The os.walk function helps us with the recursive directory listing. The following code lists all entries of the current working directory recursively.

import os

cur_dir = os.getcwd()
for root, sub_dirs, files in os.walk(cur_dir):
    rel_root = os.path.relpath(root)
    print("Showing entries of %s" % rel_root)
    print("-" * 10)
    for entry in sub_dirs + files:
        print(entry)

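The os.walk function traverses the directory tree for you. For each directory it visits, it yields a tuple of three values:

  • Root (the path of the directory being visited)
  • Subdirectories (names of the directories inside it)
  • File entries (names of the files inside it)

Here we unpacked them into the root, sub_dirs, and files variables, respectively, in a for loop to capture all entries. Note that sub_dirs and files contain bare names, not full paths.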
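Because files contains only names, you need os.path.join(root, name) to get a usable path. As an illustration, the hypothetical total_size helper below sums file sizes across a directory tree; the sample tree is created in a temporary directory:

```python
import os
import tempfile

def total_size(directory):
    """Sum the sizes (in bytes) of all files under directory, recursively."""
    size = 0
    for root, sub_dirs, files in os.walk(directory):
        for name in files:
            # files holds bare names, so join them with root to get a path
            size += os.path.getsize(os.path.join(root, name))
    return size

# Hypothetical tree: base/a.txt (5 bytes) and base/sub/b.txt (3 bytes)
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "sub"))
with open(os.path.join(base, "a.txt"), "wb") as f:
    f.write(b"hello")
with open(os.path.join(base, "sub", "b.txt"), "wb") as f:
    f.write(b"abc")

print(total_size(base))  # 8
```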

All entries are captured using a for-loop

Removing files or directories in Python

We can use the os.remove function to remove a file. It’s possible to use the os.path.exists function before os.remove to prevent exceptions. Look at the following example code snippet.

import os

file_to_remove = "myFile.txt"

if os.path.exists(file_to_remove):
    os.remove(file_to_remove)
else:
    print("%s doesn't exist!" % file_to_remove)
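Checking with os.path.exists first leaves a small window in which another process could delete the file before os.remove runs. An alternative is to attempt the removal and handle FileNotFoundError. The remove_if_exists helper below is a hypothetical name used for illustration, and the file lives in a temporary directory:

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete a file; return True if removed, False if it was missing."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

# Hypothetical file in a temporary directory
file_to_remove = os.path.join(tempfile.mkdtemp(), "myFile.txt")
open(file_to_remove, "w").close()

print(remove_if_exists(file_to_remove))  # True
print(remove_if_exists(file_to_remove))  # False (already gone)
```

On Python 3.8 and later, pathlib.Path.unlink(missing_ok=True) offers similar behavior.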

The Python standard library also offers the os.rmdir function to remove a single directory. Like os.mkdir, it works on one directory at a time, so it won’t remove a directory that still has entries. First, try to remove a single directory with the following code.

import os

dir_to_remove = "myFolder"

if os.path.exists(dir_to_remove):
    os.rmdir(dir_to_remove)
else:
    print("%s doesn't exist!" % dir_to_remove)

The above code raises an OSError if myFolder contains subfolders or files. Use the following code snippet, which relies on shutil.rmtree, to remove a directory recursively.

import os, shutil

dir_to_remove = "myFolder"

if os.path.exists(dir_to_remove):
    shutil.rmtree(dir_to_remove) # Recursively remove all entries
else:
    print("%s doesn't exist!" % dir_to_remove)

Performing file searches in Python

When we work with automation scripts, sometimes we need to perform file searches on the disk. For example, programmers often need to find log files, image files, and various text files via their Python scripts. There are a few different approaches to performing file searches in Python:

  • Finding all entries with the os.listdir function and checking each entry with an if condition inside a for loop
  • Finding all entries recursively with the os.walk function and validating each entry with an if condition inside a for loop
  • Querying all entries with the glob.glob function and obtaining only entries you need

Overall, the third approach is best for most scenarios because it supports wildcard patterns out of the box and requires minimal code from the developer’s end (it is more Pythonic). Let’s implement a file search with the Python glob module.

import glob

query = "**/*.py"

entries = glob.glob(query, recursive=True)
no_of_entries = len(entries)
if no_of_entries == 0:
    print("No results for query: %s" % query)
else:
    print("Found %s result(s) for query: %s" % (no_of_entries, query))

print("-" * 10)
for entry in entries:
    print(entry)

The above code lists all Python source files in the current directory and its subdirectories. With recursive=True, the two asterisks (**) in the query variable match the current directory and any number of nested subdirectories, while *.py matches any filename ending with .py.

Run the above script. You will see Python source files, as shown below.

The Python source files after you run the script

Try to search different file types by changing the query variable.
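For comparison, the second approach from the list above (os.walk plus a filter) might look like the following sketch. It uses fnmatch.filter to apply the same shell-style pattern to each directory’s file names; find_files is a hypothetical helper, and the sample tree lives in a temporary directory:

```python
import fnmatch
import os
import tempfile

def find_files(directory, pattern):
    """Recursively collect paths under directory whose names match pattern."""
    matches = []
    for root, sub_dirs, files in os.walk(directory):
        # fnmatch.filter keeps only the names matching the shell-style pattern
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return matches

# Hypothetical tree: base/main.py, base/notes.txt, base/pkg/util.py
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "pkg"))
for rel in ["main.py", "notes.txt", os.path.join("pkg", "util.py")]:
    open(os.path.join(base, rel), "w").close()

for path in find_files(base, "*.py"):
    print(os.path.relpath(path, base))
```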

Processing binary files in Python

Earlier, we processed text files. The inbuilt open function creates file objects in text mode (t) by default. Non-text files such as images, zip archives, and videos cannot be viewed as plain text because their bytes don’t represent human-readable characters. Therefore, we have to treat binary files as raw data and process them at the byte level (or bit level).

To get started with binary file handling, let’s write a binary file with some bytes. We are going to save the following bytes into myFile.bin.

01010000 01111001 01110100 01101000 01101111 01101110

For simplicity, we can represent the above bytes with the following decimal values respectively.

80 121 116 104 111 110

Now, add the following code to your Python source file and execute it to create the binary file.

myBinaryFile = open("myFile.bin", "wb") # wb -> write binary
data = bytearray([80, 121, 116, 104, 111, 110]) # "data" avoids shadowing the built-in bytes type
myBinaryFile.write(data)
myBinaryFile.close()

Here, we passed a byte array instance to the file object’s write method. Also, note that we used binary mode (b) to create the file object. After executing the above code snippet, open the newly created myFile.bin with your favorite text editor. You will see the following result.

The view of the myFile.bin file

We have received “Python” as the output because the byte array’s bytes represent known ASCII characters. For example, 80 (01010000) represents letter P in ASCII encoding. Even though we saved readable text inside a binary file, almost all binary files contain unreadable byte streams. Try opening an image file via a text editor.

Now we can see the binary file read operation in the following example code.

myBinaryFile = open("myFile.bin", "rb")

data = myBinaryFile.read()
print(data) # b'Python'
print("Bytes: ", list(data)) # Bytes:  [80, 121, 116, 104, 111, 110]
myBinaryFile.close()

In binary mode, the read method returns a bytes object instead of a string. Here we converted it to a list of integers using the list constructor so we can see each byte’s decimal value.
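Calling read() without a size loads the whole binary file into memory. For large files such as videos, a common pattern is to read fixed-size chunks in a loop until read returns an empty bytes object. The sketch below uses a deliberately tiny chunk size of 4 bytes (real code would typically use something larger, such as 64 KiB) and recreates myFile.bin in a temporary directory; read_in_chunks is a hypothetical helper name:

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=4):
    """Yield successive bytes objects of at most chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            yield chunk

path = os.path.join(tempfile.mkdtemp(), "myFile.bin")
with open(path, "wb") as f:
    f.write(bytearray([80, 121, 116, 104, 111, 110]))

chunks = list(read_in_chunks(path))
print(chunks)  # [b'Pyth', b'on']
```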

Creating and extracting from Python archives

Programmers often use archive files with Python-based web applications, web services, desktop applications, and utility programs to output or input multiple files at once. For example, if you are building a web-based file manager, you may offer a feature for users to download multiple files at once via a programmatically generated zip file.

The Python standard library offers archive file processing APIs via the shutil module. First, let’s make an archive with myFolder’s content. Look at the following code. Make sure to create myFolder and add some files into it before running the code snippet.

import shutil

output_file = "myArchive"
input_dir = "myFolder"
shutil.make_archive(output_file, "zip", input_dir)

You can extract the archive file into myNewFolder with the following code.

import shutil

input_file = "myArchive.zip"
output_dir = "myNewFolder"
shutil.unpack_archive(input_file, output_dir)
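Both calls above work with paths relative to the current working directory. The following self-contained sketch performs the same round trip inside a temporary directory (hello.txt is a hypothetical file). Note that make_archive adds the .zip extension itself and returns the archive’s path, and unpack_archive infers the format from that extension:

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
input_dir = os.path.join(work, "myFolder")
os.mkdir(input_dir)
with open(os.path.join(input_dir, "hello.txt"), "w") as f:
    f.write("Hello")

# make_archive appends ".zip" to the base name and returns the archive's path
archive_path = shutil.make_archive(os.path.join(work, "myArchive"), "zip", input_dir)
print(archive_path)  # ends with myArchive.zip

output_dir = os.path.join(work, "myNewFolder")
shutil.unpack_archive(archive_path, output_dir)  # format inferred from ".zip"
print(os.listdir(output_dir))  # ['hello.txt']
```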

Copying and moving files

The shutil module offers cross-platform API functions to copy and move files as well. Look at the following examples.

import shutil

# copy main.py -> main_copy.py
shutil.copy("main.py", "main_copy.py")
# move (rename) main_copy.py -> main_backup.py 
shutil.move("main_copy.py", "main_backup.py")
# recursive copy myFolder -> myFolder_copy
shutil.copytree("myFolder", "myFolder_copy")
# move (rename) myFolder_copy -> myFolder_backup
# if myFolder_backup exists, source is moved inside folder
shutil.move("myFolder_copy", "myFolder_backup")
print("Done.")
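shutil.copytree also accepts an ignore callable, and shutil.ignore_patterns builds one from glob-style patterns. Since Python 3.8, dirs_exist_ok=True lets it copy into a destination that already exists instead of raising FileExistsError. A sketch with hypothetical file names in a temporary directory:

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
src = os.path.join(work, "myFolder")
dst = os.path.join(work, "myFolder_copy")
os.mkdir(src)
for name in ["main.py", "main.pyc", "notes.txt"]:
    open(os.path.join(src, name), "w").close()

# Skip compiled bytecode files while copying
shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.pyc"))

# Copying again into the same destination would raise FileExistsError;
# dirs_exist_ok=True (Python 3.8+) merges into the existing tree instead
shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.pyc"), dirs_exist_ok=True)

print(sorted(os.listdir(dst)))  # ['main.py', 'notes.txt']
```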

Python file handling best practices

Programmers follow various coding practices. Similarly, Python programmers also follow different coding practices when they handle files.

For example, some programmers use a try-finally block and close file handlers manually. Some programmers let the garbage collector close the file handler by omitting the close method call, which is not a good practice. Meanwhile, other programmers use the with statement to work with file handlers.

In this section, I will summarize some best practices for file handling in Python. First, look at the following code that follows file handling best practices.

def print_file_content(filename):
    with open(filename) as myFile:
        content = myFile.read()
        print(content)

file_to_read = "myFile.txt"

try:
    print_file_content(file_to_read)
except OSError:  # e.g., a missing file or insufficient permissions
    print("Unable to open file %s" % file_to_read)
else:
    print("Successfully printed %s's content" % file_to_read)

Here, we used the with keyword to close the file handler automatically. We also handle possible exceptions with a try-except block. While you are working with Python file handling, make sure that your code follows these points:

  • Never ignore exceptions, especially in long-running Python processes. For simple utility scripts, however, it’s acceptable to leave exceptions unhandled, because an unhandled exception simply stops the script from continuing further
  • If you are not using the with syntax, make sure to close opened file handlers properly. The Python garbage collector will clean up unclosed file handlers, but it’s always good to close a file handler in our own code to avoid unwanted resource usage
  • Make sure to unify file handling syntax across your codebase. For example, if you use the with keyword for handling files, use the same syntax everywhere you handle files
  • Avoid reopening the same file again when you read or write with multiple handlers. Instead, use the flush and seek methods, as shown below:
def process_file(filename):
    # w+: read/write, creating the file if it doesn't exist (unlike r+)
    with open(filename, "w+") as myFile:
        # Write content
        myFile.write("Hello Python!")
        print("Cursor position: ", myFile.tell()) # 13
        # Flush Python's internal write buffer to the file
        myFile.flush()
        # Set cursor to the beginning
        myFile.seek(0)
        print("Cursor position: ", myFile.tell()) # 0
        # Print new content
        content = myFile.read()
        print(content)
        print("Cursor position: ", myFile.tell()) # 13

file_to_read = "myFile.txt"
try:
    process_file(file_to_read)
except OSError:
    print("Unable to process file %s" % file_to_read)
else:
    print("Successfully processed %s" % file_to_read)

The above code saves a string to the file first. After that, it reads the newly written content back without reopening the file. The flush method writes any data still held in Python’s internal buffer to the file, so the next read returns the newly added content. We also need the seek(0) call to move the cursor back to the beginning, because the write method leaves it at the end of the written data.

Conclusion

Python offers a simple syntax for programmers, so almost all file operations are easy to implement. However, the Python standard library has some design issues: there are often multiple API functions that do the same thing. Therefore, you have to select the most suitable standard module according to your requirements.

Also, Python is slower than many other popular programming languages. With this in mind, make sure your Python scripts don’t use more resources than necessary. For example, you can reduce memory usage by processing large text files line by line instead of loading the entire content at once.

In this tutorial, we discussed generic text file handling and binary file handling. If you need to process specific file formats, consider choosing a dedicated library or standard module. For example, you can use the csv standard module for processing CSV files and the PyPDF2 library for processing PDF files. Also, the pickle standard module helps you store (and load) Python data objects with files.
