Kingsley Ubah 21. Web Developer. Technical Writer. African in Tech.

Understanding Python dataclasses


Python 3.7 introduced a new feature: dataclasses.

For reference, a class is basically a blueprint for creating objects. For example, we could define a Country class and use it to create various instances, such as Monaco and Gambia.

When an instance is created, the values supplied to the constructor (such as name, population, continent, and official language) are assigned to that instance's attributes:

class Country:
    def __init__(self, name: str, population: int, continent: str, official_lang: str):
        self.name = name
        self.population = population
        self.continent = continent
        self.official_lang = official_lang


# All four constructor arguments must be passed, including the official language
smallestEurope = Country("Monaco", 37623, "Europe", "French")
smallestAsia = Country("Maldives", 552595, "Asia", "Dhivehi")
smallestAfrica = Country("Gambia", 2521126, "Africa", "English")

If you have ever worked with object-oriented programming (OOP) in languages like Java or Python, you should already be familiar with classes.

A dataclass, however, comes with basic class functionality already implemented, which reduces the time spent writing boilerplate code.

In this article, we’ll delve further into what dataclasses in Python are, how to manipulate object fields, how to sort and compare dataclasses, and more.

Note that because dataclasses were introduced in Python 3.7, you need Python 3.7 or later installed on your machine to use them.

What is a Python dataclass?

As mentioned previously, Python dataclasses are very similar to normal classes, but they come with class functionality already implemented, which significantly reduces the amount of boilerplate code you need to write.

An example of such boilerplate is the __init__ method.

In the Country class example, you can see that we had to define the __init__ method manually; Python calls it whenever you create a new instance of the class. For every normal class that sets attributes this way, you have to write this method yourself, which means writing a lot of repetitive code.

With a dataclass, you don't have to write this method yourself, so you can define the same Country class without a constructor.

Under the hood, the @dataclass decorator reads the class's annotated fields and generates an __init__ method from them; Python then calls that method whenever you create a new instance.
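As a minimal sketch, here is the Country class written as a dataclass with no hand-written constructor. The inspect.signature() call is only there to list the parameters of the generated __init__, and the variable name monaco is illustrative.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Country:
    # Annotated class attributes become the fields of the dataclass
    name: str
    population: int
    continent: str
    official_lang: str


# @dataclass generated __init__ from the fields above, in definition order
print(list(inspect.signature(Country.__init__).parameters))
# ['self', 'name', 'population', 'continent', 'official_lang']

monaco = Country("Monaco", 37623, "Europe", "French")
print(monaco.population)  # 37623
```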



Note that __init__ is not the only method the decorator generates. By default, it also generates __repr__ (representation) and __eq__ (equal to). If you pass order=True, as we do later in this article, it additionally generates the ordering methods __lt__ (less than), __le__, __gt__ (greater than), and __ge__.
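One way to check which methods the decorator adds is to look in the class's __dict__. In this sketch, Plain and Ordered are hypothetical one-field classes used only for the comparison.

```python
from dataclasses import dataclass


@dataclass
class Plain:
    x: int


@dataclass(order=True)
class Ordered:
    x: int


# __eq__ and __repr__ are generated without any options
print(Plain(1) == Plain(1))  # True
print(Plain(1))              # Plain(x=1)

# Ordering methods are generated only when order=True is passed
print("__lt__" in Plain.__dict__)    # False
print("__lt__" in Ordered.__dict__)  # True
print(Ordered(1) < Ordered(2))       # True
```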

Using the normal Python class

When working with a normal class in Python, we have to write more code to implement these basic methods ourselves.

Consider the Country class again. The code block below defines several methods, starting with __init__, which initializes the country name, population count, continent, and official language on each Country instance.

__repr__ returns the string representation of a class instance. Here, it includes each of the instance's attributes, and print() falls back to it when displaying the object.

__lt__ compares the populations of two Country instances and returns True if the current instance has the smaller population, while __eq__ returns True if both have the same population count:

class Country:
    def __init__(self, name: str, population: int, continent: str, official_lang: str = "English"):
        self.name = name
        self.population = population
        self.continent = continent
        self.official_lang = official_lang

    def __repr__(self):
        # !r wraps string values in quotes, matching the output shown below
        return (f"Country(name={self.name!r}, population={self.population!r}, "
                f"continent={self.continent!r}, official_lang={self.official_lang!r})")

    def __lt__(self, other):
        return self.population < other.population

    def __eq__(self, other):
        return self.population == other.population


smallestAfrica = Country("Gambia", 2521126, "Africa", "English")
smallestEurope = Country("Monaco", 37623, "Europe", "French")
smallestAsia1 = Country("Maldives", 552595, "Asia", "Dhivehi")
smallestAsia2 = Country("Maldives", 552595, "Asia", "Dhivehi")


print(smallestAfrica)
# Country(name='Gambia', population=2521126, continent='Africa', official_lang='English')

print(smallestAsia1 < smallestAfrica)  # True
# No __gt__ is defined, so Python falls back to smallestAfrica.__lt__(smallestAsia1)
print(smallestAsia1 > smallestAfrica)  # False

Using the Python dataclass

To use Python's dataclass in your code, import the dataclass decorator from the dataclasses module and place @dataclass on top of the class. The decorator then generates the basic methods and adds them to the class automatically.

In the following example, we’ll create the same Country class, but with far less code:

from dataclasses import dataclass

@dataclass(order=True)
class Country:
    name: str
    population: int
    continent: str
    official_lang: str

smallestAfrica = Country("Gambia", 2521126, "Africa", "English")
smallestEurope = Country("Monaco", 37623, "Europe", "French")
smallestAsia1 = Country("Maldives", 552595, "Asia", "Dhivehi")
smallestAsia2 = Country("Maldives", 552595, "Asia", "Dhivehi")

print(smallestAfrica)
# Country(name='Gambia', population=2521126, continent='Africa', official_lang='English')

print(smallestAsia1 == smallestAsia2)  # True
print(smallestAsia1 < smallestAfrica)  # False

Observe that we didn’t define a constructor method on the dataclass; we just defined the fields.

We also omitted helpers like __repr__ and __eq__. Despite that, printing and comparing instances still work, because the decorator generated those methods for us.

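Note that with order=True, the generated < compares instances as if they were tuples of their field values, in the order the fields are defined. Because name comes first, the Maldives instance is compared with the Gambia instance alphabetically ("Maldives" sorts after "Gambia"), so < returns False even though the Maldives has the smaller population. Later on in this article, we will learn how to customize object comparison for better results.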
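To make that tuple behavior visible, this sketch compares two instances directly and then compares the tuples returned by dataclasses.astuple(); both give the same answer. The variable names gambia and maldives are illustrative.

```python
from dataclasses import astuple, dataclass


@dataclass(order=True)
class Country:
    name: str
    population: int
    continent: str
    official_lang: str


gambia = Country("Gambia", 2521126, "Africa", "English")
maldives = Country("Maldives", 552595, "Asia", "Dhivehi")

# name is the first field, so "Maldives" vs "Gambia" decides the result
print(maldives < gambia)                    # False
print(astuple(maldives) < astuple(gambia))  # False, the same comparison
```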

Manipulating object fields using the field() function

The dataclasses module also provides a function called field(). This function gives you fine-grained control over individual fields, allowing you to customize them as you wish.

For example, we can exclude the continent field from the string representation by passing repr=False to field():

from dataclasses import dataclass, field

@dataclass
class Country:
     name: str
     population: int
     continent: str = field(repr=False) # omits the field
     official_lang: str

smallestEurope = Country("Monaco", 37623, "Europe", "French")

print(smallestEurope)

# Country(name='Monaco', population=37623, official_lang='French') 


By default, repr is set to True, so every field appears in the representation.
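To confirm the default, you can inspect the Field objects returned by dataclasses.fields(); each one exposes the repr flag that was set through field(). This small sketch reuses the Country class with continent excluded from the representation.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Country:
    name: str
    population: int
    continent: str = field(repr=False)
    official_lang: str


# Each Field object records the options passed to field()
for f in fields(Country):
    print(f.name, f.repr)
# name True
# population True
# continent False
# official_lang True
```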

Here are some other parameters that field() accepts.

init parameter

The init parameter specifies whether a field should be included as an argument to the constructor. If you set a field to init=False, you must leave it out when creating an instance; otherwise, a TypeError is raised:

from dataclasses import dataclass, field

@dataclass
class Country:
     name: str
     population: int  
     continent: str
     official_lang: str = field(init=False) #Do not pass in this attribute in the constructor argument  


smallestEurope = Country("Monaco", 37623, "Europe", "English") #But you did, so error!

print(smallestEurope)

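Because official_lang is excluded from the generated __init__, passing a fourth argument raises a TypeError reporting that __init__ received too many positional arguments.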
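If you want a field that callers cannot pass to the constructor but that still has a value, one option is to combine init=False with a default. A minimal sketch, assuming English as the fixed value:

```python
from dataclasses import dataclass, field


@dataclass
class Country:
    name: str
    population: int
    continent: str
    # Not accepted by __init__, but every instance still reads "English"
    official_lang: str = field(init=False, default="English")


smallestEurope = Country("Monaco", 37623, "Europe")
print(smallestEurope)
# Country(name='Monaco', population=37623, continent='Europe', official_lang='English')
```

Without the default, the instance would have no official_lang value at all, and printing it would fail.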

default parameter

The default parameter specifies a default value for a field, which is used when no value is provided during initialization:

from dataclasses import dataclass, field

@dataclass
class Country:
     name: str
     population: int  
     continent: str
     official_lang: str = field(default="English") # If you omit a value, English is used


smallestEurope = Country("Monaco", 37623, "Europe") #Omitted, so English is used

print(smallestEurope)

This code then prints:

Country(name='Monaco', population=37623, continent='Europe', official_lang='English')
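The default only applies when the argument is left out; a caller can still override it, positionally or by keyword. A short sketch, with monaco and gambia as illustrative variable names:

```python
from dataclasses import dataclass, field


@dataclass
class Country:
    name: str
    population: int
    continent: str
    official_lang: str = field(default="English")


# Passing the argument overrides the default; omitting it uses the default
monaco = Country("Monaco", 37623, "Europe", official_lang="French")
gambia = Country("Gambia", 2521126, "Africa")
print(monaco.official_lang)  # French
print(gambia.official_lang)  # English
```

For a simple value like this, field(default="English") behaves the same as writing official_lang: str = "English"; field() is mainly useful when you also need its other parameters.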

repr parameter

The repr parameter specifies whether the field should be included (repr=True) or excluded (repr=False) from the string representation generated by the __repr__ method:

from dataclasses import dataclass, field

@dataclass
class Country:
     name: str
     population: int  
     continent: str
     official_lang: str = field(repr=False) # This field will be excluded from string representation


smallestEurope = Country("Monaco", 37623, "Europe", "French") 

print(smallestEurope)

This code then prints the following, without the official_lang field:

Country(name='Monaco', population=37623, continent='Europe')

Modifying fields after initialization with __post_init__

If a dataclass defines a __post_init__ method, the generated __init__ calls it right after initialization, that is, after the object receives values for its fields, such as name, continent, population, and official_lang.

For example, we will use this method to decide whether we would migrate to a country, based on its official language:

from dataclasses import dataclass, field

@dataclass
class Country:
    name: str
    population: int
    continent: str = field(repr=False)  # Excludes the continent field from string representation
    will_migrate: bool = field(init=False)  # Not a constructor argument; set in __post_init__
    official_lang: str = field(default="English")  # Sets default language; constructor fields with defaults must follow those without

    def __post_init__(self):
        if self.official_lang == "English":
            self.will_migrate = True  # assignment (=), not comparison (==)
        else:
            self.will_migrate = False

After the object is initialized, __post_init__ checks whether the official_lang field is set to English. If so, it sets the will_migrate field to True; otherwise, it sets it to False.
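Here is a runnable sketch of that class with the if/else collapsed into a single boolean expression, plus two illustrative instances showing the computed will_migrate value:

```python
from dataclasses import dataclass, field


@dataclass
class Country:
    name: str
    population: int
    continent: str = field(repr=False)
    will_migrate: bool = field(init=False)
    official_lang: str = field(default="English")

    def __post_init__(self):
        # Runs right after the generated __init__ has set the other fields
        self.will_migrate = self.official_lang == "English"


print(Country("Gambia", 2521126, "Africa"))
# Country(name='Gambia', population=2521126, will_migrate=True, official_lang='English')
print(Country("Monaco", 37623, "Europe", "French").will_migrate)  # False
```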

Sort and compare dataclasses with sort_index

Another functionality of dataclasses is the ability to create a custom order for comparing objects and sorting lists of objects.

For example, we can compare two countries by their population numbers. In other words, we want to say that one country is greater than another country if, and only if, its population count is greater than the other's:

from dataclasses import dataclass, field

@dataclass(order=True)
class Country:
     sort_index: int = field(init=False)
     name: str
     population: int = field(repr=True)
     continent: str 
     official_lang: str = field(default="English") #Sets default value for official language



     def __post_init__(self):
           self.sort_index = self.population

smallestEurope = Country("Monaco", 37623, "Europe")
smallestAsia= Country("Maldives", 552595, "Asia")
smallestAfrica= Country("Gambia", 2521126, "Africa") 

print(smallestAsia < smallestAfrica) # True
print(smallestAsia > smallestAfrica) # False

To enable comparison and sorting in a Python dataclass, you must pass order=True to @dataclass. This generates the ordering methods (__lt__, __le__, __gt__, and __ge__), which compare instances as tuples of their fields, in definition order.

Since we want to compare by population count, we declare sort_index as the first field, so it is compared before anything else, and mark it init=False so callers don't pass it. Then, inside the __post_init__ method, we copy the population value into sort_index.
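field() also accepts a compare parameter, which offers a different way to get the same ordering. As a sketch of that alternative design (not the sort_index approach used in this article), marking every other field with compare=False leaves population as the only field the comparison methods look at. The names CountryA and CountryB are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Country:
    # compare=False removes a field from __eq__ and the ordering methods
    name: str = field(compare=False)
    population: int
    continent: str = field(compare=False)
    official_lang: str = field(default="English", compare=False)


maldives = Country("Maldives", 552595, "Asia")
gambia = Country("Gambia", 2521126, "Africa")
print(maldives < gambia)  # True, only population is compared

# Side effect: equality also ignores the excluded fields
print(Country("CountryA", 1, "X") == Country("CountryB", 1, "Y"))  # True
```

The trade-off is that equality changes too: two different countries with the same population compare equal, which may or may not be what you want.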

You can also sort a list of objects using a particular field as the sort_index. For example, we can sort a list of countries by their population count:

from dataclasses import dataclass, field

@dataclass(order=True)
class Country:
     sort_index: int = field(init=False)
     name: str
     population: int = field(repr=True)
     continent: str 
     official_lang: str = field(default="English")



     def __post_init__(self):
           self.sort_index = self.population



europe = Country("Monaco", 37623, "Europe", "French")
asia = Country("Maldives", 552595, "Asia", "Dhivehi")
africa = Country("Gambia", 2521126, "Africa", "English")
sAmerica = Country("Suriname", 539000, "South America", "Dutch")
nAmerica = Country("St Kitts and Nevis", 55345, "North America", "English")
oceania = Country("Nauru", 11000, "Oceania", "Nauruan")  

mylist = [europe, asia, africa, sAmerica, nAmerica, oceania]
mylist.sort()

print(mylist) # Prints the countries sorted by population count, smallest first

This code prints the countries in ascending order of population: Nauru, Monaco, St Kitts and Nevis, Suriname, the Maldives, and Gambia.
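Because the ordering methods are generated, the built-in sorted(), min(), and max() work too, and reverse=True puts the largest population first. This sketch trims the class to the fields that matter here and uses a subset of the countries above; passing repr=False to sort_index is an extra choice that keeps the helper field out of the printed output.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Country:
    sort_index: int = field(init=False, repr=False)  # hidden from repr
    name: str
    population: int

    def __post_init__(self):
        self.sort_index = self.population


countries = [
    Country("Monaco", 37623),
    Country("Nauru", 11000),
    Country("Gambia", 2521126),
]

# Largest population first
largest_first = sorted(countries, reverse=True)
print([c.name for c in largest_first])  # ['Gambia', 'Monaco', 'Nauru']
print(min(countries).name)              # Nauru
```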

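Don't want instances of the dataclass to be modified after they are created? You can freeze the class by passing frozen=True to the decorator; assigning to a field of a frozen instance then raises dataclasses.FrozenInstanceError: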

from dataclasses import dataclass, field

@dataclass(order=True, frozen=True)
class Country:
    sort_index: int = field(init=False)
    name: str
    population: int = field(repr=True)
    continent: str
    official_lang: str = field(default="English")

    def __post_init__(self):
        # Frozen instances reject normal assignment, even here,
        # so set the field through object.__setattr__ instead
        object.__setattr__(self, "sort_index", self.population)
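As a sketch of what freezing does in practice, the trimmed class below tries to modify an instance after creation and catches the resulting error. Inside __post_init__, the field is set with object.__setattr__(), because plain assignment is blocked on frozen instances. The variable name monaco is illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(order=True, frozen=True)
class Country:
    sort_index: int = field(init=False, repr=False)
    name: str
    population: int

    def __post_init__(self):
        # Plain assignment would raise FrozenInstanceError here
        object.__setattr__(self, "sort_index", self.population)


monaco = Country("Monaco", 37623)
try:
    monaco.population = 40000
except FrozenInstanceError as exc:
    print("Cannot modify:", exc)

# frozen=True together with the default eq=True also makes instances hashable
print(len({monaco, Country("Monaco", 37623)}))  # 1
```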

Wrapping up

Python dataclasses are a powerful feature that drastically reduces the amount of code in class definitions. The @dataclass decorator generates most of the basic class methods for you, and you can customize individual fields with field() and restrict actions, such as modifying instances, with frozen=True.
