This article provides a working knowledge of the principles of chaos engineering, discusses its use in software development, and explores how its use may be extended to blockchain development.
The tutorial portion of this article demonstrates how to use the ChaosETH framework to leverage chaos engineering for the testing of Ethereum clients. This strategy can be helpful for identifying flaws (sometimes referred to as “dark debts”) in smart contracts before the contract is widely adopted by members of the network.
- What is chaos engineering?
- Why is chaos engineering useful in blockchain development?
- Why Ethereum clients?
- Implementing chaos testing for a full Ethereum client
- Tutorial: Chaos engineering experiment with a Go-Ethereum client
What is chaos engineering?
Chaos engineering is the practice of performing experiments on a distributed system in order to make it resilient and more fault tolerant to turbulent conditions that may occur in a production environment. The concept is easily traced back to Netflix, where a team led by Casey Rosenthal was placed in charge of testing software availability and system resilience.
Chaos engineering has five advanced principles to guide chaos engineers. Follow these principles to ensure you are practicing chaos engineering properly:
- Set a hypothesis that describes the steady-state behavior of the target system
- Consider real-world situations and events
- Execute experiments in the production environment to build confidence in that environment
- Automate experiments to run continuously because distributed systems are complex
- Minimize the blast radius to prevent experiments from affecting customers
As you can see, these principles are very different from traditional testing techniques.
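To make the first principle concrete, here is a minimal sketch of a steady-state hypothesis expressed as a tolerance band. The metric (connected peer count), the sample values, and the choice of three standard deviations are all assumptions made for illustration; they do not come from any particular framework.

```python
from statistics import mean, stdev


def steady_state_band(baseline, k=3.0):
    """Return (low, high): baseline mean plus or minus k standard deviations."""
    m = mean(baseline)
    s = stdev(baseline)
    return m - k * s, m + k * s


def hypothesis_holds(baseline, observed, k=3.0):
    """True if the mean of the observed samples stays inside the baseline band."""
    low, high = steady_state_band(baseline, k)
    return low <= mean(observed) <= high


# Hypothetical samples of a client's connected peer count.
baseline_peers = [48, 50, 49, 50, 47, 50, 49, 48]
during_experiment = [49, 47, 48, 50]   # behavior stays close to the baseline
degraded = [12, 9, 15, 11]             # behavior clearly departs from it

print(hypothesis_holds(baseline_peers, during_experiment))
print(hypothesis_holds(baseline_peers, degraded))
```

An experiment passes when the hypothesis still holds while the fault is active; a real analysis would likely use more samples and a more careful statistical comparison.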
Why is chaos engineering useful in blockchain development?
Blockchain technology is a subset of distributed ledger technology and is used to build distributed decentralized applications. This distributed status is achieved by creating a peer-to-peer network of nodes, which are actually computers. As a system becomes more widely adopted and connected to more computers, its complexity increases.
Cardano Quick News on Twitter: “If your email service has a bug, people will just complain, nothing serious. But if a blockchain has a bug, people may lose money. Now that’s serious. Safety and stability are THE most important things to a blockchain. And they are at the heart of Cardano. #Cardano $ada #ada”
Faults and weaknesses can occur on a blockchain via the clients as a result of overloaded operating systems, errors from memory management, or network partitions. Deploying an Ethereum client is only possible on an operating system that provides it with important resources.
Given that chaos engineering is well suited to distributed systems, it can be useful in ensuring the resilience of each participating client on a blockchain, such as the Ethereum network.
Here are some points to keep in mind when designing experiments that apply chaos engineering principles to a blockchain:
- Chaos engineering experiments should focus on the consensus mechanism, the network, storage layers, identification and authorization of participating nodes, smart contracts, on-chain interaction, and governance
- Experiments can be done on the development and testnets, but after this, they must be conducted in production
- Minimizing the blast radius is important when experiments are conducted in production, as these applications will involve money
- Knowledge of similar architectures and known vulnerabilities is useful when designing chaos experiments for a client application
Why Ethereum clients?
This article specifically covers incorporating chaos engineering into Ethereum client applications. However, it’s important to note that the concept of injecting chaos in Web3 applies to all decentralized applications of all blockchains.
Ethereum has become the operational backbone of major decentralized platforms and has:
- Higher adoption than other blockchains
- Very active developer communities
- An easily accessible production environment
- Greater simplicity compared to other blockchains
Implementing chaos testing for a full Ethereum client
Proper planning for chaos testing on a live Ethereum client should include the following:
- A thorough understanding of the architecture of the Ethereum client that will be tested
- Planning the system model to adopt
- Handling the following, based on the adopted Ethereum client:
  - Calls based on improper fallback settings from the client
  - Incorrectly set timeouts
  - Dependencies that are not resilient enough or that are deprecated
  - Single points of failure
  - Cascading failures
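Two of the items above, timeouts and fallbacks, can be illustrated with a short sketch. The provider functions here are hypothetical stand-ins for calls to a primary and a backup node, and the timeout value is arbitrary; the point is only that each call has an explicit time limit and a defined fallback.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_fallback(providers, timeout_s):
    """Try each (name, callable) in order; move on when one fails or is too slow."""
    errors = []
    # Note: leaving the "with" block waits for any still-running call to finish.
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        for name, fn in providers:
            future = pool.submit(fn)
            try:
                return name, future.result(timeout=timeout_s)
            except FutureTimeout:
                errors.append((name, "timed out"))
            except Exception as exc:  # any other failure of this provider
                errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")


# Hypothetical stand-ins for a hung primary dependency and a healthy backup.
def slow_primary():
    time.sleep(0.3)  # simulates a dependency that responds too late
    return "late answer"


def healthy_backup():
    return "block 123"


result = call_with_fallback(
    [("primary", slow_primary), ("backup", healthy_backup)], timeout_s=0.05
)
print(result)
```

If every provider fails, the function raises instead of hanging, which makes a single point of failure visible rather than letting it cascade silently.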
Tutorial: Chaos engineering experiment with a Go-Ethereum client
In this tutorial, we’ll demonstrate how to use ChaosETH, a new framework that measures how resilient an Ethereum client is in production, to execute chaos engineering experiments on a Go-Ethereum (Geth) client.
ChaosETH was created by Long Zhang and colleagues at KTH Royal Institute of Technology in Sweden. ChaosETH was designed to assess the resilience of Ethereum clients and thereby make the Ethereum blockchain more reliable. By way of operation, ChaosETH:
- Monitors Ethereum clients to determine their steady-state behavior
- Actively injects system call invocation errors in the clients
- Monitors the resulting behavior of the error injection
- Compares the resulting behavior to the steady-state behavior
- Produces a resilience report directly from production
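The idea of injecting call errors and comparing the outcome to normal behavior can be simulated in a few lines. To be clear, this is not ChaosETH's code or API: ChaosETH injects errors into real system calls of a running client, whereas this sketch wraps an ordinary Python function. The names `FaultInjector`, `read_block`, and `resilient_read` are hypothetical.

```python
import errno
import os
import random


class FaultInjector:
    """Wraps a callable so that a fraction of calls fail with an OSError."""

    def __init__(self, func, error_code, probability, seed=0):
        self.func = func
        self.error_code = error_code
        self.probability = probability
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.injected = 0

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.probability:
            self.injected += 1
            raise OSError(self.error_code, os.strerror(self.error_code))
        return self.func(*args, **kwargs)


def read_block(number):
    """Hypothetical stand-in for an operation that depends on a system call."""
    return {"number": number}


def resilient_read(reader, number, retries=3):
    """Retry a failing read a few times; return None if every attempt fails."""
    for _ in range(retries):
        try:
            return reader(number)
        except OSError:
            continue
    return None


def run_experiment(probability, calls=1000):
    """Return (success rate, number of injected errors) for one error rate."""
    faulty = FaultInjector(read_block, errno.EAGAIN, probability, seed=42)
    ok = sum(resilient_read(faulty, n) is not None for n in range(calls))
    return ok / calls, faulty.injected


for p in (0.0, 0.3, 0.9):
    rate, injected = run_experiment(p)
    print(f"error probability {p}: success rate {rate:.3f}, injected {injected}")
```

Comparing the success rate at each error probability with the rate at probability 0.0 is a toy version of comparing post-injection behavior with the steady state.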
Let’s get started!
Step 1: Create the development environment
Select a cloud service provider where you will host a virtual machine, or install and configure Docker. Create a virtual machine instance running Ubuntu and open port 30303, the default port on which the Ethereum client listens for peers.
Step 2: Build and run the target Ethereum client
Next, grab the latest stable version of the Ethereum client. Let’s go with the Geth client.
Build the client by following the installation steps in its documentation. Chaos engineering requires observability, so you'll also need to enable the monitoring options described in the metrics section of Geth's documentation.
There are many ways to install the Geth client, depending on your operating system or tooling. In this article, we'll use Docker and run the following commands in a shell:
docker pull ethereum/client-go
# then run it with:
docker run -it -p 30303:30303 ethereum/client-go
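Once the container is running, it can be useful to confirm that the published port actually accepts connections. The helper below is a small sketch using only the standard library; `port_is_open` is a name chosen for this example, and it checks TCP only, whereas the client also uses UDP on the same port for peer discovery.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumes the Geth container from the previous command runs on this machine.
print(port_is_open("127.0.0.1", 30303))
```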
Step 3: Create a Docker container for observability
We’ll use InfluxDB alongside the Geth client to enable monitoring functionalities. Use the following command:
docker run -p 8086:8086 -d --name influxdb -v influxdb:/var/lib/influxdb influxdb:1.8
Now, configure the InfluxDB container by executing the following commands:
docker exec -it influxdb bash
Run this command inside the container to open the InfluxDB shell:
influx
Next, execute these commands in the InfluxDB shell:
CREATE DATABASE chaoseth
CREATE RETENTION POLICY "rp_chaoseth" ON "chaoseth" DURATION 999d REPLICATION 1 DEFAULT
CREATE USER geth WITH PASSWORD 'xxx' WITH ALL PRIVILEGES
Now the container is ready. You can proceed to run the Geth client along with the observability metrics and other options. Geth provides more than 500 different metrics from which we can choose.
The client must be run by the root user, even when it is restarted after previous experiments; sudo is necessary for the syscall monitor and error injector.
The data directory must be specified as an option in the command so that the chain data is written to the instance's extra disk. If it is not specified, the data is persisted to the instance's OS drive instead.
Consistent configurations are required from a client's peers, so we'll specify a target number of peers; we'll use 50, since that is the default maximum number of peers for the Geth client.
The observability metrics options are included for application-level monitoring.
Finally, you can make the Geth client run in the background to free up the terminal, and you can redirect the output to anywhere you like.
The resulting command will look like this:
sudo nohup ./geth --datadir=/data/eth-data \
  --maxpeers 50 \
  --metrics --metrics.expensive \
  --metrics.influxdb --metrics.influxdb.database DB_NAME --metrics.influxdb.username geth --metrics.influxdb.password DB_PASS \
  >> geth.log 2>&1 &
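After the client has been running for a while, you may want to confirm that metrics are arriving in InfluxDB. The sketch below builds a request for the InfluxDB 1.x HTTP `/query` endpoint. It assumes the database is the `chaoseth` database created earlier, that InfluxDB is reachable on localhost port 8086, and that the password is the `xxx` placeholder; `influx_query_url` and `run_query` are helper names invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def influx_query_url(base_url, database, query, username, password):
    """Build a URL for the InfluxDB 1.x /query endpoint."""
    params = {"db": database, "q": query, "u": username, "p": password}
    return f"{base_url}/query?{urlencode(params)}"


def run_query(url, timeout_s=5.0):
    """Send the query and return the decoded JSON response."""
    with urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder credentials; substitute the values used when creating the user.
url = influx_query_url(
    "http://localhost:8086", "chaoseth", "SHOW MEASUREMENTS", "geth", "xxx"
)
print(url)
# run_query(url) would list the measurements once the client has written some.
```

Passing credentials in the query string keeps the sketch short; for anything beyond a local experiment, HTTP basic authentication over TLS would be a safer choice.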
Step 4: Sync the client and observe the metrics
The entire synchronization process takes around three days and the status can be monitored on https://ethernodes.org/.
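Sync progress can also be read from the client itself through the standard `eth_syncing` JSON-RPC method. The sketch below only interprets a response; it assumes you have enabled Geth's HTTP RPC server, which the command above does not do, and the sample responses are made up for illustration. `sync_progress` is a hypothetical helper name.

```python
import json

# The JSON-RPC request body that would be POSTed to the client's RPC endpoint.
SYNCING_REQUEST = json.dumps(
    {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
)


def sync_progress(rpc_response):
    """Interpret an eth_syncing response.

    Returns None when the node reports that it is not syncing (the result is
    false), otherwise the fraction currentBlock / highestBlock.
    """
    result = rpc_response["result"]
    if result is False:
        return None
    current = int(result["currentBlock"], 16)  # values are hex strings
    highest = int(result["highestBlock"], 16)
    return current / highest if highest else 0.0


# Made-up sample responses.
still_syncing = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"startingBlock": "0x0", "currentBlock": "0x3e8", "highestBlock": "0x7d0"},
}
not_syncing = {"jsonrpc": "2.0", "id": 1, "result": False}

print(sync_progress(still_syncing))
print(sync_progress(not_syncing))
```

Note that a result of false means only that the node is not currently syncing; it can also appear before the node has found peers, so it should be read together with the peer count or head block.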
The client_monitor.py script, once started, observes the steady-state behavioral metrics of the client after the sync is completed. The following command attaches the client monitor to the client process and exposes the metric data on port 8000 as an endpoint for Prometheus:
nohup sudo ./client_monitor.py -p CLIENT_PID -m -i 15 --data-dir=CLIENT_DATA_DIR >/dev/null 2>&1 &
To have Prometheus scrape the metrics data, include the following in the scrape_configs section of your Prometheus configuration:
scrape_configs:
  - job_name: 'client_monitoring'
    static_configs:
      - targets: ['172.17.0.1:8000']
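Before wiring up Prometheus, it can help to look at what the endpoint on port 8000 returns. The sketch below parses text in the Prometheus exposition format into a dictionary. The metric names in the sample are invented for illustration and are not necessarily what client_monitor.py exports, and the parser assumes simple lines without trailing timestamps.

```python
def parse_prometheus_text(text):
    """Parse 'name{labels} value' lines into {name_with_labels: float}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Assumes each
    sample line ends with the value and carries no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


# Invented sample; real metric names depend on the monitoring script.
sample = """\
# HELP syscalls_total Number of observed system calls.
# TYPE syscalls_total counter
syscalls_total{name="read"} 1200
syscalls_total{name="write"} 800
syscalls_failed_total{name="read"} 3
"""

metrics = parse_prometheus_text(sample)
for key, value in metrics.items():
    print(key, value)
```

The text itself could be fetched with `urllib.request.urlopen("http://172.17.0.1:8000/metrics")`, assuming the monitor serves metrics at the conventional `/metrics` path.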
Alternatively, you can visualize the data in a Grafana dashboard by importing the ./visualization/Grafana - Syscall Monitoring.json file.
The steady-state analysis in the original experiment shows the metrics of data captured during two different monitoring sessions.
Chaos engineering and blockchain technology are both relatively new, but their importance has been proven and validated by wide adoption.
In this article, we provided an overview of chaos engineering principles, introduced the ChaosETH framework, and showed how to use it for resilience testing of a Geth client.
Implementing chaos engineering on Ethereum clients is critical for identifying potential faults that may occur during the lifecycle of a DApp or smart contract.