Getting to Grips with HA for IBM i

When it comes to high availability (HA) for IBM i, there are lots of options. Do you choose logical replication software like MIMIX, or a hardware-based solution like PowerHA? Should you deploy to the cloud, or stay on premises? Do you want to use remote journaling?

Before going any further, we should probably define high availability. HA refers to the ability to keep applications running in the event of an outage, whether planned or unplanned. An HA solution helps avoid downtime by building in redundancy, detecting failures, and providing failover capabilities.

Most HA solutions for IBM i achieve this with real-time replication methods that copy each individual transaction that hits the production database to a secondary database. In the event of an outage, the HA solution provides a “role swap” facility that redirects users to the secondary copy, which then takes over as the production system.

HA is related to disaster recovery (DR), in that a properly implemented and executed HA setup could save you from having to invoke your DR strategy in the event of an unplanned outage. But there are important differences between the two, the main one being that an HA setup is optional, whereas a DR strategy is an IT requirement. An HA system will also not protect you from data corrupted by human error, viruses, or ransomware, as the damage will quickly be propagated to the secondary machine.

In HA and DR, the effectiveness of a solution is measured by its recovery time objective (RTO, i.e. how long it takes to recover) and its recovery point objective (RPO, i.e. how much data you are willing to lose). If your business simply cannot afford to lose access to core IBM i applications beyond a certain length of time, whether that’s minutes or hours, then it needs a low RTO, and HA could be for you.
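To make the two measures concrete, the sketch below works out the RPO and RTO actually achieved during a single outage and compares them with a set of targets. The function name, timestamps, and target values are all hypothetical and chosen only for illustration; RPO is expressed here as the window of changes that never reached the secondary, and RTO as the time users were without the application.

```python
from datetime import datetime


def achieved_rpo_rto(last_replicated, failure, restored):
    """Return (RPO, RTO) in minutes achieved during a single outage.

    RPO: time between the last change that reached the secondary and the
    failure of the primary (changes made in that window are lost).
    RTO: time between the failure and users being back in service.
    """
    rpo = (failure - last_replicated).total_seconds() / 60
    rto = (restored - failure).total_seconds() / 60
    return rpo, rto


# Hypothetical outage timeline (not real data)
last_replicated = datetime(2024, 3, 1, 9, 59, 30)  # last change applied on the secondary
failure = datetime(2024, 3, 1, 10, 0, 0)           # primary goes down
restored = datetime(2024, 3, 1, 10, 15, 0)         # users working on the secondary

rpo, rto = achieved_rpo_rto(last_replicated, failure, restored)

# Hypothetical business targets, in minutes
target_rpo, target_rto = 1.0, 30.0
print(f"RPO {rpo} min (target {target_rpo}), RTO {rto} min (target {target_rto})")
print("objectives met:", rpo <= target_rpo and rto <= target_rto)
```

In this made-up timeline, 30 seconds of changes are lost (an RPO of half a minute) and users are back after 15 minutes, so both hypothetical targets are met.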

HA Solutions

Broadly speaking, there are two main classes of HA software on IBM i: logical replication software and hardware-based solutions.

The vast majority of HA solutions use logical replication software to duplicate the transactions from one IBM i environment to another IBM i environment, which is almost always located on a separate physical IBM i server.

Logical replication-based HA setups traditionally involve at least two IBM i servers: one to serve as the primary box and another to serve as the backup. Bigger setups may have three or more IBM i servers in various configurations, while smaller setups may rent an LPAR on a cloud provider’s IBM i system to serve as the backup.

Hardware-based HA is a relative newcomer to the IBM i scene. The method used in hardware-based HA is different from logical replication, but is widely accepted as the mainstream method in the Windows and Linux worlds.

How Logical Replication Works

As we mentioned before, logical replication is the older and more widespread form of HA in the IBM i world. The technology and techniques used for logical replication have been honed over decades of real-world use across tens of thousands of installations, which has resulted in a rich ecosystem of logical replication software and service providers ready to address the HA needs of IBM i users around the world.

Nearly all of the logical replication solutions today use IBM’s remote journaling technology as the core data replication method under the covers, but there is one exception, which we’ll discuss later.

Logical replication is used to replicate changes made to data and objects stored on the IBM i server. When a record in a journaled database file is added, updated, or deleted on the primary system, the change is written as a journal entry to the primary server’s local journal receiver. (This is one of the principal uses of journaling on the IBM i server; the other is auditing.)

When an entry is written to the local journal, it’s automatically replicated over the network link to the remote journal on the secondary server. Because IBM’s remote journaling technology runs underneath the operating system layer, it simplifies the development and maintenance of the logical replication software that sits on top of it. It “just works,” and is widely considered to be bullet-proof.

Once the changes are present in the remote journal, it’s up to the third-party HA software to apply them to the database on the secondary server. Most of the processing overhead in logical replication is therefore incurred on the secondary system, which many tout as a benefit because it keeps that extra load off the production server.
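The flow described above (change on the primary, entry in the local journal, copy to the remote journal, apply on the secondary) can be sketched as a small in-memory simulation. This is only a conceptual model under simplifying assumptions: Python dictionaries stand in for database files, lists stand in for the local and remote journals, and the "PUT"/"DELETE" operation codes and all function names are hypothetical. It does not use, or imitate the interfaces of, IBM’s journaling or any HA product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JournalEntry:
    seq: int                     # position of the entry in the journal
    op: str                      # "PUT" or "DELETE" in this simplified model
    key: str
    value: Optional[str] = None


def primary_change(table, local_journal, op, key, value=None):
    """Apply a change on the primary and record it in the local journal."""
    if op == "PUT":
        table[key] = value
    elif op == "DELETE":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown op {op!r}")
    local_journal.append(JournalEntry(len(local_journal) + 1, op, key, value))


def replicate(local_journal, remote_journal):
    """Copy entries not yet sent to the remote journal (the remote journaling step)."""
    remote_journal.extend(local_journal[len(remote_journal):])


def apply_entries(table, remote_journal, last_applied):
    """Replay remote journal entries on the secondary; return the last sequence applied."""
    for entry in remote_journal:
        if entry.seq <= last_applied:
            continue  # already applied in an earlier pass
        if entry.op == "PUT":
            table[entry.key] = entry.value
        else:
            table.pop(entry.key, None)
        last_applied = entry.seq
    return last_applied


# Usage: three changes on the primary, then replicate and apply
primary, secondary = {}, {}
local_j, remote_j = [], []
primary_change(primary, local_j, "PUT", "CUST001", "Smith")
primary_change(primary, local_j, "PUT", "CUST002", "Jones")
primary_change(primary, local_j, "DELETE", "CUST001")

replicate(local_j, remote_j)
applied = apply_entries(secondary, remote_j, last_applied=0)
print(secondary, applied)
```

Note that the apply step replays whatever is in the journal, including the DELETE; in the same way, an accidental or malicious change on the primary would be faithfully copied to the secondary, which is why HA is no substitute for backups.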

Apart from the remote journaling support, logical replication solutions work on top of the operating system and are designed to keep applications available, as opposed to protecting the entire system (as hardware-based HA does). The initial implementation of logical replication is also relatively straightforward, which is an advantage over hardware-based HA.

The downsides of this approach centre primarily on the need for users to continually check that everything they need to recover is being journaled, and that the journal receivers are being applied in the right order.
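Both of those ongoing checks can be illustrated with a short sketch. It assumes, hypothetically, that you can list the objects the business needs to recover and the objects currently being journaled, and that each replicated entry carries a sequence number. The object names and helper functions are invented for illustration and are not part of any IBM i or HA product interface.

```python
def unjournaled(required, journaled):
    """Objects the business needs to recover that are not being journaled."""
    return sorted(set(required) - set(journaled))


def next_in_order(pending, last_applied):
    """Return the entries that can safely be applied now, in sequence order.

    pending: (seq, payload) pairs that may have arrived out of order.
    Only a gap-free run starting at last_applied + 1 is released; anything
    after a gap is held back until the missing entry arrives.
    """
    by_seq = {seq: payload for seq, payload in pending}
    ready = []
    seq = last_applied + 1
    while seq in by_seq:
        ready.append((seq, by_seq[seq]))
        seq += 1
    return ready


# Usage with hypothetical object names and sequence numbers
required = ["APPLIB/CUSTOMERS", "APPLIB/ORDERS", "APPLIB/INVOICES"]
journaled = ["APPLIB/CUSTOMERS", "APPLIB/ORDERS"]
missing = unjournaled(required, journaled)

pending = [(5, "e"), (3, "c"), (4, "d"), (7, "g")]
ready = next_in_order(pending, last_applied=2)
print("not journaled:", missing)
print("apply now:", ready)
```

Here the invoices file would be flagged as unprotected, and entries 3 to 5 would be applied in order while entry 7 waits for the missing entry 6.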