This article highlights the importance of monitoring, maintaining, and testing your HA replication solution for IBM i.
First, if you are using a High Availability (HA) solution to replicate your IBM i production environment to a secondary system, congratulations on taking an important step to protect your business. That said, how confident are you that your HA solution will deliver when you need it most?
- Are you 100% confident that you will be able to fail over to your secondary system when you need to?
- Do you have legal or regulatory obligations that require periodic validation of your HA/DR capability?
- Are your staff practiced in procedures to switch over to your backup system?
- Do you check the ‘health’ of your replication process to make sure there are no problems and that everything you need is being replicated?
- Are your staff sufficiently trained and experienced to detect and resolve issues before they cause a problem with a failover or planned switch?
- Do you upgrade and patch your MIMIX software regularly and update your MIMIX configuration when you upgrade your applications?
MIMIX Health Check Services from Mynah Bird IT can help close these gaps so you can be confident that your HA/DR strategy will work as expected. The service provides HA users with an expert MIMIX consultant who will help you detect and correct any issues with HA replication between the production and target environments, so that you are always ready to switch. Your consultant can also assist with updates, configuration support for changes to your application software, and annual support for your staff to test and validate your failover plan.
With any HA solution, things can go wrong. While several HA solutions offer built-in auto-heal features to correct issues that may occur, not every issue can be fixed without manual intervention. Some issues snowball: a small problem now can create bigger downstream problems later. The earlier you find and fix these errors, the better. If you monitor your HA solution regularly, identify any issues, and fix them immediately, you keep an accurate mirror copy of the production system on the backup system.
What Can Happen When You Don’t Maintain Your HA Solution?
Imagine a catastrophic server failure that takes IBM three days to fix. HA replication is configured but has not been properly maintained for some time. When you switch over to the hot standby server, you find that critical data and other objects are missing or out of sync with the production server. This situation can arise for many reasons, including staff turnover, skills deficits, oversights due to heavy staff workloads or special projects, and holidays.
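To make "missing or out of sync" concrete, here is a minimal sketch of the kind of comparison a replication audit performs. It is illustrative only and does not use any MIMIX command or API: the function name `audit_replication`, the `AuditReport` type, and the idea of representing each system as a mapping from object name to checksum are all assumptions made for this example. In practice the inventories would come from your HA product's own audit and compare tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AuditReport:
    """Result of comparing a production inventory with a target inventory."""

    missing_on_target: list[str]  # objects on production that never reached the target
    out_of_sync: list[str]        # objects on both systems whose checksums differ
    extra_on_target: list[str]    # objects only on the target (often stale leftovers)

    @property
    def ready_to_switch(self) -> bool:
        # Hypothetical rule: any missing or differing object blocks a switch.
        return not (self.missing_on_target or self.out_of_sync)


def audit_replication(source: dict[str, str], target: dict[str, str]) -> AuditReport:
    """Compare two {object_name: checksum} inventories (hypothetical format)."""
    missing = sorted(name for name in source if name not in target)
    differing = sorted(
        name for name in source if name in target and source[name] != target[name]
    )
    extra = sorted(name for name in target if name not in source)
    return AuditReport(missing, differing, extra)


if __name__ == "__main__":
    # Made-up inventories, purely for illustration.
    production = {"APPLIB/ORDERS": "a1", "APPLIB/CUSTOMERS": "b2", "APPLIB/PRICES": "c3"}
    backup = {"APPLIB/ORDERS": "a1", "APPLIB/CUSTOMERS": "zz"}

    report = audit_replication(production, backup)
    print("Missing on target:", report.missing_on_target)
    print("Out of sync:", report.out_of_sync)
    print("Ready to switch:", report.ready_to_switch)
```

Running a check like this on a schedule, rather than only before a switch, is what catches drift while it is still a small problem.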
Annual HA Testing/DR Validation Options
Depending on your HA solution, you may be able to carry out three types of test:
1. Virtual Switch – No Downtime. Clients test the target environment while production is still active.
2. Limited Live Switch – Switch over to the target for 4-8 hours to validate the failover process and test critical communications processes.
3. Extended Live Switch – 2-8 Hr. Move production operations to the target system for an extended period of time until a Switch Back is executed.
Testing is the best way to gain peace of mind about your HA solution.
Ultimately, if your team is not properly trained to monitor and remediate your HA solution, you should either get the necessary training and schedule in-house monitoring and management, or find a competent service you can rely on to do it for you.
For details of a first-class, reliable MIMIX health check and testing service, please feel free to contact Mynah Bird IT.