Employees are gradually returning to the office to work, even if it is only on a limited basis. As organizations prepare for the day when life will eventually go back to normal, it is important for IT pros to consider how the transition might affect their IT infrastructure and what they can do to prepare, even as the COVID-19 pandemic continues.
Do you know what impact this return to the office will have on your IT systems? One way to find out and become better prepared is to follow the steps outlined here in this pandemic return-to-work checklist for hyper-converged infrastructure.
In response to the coronavirus pandemic, some IT departments either shuttered or greatly reduced the workload of their hyper-converged systems due to lack of use and to conserve resources. Meanwhile, many others brought in new hyper-converged systems or redirected existing hyper-converged nodes to bear the brunt of supporting the new work-from-home requirements of the past year. They brought in new hyper-converged systems to, for example, serve as the main data center support for greatly expanded or newly implemented virtual desktop infrastructures.
For hyper-converged infrastructure (HCI) running mission-critical workloads, the performance impact might be negligible, assuming that those systems continued to operate at normal capacity while employees worked remotely. However, these hyper-converged systems might have suffered neglect due to having minimal on-site IT staff to maintain them for a prolonged period of time. Other HCI platforms, such as those used in branch offices or DevOps/testing environments, might have gone largely unused for several months.
Regardless of the situation, it is important to assess the state of your hyper-converged infrastructure and perform a series of preventive maintenance tasks in preparation for employees returning on site. Here’s what you should do.
Step 1. Check for hardware errors
The first thing you should do, especially if a hyper-converged system hasn’t been used for a while, is perform a hardware health check. Make sure the system is not producing any alerts related to failed hardware or impending hardware failures.
Step 2. Verify storage health
As a part of assessing the health of an HCI platform’s hardware, check to see that system storage is functioning properly. Specifically, you should look for things like RAID arrays that have already suffered a disk failure or mirror sets that have fallen out of sync.
Step 3. Verify storage utilization
It is also important to check your hyper-converged infrastructure’s storage as part of your pandemic return-to-work checklist. Doing so will help ensure none of your volumes are running low on space. Be sure to check system volumes and volumes dedicated to applications, data or virtual machines.
Additionally, when verifying storage utilization, if you haven’t had the opportunity to monitor storage utilization for a while, it would be a good idea to check that the storage consumption rate is still in line with your pre-pandemic capacity planning estimates.
A hyper-converged system that has seen little use for a few months or more will likely show a storage consumption rate that is well below the initial estimate. For some systems, however, the fact that nearly everything is being done online now (as opposed to in person) might actually result in far greater storage consumption. At any rate, find out how quickly your available storage is being consumed.
Step 4. Patch management
Another key task in the HCI inspection is to check if your hyper-converged nodes have the proper patches installed. This obviously pertains to OS patches, but there are some other things to check. For instance, make sure your HCI nodes have the latest hardware drivers installed. If your HCI vendor has released any new firmware updates, apply those as well (both for compute and storage hardware).
Step 5. Software updates
Assessing the health of a hyper-convergence platform also means confirming you have installed the latest versions of any health or management software running on the platform. This includes things like native monitoring software, monitoring agents, antivirus software, backup agents and any security software that you run on the HCI nodes.
Step 6. Perform a security audit
Take the time to assess the platform’s security. Parse the event logs to look for items like account creations, attempted logins (especially into privileged accounts) or unauthorized configuration changes.
Step 7. Perform a stress test
If a hyper-converged infrastructure platform has been idle (or powered off) for a long time, it would be a good idea to perform a hardware stress test just to ensure the hardware is ready for a return to active duty. A stress test involves using a tool (there are countless free options available) to subject your hyper-converged hardware to a heavy load. Doing so will validate that the hardware performs as expected. While it is rare for major components to fail as a result of inactivity, fans, for example, do sometimes stop working. As such, it is a good idea to monitor the servers during the stress test to make sure that they do not overheat.