The company had to grapple with suboptimal utilization of human and system resources, a high cost of downtime and other drawbacks. Hence the need arose to introduce a better system in its place.
The Implementation
Infosys conceptualized a model of "Once-a-Quarter" planned maintenance covering all the devices, technologies and platforms in the enterprise. At the conceptualization stage it seemed almost impossible, but the team achieved it with a three-pronged approach. First, they leveraged the bleeding-edge technology that was already deployed. This included Boot from SAN, storage-based replication and snapshots, a highly optimized virtualized environment, thin provisioning, online firmware upgrades and zero downtime for load-balanced devices.
Second, a vendor and partner alignment program was set up with the likes of HP, Hitachi, SAP, Microsoft and VMware. All of these vendors and partners committed to providing on-site support for the big-bang quarterly upgrades. Third, the team relied on strategy, meticulous planning and thorough testing. They planned and scheduled comprehensively for the quarter and year ahead, clubbed multiple upgrades together, staggered them across 48 hours over the weekend and sequenced upgrades where common teams were involved. They also ran comprehensive, rigorous test plans in production after execution to ensure there were no surprises.
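The idea of staggering upgrades across a weekend window while sequencing those that share a team can be illustrated with a minimal sketch. This is not Infosys's actual planning tool. The upgrade names, durations, team names and the greedy rule are all hypothetical assumptions made for illustration; only the 48-hour window comes from the text.

```python
from dataclasses import dataclass

WINDOW_HOURS = 48  # weekend window mentioned in the text


@dataclass(frozen=True)
class Upgrade:
    name: str
    hours: int          # hypothetical duration
    teams: frozenset    # hypothetical teams needed for this upgrade


def schedule(upgrades, window=WINDOW_HOURS):
    """Greedy plan: an upgrade starts as soon as all of its teams are free.

    Upgrades with no team in common run in parallel; upgrades that share
    a team are sequenced one after another.
    """
    team_free_at = {}   # team name -> hour at which the team becomes free
    plan = []
    for up in upgrades:
        start = max((team_free_at.get(t, 0) for t in up.teams), default=0)
        end = start + up.hours
        if end > window:
            raise ValueError(f"{up.name} does not fit in the {window}-hour window")
        for t in up.teams:
            team_free_at[t] = end
        plan.append((up.name, start, end))
    return plan


# Hypothetical upgrades, for illustration only.
upgrades = [
    Upgrade("storage firmware", 6, frozenset({"storage"})),
    Upgrade("SAN fabric firmware", 4, frozenset({"storage", "network"})),
    Upgrade("hypervisor patch", 5, frozenset({"virtualization"})),
]

plan = schedule(upgrades)
for name, start, end in plan:
    print(f"{name}: hour {start} to hour {end}")
```

In this sketch the two upgrades that both need the storage team run back to back, while the hypervisor patch runs in parallel from hour 0. A real plan would also have to account for dependencies between systems, rollback time and post-upgrade testing.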
Challenges faced
The team faced apprehension from the business and from vendors, especially for SAP modules. It was difficult to convince the business of the very idea of upgrading firmware for storage, fabric, Hitachi NAS, Hitachi Content Platform, the tape library, backup infrastructure, servers and many other products. There were technical challenges in carrying out multiple upgrades in parallel on complex landscapes like SAP, owing to tightly coupled solutions. The big-bang approach also posed challenges in managing incident resolution, rollback and root cause analysis.
The Impact
The model has been successfully executed for five consecutive quarters since its conceptualization in February 2012. More than 100 mission-critical production systems have been upgraded in one go within a span of 36 hours, with more than 100 people comprising admins, vendors and application teams. Planned downtime fell by nearly 83%, from approximately 370 hours to just 60 hours. Vendor tickets caused by version incompatibility in the SAP environment fell by 100%, from 40 critical high-priority tickets to 0. Admin effort fell by 76%, from approximately 3500 admin hours to 800 hours. Testing effort fell by 83%, from approximately 1900 hours to 300 hours.
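The reduction percentages can be cross-checked from the before and after figures. The sketch below uses the standard formula (before - after) / before; the function name and labels are illustrative. Because the hour figures are stated as approximate, the recomputed values land within roughly one to two percentage points of the reported percentages rather than matching them exactly.

```python
def percent_reduction(before, after):
    """Return the percentage reduction going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# (before, after) pairs as reported in the text; hour figures are approximate.
reported = {
    "planned downtime (hours)": (370, 60),
    "SAP vendor tickets": (40, 0),
    "admin effort (hours)": (3500, 800),
    "testing effort (hours)": (1900, 300),
}

for label, (before, after) in reported.items():
    print(f"{label}: {percent_reduction(before, after):.1f}% reduction")
```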