2024-07-19

A Microsoft Windows outage has disrupted numerous computer systems worldwide. This significant event has led to sudden shutdowns or restarts for many users, primarily due to a CrowdStrike update. Microsoft's Service Health Status updates indicate that a configuration change in their Azure backend workloads caused connectivity failures, impacting various Microsoft 365 services.

Root Cause of the Outage

The preliminary root cause identified by Microsoft is a configuration change in a portion of their Azure backend workloads. This change caused interruptions between storage and compute resources, leading to connectivity failures that affected downstream Microsoft 365 services.

CrowdStrike has identified a content deployment related to this issue and has reverted the changes. According to Microsoft, they are rerouting impacted traffic to alternate systems to alleviate the impact and expect gradual relief as they continue to mitigate the issue.

Steps to Resolve the Issue

If you are facing shutdowns or restarts, follow these steps to resolve the issue:

Boot Windows into Safe Mode or the Windows Recovery Environment.

Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.

Locate the file matching C-00000291*.sys and delete it.

Reboot your system normally.
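
For administrators who have a Python interpreter available on an affected machine (an assumption; Safe Mode and the Windows Recovery Environment often will not have one, and the manual steps above remain the reference), the file lookup can be sketched as follows. The directory and the wildcard pattern come from the steps above; the function names and the dry-run default are illustrative choices, not part of any vendor tooling.

```python
from pathlib import Path

# Directory and pattern taken from the manual steps above.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_matching_files(directory: Path, pattern: str = PATTERN) -> list[Path]:
    """Return regular files in `directory` whose names match `pattern`."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_matching_files(directory: Path, dry_run: bool = True) -> list[Path]:
    """Return the matching files; delete them only when dry_run is False."""
    matches = find_matching_files(directory)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Hypothetical usage: report what would be deleted without touching anything.
    for path in remove_matching_files(DRIVER_DIR, dry_run=True):
        print(f"would delete: {path}")
```

Running with dry_run=True first makes it possible to confirm that only the intended files match before anything is removed. Deleting files under System32 normally requires administrator rights.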

Global Impact

The outage has had far-reaching consequences, affecting air travel, stock markets, and various sectors globally:

Air Travel: Airports, including those in India, reported significant disruptions, with airlines like IndiGo and Akasa Air facing major operational challenges.

Stock Markets: Brokerages and stock exchanges experienced interruptions, impacting trading activities.

Businesses: Various companies reliant on Microsoft 365 and Azure services faced operational delays.

Microsoft’s Response

Microsoft has been actively working to resolve the issue. In a detailed thread on X (formerly Twitter), Microsoft stated: "We're working on rerouting impacted traffic to alternate systems to alleviate the impact in a more expedient fashion. We still expect users will continue to see gradual relief as we continue to mitigate the issue."

Ongoing Improvements

Microsoft says that services are improving steadily. The company has implemented several mitigation actions and is observing positive trends in service availability. However, users might still experience residual impacts until full functionality is restored.

Kumar Ritesh, CEO & Founder, CYFIRMA, stated: "The massive outage in Microsoft systems caused by CrowdStrike updates was due to a compatibility issue between CrowdStrike's Falcon sensor and a Windows update. When the CrowdStrike sensor, a critical endpoint protection agent, was updated, it conflicted with changes introduced in the latest Windows update. Such incidents underscore the importance of rigorous compatibility testing between security solutions and operating system updates to prevent widespread disruptions.

"There are measures that can be put in place to avoid such disruptions. Before deploying any security update or software patch, create a testing environment that mirrors production systems. Test the update thoroughly in this environment to identify any compatibility issues or unexpected behavior. Avoid deploying updates across all systems simultaneously. Instead, roll them out gradually to a subset of machines. Monitor these systems closely for any adverse effects. If everything looks good, proceed with a wider rollout.

"Regularly back up critical systems so that in case an update causes problems like the current situation with CrowdStrike updates, you can restore the system to a previous state. Ensure backups are tested and reliable. Use patch management tools to automate the deployment of updates. These tools allow you to schedule updates, track their status, and roll back changes if needed. We would always encourage organizations to implement monitoring solutions that detect anomalies, performance issues, or unexpected behavior. And set up alerts to notify you immediately if any critical system experiences problems."
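
The gradual rollout described in the quote above (deploy to a subset, monitor, then widen or roll back) can be sketched roughly as follows. This is a minimal illustration, not any vendor's actual deployment process: the deploy, is_healthy, and rollback callables and the stage fractions are all hypothetical placeholders that an organization would supply from its own patch management and monitoring tools.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
) -> bool:
    """Deploy to growing subsets of hosts; roll back everything on a failed check."""
    updated: list[str] = []
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated after this stage.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(updated):target]:
            deploy(host)
            updated.append(host)
        # Check every host updated so far before widening the rollout.
        if not all(is_healthy(h) for h in updated):
            for host in reversed(updated):
                rollback(host)
            return False
    return True
```

In practice the health check would wait for a monitoring window to pass before each stage is judged, and a failed check would also raise an alert; both are left out here to keep the sketch short.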