Azure Outage Post Mortem Report

It’s been a tough week for Microsoft: outage after outage hit its cloud services, resulting in global disruption.

At the start of this week, a number of Microsoft customers worldwide were impacted by a cascading series of problems that left many unable to access their Microsoft apps and services. Microsoft released a note on this outage.

Customers reported that they could not sign in to Microsoft and third-party applications that use Azure Active Directory (Azure AD) for authentication. Microsoft acknowledged that the issue stemmed from a mishap in its Safe Deployment Process (SDP).

Azure AD is designed to be geo-distributed and deployed with multiple partitions across multiple data centers around the world, and is built with isolation boundaries. Microsoft normally applies changes first to a validation ring that doesn’t include customer data, followed by four additional rings over the course of several days before they hit production. But this week the SDP system failed to target the validation ring correctly because of a defect, and all rings were targeted concurrently, causing service availability to degrade.
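The ring-by-ring rule described above can be sketched in a few lines. This is only an illustration, not Microsoft's SDP implementation: the ring names, the `StagedRollout` class and the `DeploymentError` exception are all hypothetical. The point it shows is a guard that refuses any request that targets more than one ring at a time or skips ahead of the validation ring.

```python
from dataclasses import dataclass, field

# Hypothetical ring names: the report only says there is a validation ring
# followed by four additional rings.
RINGS = ["validation", "ring1", "ring2", "ring3", "ring4"]


class DeploymentError(Exception):
    """Raised when a rollout request breaks the staged-rollout rules."""


@dataclass
class StagedRollout:
    change_id: str
    completed: list = field(default_factory=list)  # rings deployed so far, in order

    def next_ring(self):
        """Return the only ring that may be targeted next, or None when done."""
        if len(self.completed) == len(RINGS):
            return None
        return RINGS[len(self.completed)]

    def deploy(self, targets, healthy=True):
        """Deploy to exactly one ring, and only the next one in sequence."""
        if len(targets) != 1:
            raise DeploymentError(
                "refusing to target several rings concurrently: %r" % (targets,)
            )
        expected = self.next_ring()
        if targets[0] != expected:
            raise DeploymentError(
                "expected ring %r but got %r" % (expected, targets[0])
            )
        if not healthy:
            # A failed health check halts the rollout before the ring is recorded.
            raise DeploymentError("health check failed in %r" % targets[0])
        self.completed.append(targets[0])
        return targets[0]
```

With a guard like this, a request such as `rollout.deploy(list(RINGS))` fails immediately instead of reaching every ring at once.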

Microsoft engineering knew within five minutes that something was wrong. During the next 30 minutes, Microsoft started taking steps to expedite mitigation by scaling out some Azure AD services to handle the load once a mitigation had been applied, and by failing over certain workloads to a backup Azure AD authentication system. But the rollback failed because of corruption in the backup SDP metadata, which meant the configuration had to be corrected manually.

Microsoft fixed the latent code defect in the Azure AD backend SDP system, fixed the existing rollback system, and expanded the scope and frequency of rollback operation drills. The team still needs to apply more protections to the Azure AD SDP system to prevent these kinds of issues. It also needs to expedite the rollout of the Azure AD backup authentication system to all key services, and to onboard Azure AD scenarios to the automated communications pipeline.

Microsoft’s report also doesn’t mention that, over the past couple of days, customers in various geographies have been reporting problems with Exchange Online and Outlook on their mobile devices. Microsoft attributed that problem to a situation involving Exchange ActiveSync, stating that “a recent configuration update to components that route user requests was the cause of impact.”

On 1 October, another outage of cloud services was observed, this time for a shorter period.

CVE-2020-1472 – Exploit goes wild

The CVE-2020-1472 flaw is an elevation-of-privilege vulnerability that resides in Netlogon. The Netlogon service is an authentication mechanism used in the Windows Client Authentication Architecture; it verifies logon requests, and it registers, authenticates, and locates domain controllers.

“An elevation of privilege vulnerability exists when an attacker establishes a vulnerable Netlogon secure channel connection to a domain controller, using the Netlogon Remote Protocol (MS-NRPC).

An attacker who successfully exploited the vulnerability could run a specially crafted application on a device on the network,” reads the advisory published by Microsoft.

“To exploit the vulnerability, an unauthenticated attacker would be required to use MS-NRPC to connect to a domain controller to obtain domain administrator access.”

“By forging an authentication token for specific Netlogon functionality, he was able to call a function to set the computer password of the Domain Controller to a known value. After that, the attacker can use this new password to take control over the domain controller and steal credentials of a domain admin.”

“The vulnerability stems from a flaw in a cryptographic authentication scheme used by the Netlogon Remote Protocol, which among other things can be used to update computer passwords.”

An attacker could exploit the vulnerability to impersonate any computer, including the domain controller itself, and execute remote procedure calls on their behalf.

An attacker could also exploit the flaw to disable security features in the Netlogon authentication process and change a computer’s password on the domain controller’s Active Directory.

“By simply sending a number of Netlogon messages in which various fields are filled with zeroes, an attacker can change the computer password of the domain controller that is stored in the AD. This can then be used to obtain domain admin credentials and then restore the original DC password.”
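Why all-zero fields matter can be shown with a toy model. The researchers' public write-up attributes the flaw to Netlogon using AES in CFB8 mode with a fixed all-zero initialization vector; for roughly 1 in 256 keys, encrypting all-zero input then yields all-zero output. The sketch below is not the exploit and does not use AES or touch the network: `toy_block_encrypt` is a hypothetical stand-in built from SHA-256, and the function names and key range are assumptions made for illustration only.

```python
import hashlib

BLOCK = 16  # block size in bytes, the same as AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's encrypt direction (NOT AES, not secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def cfb8_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CFB8 mode: one block-cipher call per byte, feeding each ciphertext byte back."""
    register = iv
    out = bytearray()
    for p in plaintext:
        keystream_byte = toy_block_encrypt(key, register)[0]
        c = p ^ keystream_byte
        out.append(c)
        # Shift the register left by one byte and append the ciphertext byte.
        register = register[1:] + bytes([c])
    return bytes(out)


def count_all_zero_outputs(num_keys: int) -> int:
    """Count keys for which zero IV + zero plaintext gives zero ciphertext."""
    zero_iv = bytes(BLOCK)
    zero_plaintext = bytes(8)
    hits = 0
    for i in range(num_keys):
        key = i.to_bytes(16, "big")
        if cfb8_encrypt(key, zero_iv, zero_plaintext) == bytes(8):
            hits += 1
    return hits
```

If the first keystream byte happens to be zero, the first ciphertext byte is zero, the register stays all zeroes, and every later byte repeats the same result. That is why the all-zero case depends only on a single byte and is hit by about one key in 256, which an attacker can reach simply by retrying.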

“This attack has a huge impact: it basically allows any attacker on the local network to completely compromise the Windows domain. The attack is completely unauthenticated.”

Threat actors could use the Zerologon attack to deliver malware and ransomware on the target network.

The only limitation on carrying out a Zerologon attack is that the attacker must have access to the target network.

Researchers released a Python script that uses the Impacket library to test for the Zerologon vulnerability; admins can use it to determine whether their domain controller is still vulnerable.

The August 2020 Patch Tuesday security updates only temporarily address the vulnerability, by making Netlogon security features mandatory for the Netlogon authentication process. The vulnerability has a CVSS severity score of 10.

Active Directory: the heart of the business needs a proper DR plan

As the name suggests, if the business needs to stay active, Active Directory must be actively protected with proper care.

Business vitality depends on AD. Every detail, from login information to email information, relies heavily on AD. It is therefore vital to maintain proper hygiene to secure it from external attacks, given the long history of outside intruders contaminating, encrypting, and erasing information.

As the gatekeeper to critical applications and data in 90% of organizations worldwide, AD has become a prime target for widespread cyberattacks that have crippled businesses and wreaked havoc on governments and non-profit organizations.

If a disaster does happen, there should be an escape route to restore AD. The key considerations are elaborated below:

  • Minimize Active Directory’s attack surface: Lock down administrative access to the Active Directory service by implementing administrative tiering and secure administrative workstations, apply recommended policies and settings, and scan regularly for misconfigurations – accidental or malicious – that potentially expose your forest to abuse or attack.
  • Monitor Active Directory for signs of compromise and roll back unauthorized changes: Enable both basic and advanced auditing and periodically review key events via a centralized console. Monitor object and attribute changes at the directory level and changes shared across domain controllers.
  • Implement a scorched-earth recovery strategy in the event of a large-scale compromise: Widespread encryption of your network, including Active Directory, requires a solid, highly automated recovery strategy that includes offline backups for all your infrastructure components as well as the ability to restore from backups without reintroducing any malware that might be on them.
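The monitoring and rollback consideration above can be illustrated with a small sketch that compares two snapshots of directory objects and derives the values to restore. This is a simplified model under stated assumptions: snapshots are plain dictionaries keyed by distinguished name, the example names under `DC=example,DC=com` are hypothetical, and no real directory API is called. A production setup would rely on AD auditing and dedicated tooling rather than this kind of diff.

```python
def diff_snapshots(baseline: dict, current: dict) -> list:
    """Return (object, attribute, old_value, new_value) for every difference."""
    changes = []
    for dn in sorted(set(baseline) | set(current)):
        old_attrs = baseline.get(dn, {})
        new_attrs = current.get(dn, {})
        for attr in sorted(set(old_attrs) | set(new_attrs)):
            old, new = old_attrs.get(attr), new_attrs.get(attr)
            if old != new:
                changes.append((dn, attr, old, new))
    return changes


def rollback_plan(changes: list) -> list:
    """Turn each detected change into (object, attribute, value_to_restore)."""
    return [(dn, attr, old) for dn, attr, old, _new in changes]


# Hypothetical example data: an unexpected member appears in a privileged group.
GROUP_DN = "CN=Domain Admins,CN=Users,DC=example,DC=com"
BASELINE = {GROUP_DN: {"member": ["CN=alice"]}}
CURRENT = {GROUP_DN: {"member": ["CN=alice", "CN=mallory"]}}

if __name__ == "__main__":
    detected = diff_snapshots(BASELINE, CURRENT)
    for dn, attr, value in rollback_plan(detected):
        print("restore", attr, "on", dn, "to", value)
```

Keeping the baseline snapshot offline, alongside the backups, means the comparison itself cannot be tampered with by whoever changed the directory.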