
When the Cloud Goes Dark: Regional Outages and What They Mean for SOC 2 and ISO 27001 Compliance

In March 2026, a regional conflict in the Middle East did something that stress tests and tabletop exercises rarely manage to do: it took down cloud infrastructure across multiple availability zones at the same time, in the same region, without warning.

AWS data centers in the UAE and Bahrain were impacted. Banking apps went offline. Payments failed. Delivery platforms stopped. And a significant portion of the affected organizations had done everything “right” by conventional standards — multi-AZ deployments, redundancy within the region, documented continuity plans.

It wasn’t enough.

This article breaks down what happened, what it revealed about how most organizations think about availability, and what a more resilient architecture actually looks like. If your systems run on cloud infrastructure — in any region — this case is worth understanding closely.

What Happened: The March 2026 Incident

Regional conflict in the Middle East caused physical and infrastructural disruption to AWS facilities across the UAE and Bahrain. Based on publicly reported information, the incident involved power outages affecting data center operations, physical damage to infrastructure facilities, connectivity loss across affected environments, and service degradation spanning multiple availability zones within the same region — simultaneously.

That last point is the one that matters most. AWS designs its availability zones to be isolated from one another — separate power, cooling, and networking — so that a failure in one zone doesn’t cascade into another. Under normal failure conditions, that isolation holds. But this wasn’t a normal failure condition. It was a regional-scale disruption. The “rooms” were fine. The “building” was the problem.

“Availability zones are designed to handle localized failures, not regional ones. This incident sits firmly in the second category.”

The result was that organizations with multi-AZ architectures — which many rightly considered robust — still went down. There was no in-region fallback left to use.

Business Impact: What Actually Went Offline

The impact was not subtle. Banking platforms experienced downtime that prevented customers from accessing accounts or completing transactions. Payment processors were unable to process transactions. Mobility and delivery platforms halted operations entirely. Customer-facing applications became unavailable across the board.

This wasn’t degraded performance or slower load times. It was a full loss of availability for any system that lived entirely within the affected region. The AWS Well-Architected Framework acknowledges that regional failures, while rare, are a defined risk category — and designing for them requires a fundamentally different approach than designing for AZ failures.

Organizations with multi-region architectures kept operating. Everything else stopped. That single architectural decision — single-region versus multi-region — was the difference between availability and a complete outage.

What Risks Actually Materialized

This incident didn’t create new risks. It exposed ones that were already there, quietly embedded in architectural choices and compliance assumptions that had never been stress-tested at this scale.

Regional Single Point of Failure

The most common pattern among affected organizations: applications, databases, and backups all deployed within a single region. When that region became unavailable, there was no secondary environment to take over. No warm standby, no traffic rerouting, no automated failover. Just downtime.

This is the architectural equivalent of backing up your data to a drive sitting next to your laptop. It works until it doesn’t.

The Limits of Availability Zone Redundancy

Availability zones are a powerful tool — but they’re a tool designed for a specific class of failure, and understanding that class matters. Think of an availability zone as a separate floor in a building. If one floor has a problem, you move to another floor. But if the entire building loses power — or becomes inaccessible — floor redundancy doesn’t help. You need another building entirely. That’s what a region is. And this incident took down the building.

Pro tip: When mapping your architecture against a business continuity plan, explicitly define your regional failure scenario. “What happens if this entire region becomes inaccessible for 24 hours?” is a question that exposes gaps that AZ-level planning will never catch.

Infrastructure-Level Disruption Is Not Solvable at the Application Layer

Power outages. Connectivity loss. Physical damage. These are not conditions that clever application architecture can work around if your infrastructure is entirely contained within the affected geography. No amount of microservices design, caching strategy, or auto-scaling helps when there’s no power reaching the data center.

This is an important framing shift for engineering teams who own availability: some failure modes require infrastructure-layer responses, not code-layer ones.

The Compliance Gap: Controls on Paper vs. Controls in Practice

This is perhaps the most uncomfortable implication of the incident. In many environments — particularly those undergoing ISO/IEC 27001:2022 certification or SOC 2 audits — availability controls are documented but don’t reflect the actual system architecture. Redundancy is listed as a control. It’s just redundancy within a single region, which, as this event demonstrated, is insufficient for regional-scale disruptions. The control passes an audit. It fails a real incident.

This is the exact gap that compliance frameworks are designed to close — and that audit processes sometimes fail to catch.


Cloud Hosting and SOC 2 Compliance Requirements

Choosing AWS or Azure doesn’t hand you SOC 2 compliance. It hands you a shared responsibility model, which means your provider secures the physical infrastructure and you secure everything running on top of it — including whether your architecture can actually deliver on your availability commitments.

Auditors know this distinction well. When they evaluate your Availability criteria, they’re looking at your controls, not your provider’s SOC 2 report.

What that means in practice: your recovery objectives need to be real numbers tied to a real architecture, not placeholders in a policy document. Your failover plan needs test records behind it. And your cloud provider should appear in your vendor risk register with an annual review of their own audit reports.
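To make that concrete, here is a minimal sketch, assuming an RDS cross-region read replica monitored through CloudWatch with boto3. The replica identifier, region name, and the 300-second RPO are illustrative placeholders, not recommendations. It compares the recovery point objective in your policy against the replication lag your architecture actually exhibits:

```python
"""Sketch: check a documented RPO against measured replication lag.

Assumes an RDS cross-region read replica; the replica identifier,
region, and RPO value are illustrative placeholders.
"""
import datetime

import boto3

RPO_SECONDS = 300                # the RPO from your BCP, as a real number
REPLICA_ID = "app-db-replica"    # hypothetical replica identifier
SECONDARY_REGION = "eu-west-1"   # hypothetical standby region

cloudwatch = boto3.client("cloudwatch", region_name=SECONDARY_REGION)

# Pull the replica's lag metric for the last 24 hours.
now = datetime.datetime.now(datetime.timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="ReplicaLag",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": REPLICA_ID}],
    StartTime=now - datetime.timedelta(hours=24),
    EndTime=now,
    Period=300,
    Statistics=["Maximum"],
)

worst = max((p["Maximum"] for p in stats["Datapoints"]), default=None)
if worst is None:
    print("No lag data found: the replication relationship may not exist.")
elif worst > RPO_SECONDS:
    print(f"Worst-case lag {worst:.0f}s exceeds the {RPO_SECONDS}s RPO.")
else:
    print(f"Worst-case lag {worst:.0f}s is within the {RPO_SECONDS}s RPO.")
```

Run on a schedule, a check like this also produces exactly the kind of test record an auditor can sample.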

A single-region deployment with no tested failover isn’t compliant in any meaningful sense. It’s a documentation exercise waiting to be disproved.

The March 2026 incident made this concrete. Organizations that had documented availability controls but confined their entire infrastructure to one region found those controls counted for nothing when the region went down. The control passed the audit. It failed the incident.

That gap is exactly what a SOC 2 audit is supposed to catch. Sometimes it doesn’t. 

What Mitigating Controls Could Have Reduced the Impact

The following aren’t theoretical best practices. They’re the specific capabilities that separated organizations that stayed online from those that didn’t.

Multi-region deployment is the foundational requirement. Deploying systems across independent geographic regions — not just independent availability zones — means a regional disruption in one location doesn’t take everything down. Google Cloud’s documentation on multi-region architectures provides useful reference material on how this is structured in practice.
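As a sketch of what independent regions look like in practice, the following assumes boto3, a CloudFormation template on disk, and an illustrative region pair; the stack name and file path are hypothetical. The same stack is provisioned twice, and nothing is shared between the two deployments:

```python
"""Sketch: provision the same stack in two independent regions.

The template path, stack name, and region pair are assumptions,
not prescriptions; the pattern is what matters.
"""
import boto3

REGIONS = ["me-central-1", "eu-west-1"]  # hypothetical primary and standby

with open("stack.yaml") as f:            # hypothetical template file
    template = f.read()

for region in REGIONS:
    cfn = boto3.client("cloudformation", region_name=region)
    # Identical template, independent region: the standby shares no power,
    # connectivity, or blast radius with the primary.
    cfn.create_stack(
        StackName="payments-api",
        TemplateBody=template,
        Capabilities=["CAPABILITY_IAM"],
    )
    print(f"Stack creation started in {region}")
```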

Cross-region data replication ensures that when failover happens, the secondary region has current data to work with. Replication lag is a design variable — it can be tuned based on acceptable recovery point objectives. What can’t be tuned is the existence of the replication relationship itself. If it isn’t there before the incident, it can’t help during one.
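One way to establish that relationship ahead of time is sketched below with boto3 and S3 cross-region replication. Bucket names, regions, and the IAM role ARN are placeholders; the point is that this configuration exists before the incident, not during it:

```python
"""Sketch: set up S3 cross-region replication before an incident.

Assumes an existing IAM replication role; all names and ARNs are
illustrative placeholders.
"""
import boto3

# Replication requires versioning on both the source and the destination.
for bucket, region in [("app-data-primary", "me-central-1"),
                       ("app-data-standby", "eu-west-1")]:
    boto3.client("s3", region_name=region).put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# The replication relationship itself. If this doesn't exist before the
# incident, it can't help during one.
s3 = boto3.client("s3", region_name="me-central-1")
s3.put_bucket_replication(
    Bucket="app-data-primary",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication",  # placeholder
        "Rules": [{
            "ID": "dr-standby",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # replicate everything
            "DeleteMarkerReplication": {"Status": "Enabled"},
            "Destination": {"Bucket": "arn:aws:s3:::app-data-standby"},
        }],
    },
)
```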

Automated failover removes the human response time variable from the equation. If traffic rerouting to a secondary region requires manual intervention, you are adding minutes or hours to your outage window during the exact moment when your team is most overwhelmed. Route 53 failover routing, Azure Traffic Manager, and equivalent tools in other clouds exist specifically for this scenario.
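A minimal sketch of that pattern with boto3 follows; the hosted zone ID, health check ID, and IP addresses are placeholders. A PRIMARY record is gated by a health check, and a SECONDARY record in the standby region answers automatically when the check fails:

```python
"""Sketch: Route 53 failover routing between two regions.

Hosted zone ID, health check ID, and addresses are placeholders.
"""
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0000000000000",  # placeholder
    ChangeBatch={"Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "api.example.com", "Type": "A",
            "SetIdentifier": "primary-region", "Failover": "PRIMARY",
            "TTL": 60,
            "ResourceRecords": [{"Value": "203.0.113.10"}],
            # If this health check fails, Route 53 answers with the
            # SECONDARY record automatically: no human in the loop.
            "HealthCheckId": "11111111-2222-3333-4444-555555555555"}},
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "api.example.com", "Type": "A",
            "SetIdentifier": "standby-region", "Failover": "SECONDARY",
            "TTL": 60,
            "ResourceRecords": [{"Value": "198.51.100.20"}]}},
    ]},
)
```

A low TTL matters here: the 60-second value bounds how long resolvers keep handing out the failed primary’s address.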

Regional outage testing is the practice that most organizations skip. Simulating a full regional failure — not just a single AZ — validates whether recovery strategies actually work, not just whether they exist. The NIST SP 800-34 guide on contingency planning recommends testing at the scenario level, not just the control level.
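A simple game-day harness can approximate this, as in the sketch below. All IDs and addresses are placeholders, and it should only ever be pointed at a test hostname you control. It inverts the primary’s Route 53 health check to simulate the region going dark, then verifies that DNS actually answers with the standby:

```python
"""Sketch: game-day test of a full regional failover.

Inverts the primary's Route 53 health check to simulate a regional
outage, then verifies DNS answers with the standby. IDs and addresses
are placeholders; point this only at a test hostname you control.
"""
import socket
import time

import boto3

route53 = boto3.client("route53")
HEALTH_CHECK_ID = "11111111-2222-3333-4444-555555555555"  # placeholder
STANDBY_IP = "198.51.100.20"                              # placeholder
TEST_HOSTNAME = "failover-test.example.com"               # placeholder

# Force the health check to report failure, as a regional outage would.
route53.update_health_check(HealthCheckId=HEALTH_CHECK_ID, Inverted=True)
try:
    time.sleep(120)  # allow health-check evaluation and DNS TTLs to expire
    answer = socket.gethostbyname(TEST_HOSTNAME)
    assert answer == STANDBY_IP, f"Failover did not occur: resolved {answer}"
    print("Failover verified: DNS is answering with the standby region.")
finally:
    # Restore the health check whether or not the assertion passed.
    route53.update_health_check(HealthCheckId=HEALTH_CHECK_ID, Inverted=False)
```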

Dependency resilience is the one that catches teams off guard. If your identity provider, monitoring stack, or secrets management system lives in the same region as your primary workload, your failover may not actually work — because the systems your application depends on to function are also offline.

Insider note: A common failure in multi-region DR testing is discovering that the authentication service doesn’t fail over cleanly, even when the application does. Audit your dependency chain before you test — not during.
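Auditing that chain can start as something entirely mechanical. The sketch below uses a hypothetical inventory of services and the regions they run in, and flags anything that exists only in the primary region:

```python
"""Sketch: flag single-region dependencies before a DR test.

The inventory is hypothetical; the point is that anything pinned to
the primary region goes down with it during a regional failover.
"""
PRIMARY_REGION = "me-central-1"

# Hypothetical inventory: service -> regions it is deployed in.
DEPENDENCIES = {
    "identity-provider": ["me-central-1"],
    "secrets-manager":   ["me-central-1", "eu-west-1"],
    "monitoring-stack":  ["me-central-1"],
    "payment-gateway":   ["eu-west-1", "us-east-1"],
}

for service, regions in DEPENDENCIES.items():
    survivors = [r for r in regions if r != PRIMARY_REGION]
    if not survivors:
        print(f"RISK: {service} exists only in {PRIMARY_REGION}; "
              "failover will leave the application without it.")
    else:
        print(f"OK:   {service} remains available in {', '.join(survivors)}")
```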

Compliance Perspective: What the Frameworks Actually Require

This incident maps cleanly onto requirements that many organizations are already accountable for.

ISO/IEC 27001:2022 addresses this directly across several controls. A.8.14 covers redundancy of information processing facilities — and the intent is effective redundancy, not documented redundancy. A.8.13 covers backup, with an expectation that backup data is accessible when primary systems are not. A.5.30 addresses ICT readiness for business continuity, which includes planning for scenarios beyond localized failure. The standard is explicit that controls must be implemented in a way that is proportionate to the risk — and a single-region deployment for a mission-critical application is a risk the standard expects to be addressed.

Unsure whether your current architecture actually satisfies these controls? An ISO 27001 gap analysis is usually the fastest way to find out, and an internal audit against your documented controls will surface the delta between what’s on paper and what’s in production.

The SOC 2 Availability criteria require that systems be available in line with commitments and expectations. If your service-level commitments assume high availability, and your architecture cannot deliver that when a region goes offline, you have a gap between your commitments and your design. The AICPA’s Trust Services Criteria are clear on this point: availability controls must reflect real-world capability, not aspirational architecture.

The common thread across both frameworks: compliance asks whether controls are effective, not just whether they’re present. This incident is a clear case study in what ineffective-but-documented redundancy looks like under real conditions.

What This Means for Your Organization

The March 2026 incident is not a cautionary tale about a distant edge case. It’s a practical reference point for evaluating your own architecture — right now, before you need it.

The questions worth asking are direct ones. Can your systems operate if an entire cloud region becomes unavailable — not for five minutes, but for hours? Does your failover extend beyond a single region, or does it just move traffic between availability zones? Have you tested a full regional failure scenario, or only component-level failures? Do your compliance controls reflect the architecture as it runs today, or as it was originally designed two years ago?

If the answer to any of those is uncertain, that uncertainty is the finding.

NIST’s Cybersecurity Framework is also worth revisiting in this context — specifically the “Recover” function, which provides a structured way to think about resilience planning at the organizational level, not just the infrastructure level.

Conclusion

The March 2026 incident made one thing concrete: availability is not defined by the presence of redundancy within a region — it’s defined by the ability to operate beyond it.

Multi-AZ architecture is good design. It protects against the failures it’s designed to protect against. But it was never intended to be a substitute for multi-region resilience, and organizations that treated it as one found out the hard way. For most organizations, closing this gap doesn’t require rebuilding from scratch. It requires an honest assessment of where your architecture actually stands versus where you assumed it did.

Axipro works with scaling software companies to assess availability architecture, close compliance gaps, and ensure that continuity controls hold up under real-world conditions — not just audit conditions. If the questions raised in this article surfaced something worth investigating in your own environment, reach out to our team to schedule a technical review. Or if you’d prefer to start with a self-assessment, learn more about how we approach availability and compliance readiness.

