RESILIENCE_DRILLS_UNDER_DORA

DORA has now applied for fifteen months. The Regulation's position is clear: resilience is not a diagram on a slide; it is evidence of tested recovery. Teams that understood this early have built drills, documentation, and third-party evidence into ordinary operations. Teams that treated DORA as a future planning exercise are now catching up under active regulatory scrutiny.

THE_REGULATORY_PICTURE

DORA has applied since 17 January 2025. The ESAs (EBA, EIOPA, and ESMA) have published the final regulatory and implementing technical standards covering ICT risk management, incident classification, threat-led penetration testing, and the register of information on ICT third-party arrangements. Those standards are now in force and are what supervisory examinations will test against.

WHERE_ARCHITECTURE_ASSUMPTIONS_BREAK

Every platform has hidden coupling that the architecture diagram does not show. DNS resolvers, identity providers, shared payment rails, internal certificate authorities, backup storage endpoints. These become visible when something fails. The objective under DORA is to find them before regulators or customers do.

Explicit failure domains make graceful degradation achievable rather than theoretical. When you know which dependencies sit in the critical path of which customer journeys, you can make deliberate decisions about load shedding, read-only fallback modes, and what distinguishes a meaningful partial outage from a total one.
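
One way to keep those failure domains explicit is to hold the journey-to-dependency mapping as data rather than tribal knowledge. The Python sketch below is illustrative only: the journey names, dependencies, and priorities are hypothetical placeholders, not a prescribed taxonomy.

  # Sketch: an explicit map from customer journeys to the dependencies in their
  # critical path, plus the order in which lower-priority work is shed.
  # All names (card_payments, idp, payment_rail, ...) are illustrative placeholders.
  from dataclasses import dataclass, field

  @dataclass
  class Journey:
      name: str
      priority: int                                # 1 = protect first
      critical_dependencies: set[str] = field(default_factory=set)

  JOURNEYS = [
      Journey("card_payments", priority=1,
              critical_dependencies={"idp", "payment_rail", "dns"}),
      Journey("account_statements", priority=2,
              critical_dependencies={"idp", "object_storage"}),
      Journey("marketing_emails", priority=3,
              critical_dependencies={"email_gateway"}),
  ]

  def shedding_order(failed_dependency: str) -> list[str]:
      """Journeys affected by a failed dependency, lowest priority first,
      i.e. the order in which they should degrade or be shed."""
      affected = [j for j in JOURNEYS if failed_dependency in j.critical_dependencies]
      return [j.name for j in sorted(affected, key=lambda j: j.priority, reverse=True)]

  # Losing the identity provider touches statements and payments; statements
  # drop to read-only before the payments journey is touched.
  print(shedding_order("idp"))  # ['account_statements', 'card_payments']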

"Resilience is measured in minutes of confusion you avoid, not years of uptime you claim."

TESTED_RECOVERY_IS_THE_EVIDENCE

Under the DORA technical standards, tested resilience is a documented output, not a verbal assurance. Restore tests need to demonstrate actual data integrity, not just successful backup job completion. Failover drills need timestamps and decision logs. ICT third-party risk registers need to cover not just who the providers are, but concentration risk, substitutability, and what a realistic transition plan would require in practice.
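
One lightweight way to produce that evidence is to capture each drill as a structured record at the moment decisions are made, rather than reconstructing it afterwards. The record below is a sketch under assumed field names and JSON-file storage; the technical standards do not prescribe a specific format, only that the evidence exists and can be produced.

  # Sketch: a failover-drill record with timestamps and a decision log.
  # Field names, the drill identifier, and the JSON file are assumptions
  # made for this example, not a mandated format.
  import json
  from datetime import datetime, timezone

  def utc_now() -> str:
      return datetime.now(timezone.utc).isoformat()

  drill = {
      "drill_id": "failover-drill-001",         # placeholder identifier
      "scenario": "loss of primary region",
      "started_at": utc_now(),
      "decisions": [],                          # appended as the drill runs
      "restored_at": None,
      "data_integrity_verified": False,         # set only after explicit checks pass
  }

  def record_decision(owner: str, action: str, rationale: str) -> None:
      drill["decisions"].append({
          "at": utc_now(),
          "owner": owner,
          "action": action,
          "rationale": rationale,
      })

  record_decision(owner="payments-oncall", action="invoke DR site",
                  rationale="error budget burn exceeded the written threshold")
  drill["restored_at"] = utc_now()
  drill["data_integrity_verified"] = True

  # Retained alongside the drill's other artefacts as supervisory evidence.
  with open("drill-evidence.json", "w") as f:
      json.dump(drill, f, indent=2)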

  • Trigger traffic movement based on real user-facing signals, not a single-node health check that misses dependency failures upstream.
  • Protect the highest-priority journeys explicitly with defined load-shedding sequences — secondary work should degrade first, in a documented order.
  • Run backup restoration drills with verification steps that confirm data integrity, not just a successful restore command that completes without error; a minimal verification sketch follows this list.
  • Keep ICT third-party risk registers current — if the list of critical providers changes, the register needs to reflect it before the next supervisory review.
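
As a concrete illustration of what verification beyond a clean exit code can mean, the sketch below compares row counts and per-table content hashes between a source snapshot and its restored copy. It uses sqlite3 so the example is self-contained; the table names and hashing approach are assumptions for illustration, not a general-purpose tool.

  # Sketch: verify a restored database against its source by comparing row
  # counts and content hashes per table. Table names are trusted drill inputs.
  import hashlib
  import sqlite3

  def table_fingerprint(db_path: str, table: str) -> tuple[int, str]:
      """Return (row_count, sha256 of the ordered rows) for one table."""
      with sqlite3.connect(db_path) as conn:
          rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
      return len(rows), hashlib.sha256(repr(rows).encode()).hexdigest()

  def verify_restore(source_db: str, restored_db: str, tables: list[str]) -> bool:
      ok = True
      for table in tables:
          src = table_fingerprint(source_db, table)
          dst = table_fingerprint(restored_db, table)
          if src != dst:
              print(f"MISMATCH in {table}: source={src} restored={dst}")
              ok = False
      return ok

  # Fail the drill, not just the backup job, when integrity differs:
  # assert verify_restore("prod_snapshot.db", "restored_copy.db", ["accounts", "ledger"])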

THE_OPERATOR_OWNERSHIP_PROBLEM

Most incident plans fail at the handoff between automation and human judgment. Automation handles clean failover. It cannot decide when to roll back a deployment that is degrading slowly, when to escalate to a supplier, or when to invoke a disaster recovery site. Those decisions need named owners with on-call authority and written thresholds — not intuition developed in real time during the incident itself.

  • Map every critical dependency to a named owner with authority to act, not just awareness of the failure.
  • Set quantitative rollback and escalation thresholds in writing, tied to SLO burn rates rather than gut feel; a burn-rate sketch follows this list.
  • Run tabletop exercises on the scenarios that are hardest to automate, not just the ones you already have runbooks for.
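
For the written thresholds, one common pattern is a multi-window error-budget burn rate: act when the budget is being consumed some agreed multiple faster than the SLO allows. The SLO target, windows, and multipliers below are illustrative values to be agreed per service, not recommended defaults.

  # Sketch: a multi-window burn-rate check against written rollback and
  # escalation thresholds. All numbers here are illustrative assumptions.
  SLO_TARGET = 0.999                  # 99.9% availability objective
  ERROR_BUDGET = 1 - SLO_TARGET       # fraction of requests allowed to fail

  def burn_rate(error_ratio: float) -> float:
      """How many times faster than 'exactly on budget' errors are accruing."""
      return error_ratio / ERROR_BUDGET

  # (window, observed error ratio, threshold multiplier from the written policy)
  WINDOWS = [
      ("5m", 0.020, 14.0),   # fast burn: page and consider immediate rollback
      ("1h", 0.004, 6.0),    # sustained burn: escalate to the named owner
  ]

  breached = [name for name, ratio, limit in WINDOWS if burn_rate(ratio) >= limit]
  if breached:
      print(f"Written threshold breached in windows: {breached}")   # ['5m'] here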

PRIMARY_REFERENCES