When do I invoke Cyber Recovery?

Introduction

Here’s a question I get asked quite frequently: “If we have a vault, how will the vault team know when to invoke a cyber recovery?”

There are two answers to this question — the simple one that’s technically accurate, and the more complex answer that’s operationally accurate.

Let’s start with the simple, technically accurate answer: If your vault throws an alert that compromised data has been detected, there’ll be some form of recovery done.

We frequently want exact answers that allow us to create very defined, actionable paths. I’d put it to you that this is a question that requires a more considered approach than a binary “Should we recover from the vault? [Y/N]” style answer.

To understand why this question requires a more nuanced, complex response, we have to first consider the differences between what causes us to invoke the three primary types of data recovery. Those types are:

  • Regular recovery
  • Disaster recovery
  • Cyber recovery

Now, in this situation, I’m only talking about primary-use recovery requirements. That is, secondary-use situations (testing, auditing, data re-use) are not included in any of these scenarios.

What Causes the Need to Recover Data?

Regular Recovery

Let’s look at regular recovery operations, and their typical causes:

[Figure: Regular Recovery Drivers]

The recovery drivers here have changed over time. When I was a system/backup administrator in the 90s, I’d argue that I had to recover data more regularly due to hardware failure. But now, between advances in hardware reliability and virtualisation, hardware is rarely the primary driver for recoveries. Instead, it’s a data failure situation. That can mean several things. To name just three, you could have user error (e.g., deleting or misplacing a file), application malfunction (e.g., Word crashes and destroys your document), or operational mistake (e.g., a DBA deletes the production database rather than the dev/test one).

You might invoke regular recoveries in response to a platform failure, too – e.g., an entire ESX server falling over, or the core hardware in a partitioned Unix server failing.

The things that don’t usually drive regular recoveries include:

  • Staff acting maliciously
  • A network failure
  • A security breach
  • A site failure

Yes, recoveries can happen as a result of the above scenarios, but they’re not regular recoveries at that point.

Disaster Recovery

So let’s look at the drivers for disaster recovery operations. These tend to be bigger drivers than regular recoveries:

[Figure: Disaster Recovery Drivers]

Your primary drivers are either a site failure or a platform failure: scenarios like a flood in one of your datacentres, or a fire destroying a storage array. Going down the scale, you have things like hardware failure, network failure (particularly of the incoming/outgoing links to the datacentre), or some broad data failure. Usually in these situations security is still OK, and your staff aren’t compromised [1].

Cyber Recovery

And then there’s a cyber recovery situation. What are the likely drivers of this?

[Figure: Cyber Recovery Drivers]

There are a lot of potential drivers for a cyber recovery. Likely you do have some form of data failure – e.g., mass deletions, encryptions, corruption, etc. But there are several unknowns that could be driving the data failure:

  • Has the platform itself been compromised? (E.g., malware with admin access to your vCenter farm, your Active Directory, your cloud accounts? Or is this a supply-chain attack?)
  • Have your staff been compromised? (E.g., is this an insider attack, or at the very least an insider-assisted attack?)
  • Is your network even accessible or has it been compromised?
  • Has your security been compromised?

Hardware and firmware are lesser drivers, and your site is likely still physically OK. While you can imagine some scenarios where your hardware has been compromised (e.g., via a firmware-level attack), there’s a good chance that your business equipment is physically fine (though a much higher chance that you won’t trust it to be operationally fine).

Operational Implications

The operational implications differ significantly between the types of recovery. The most significant difference (at least for the purposes of this article) is one of trust. What elements of your environment can be trusted? Who amongst your team(s) can be trusted?

For a regular recovery or a disaster recovery, you can determine that trust reasonably quickly. The key differentiator with a cyber recovery is you can’t. You don’t know at the start whether you can trust your data, your platforms, your staff, your network or your security. Cyber attacks – true cyber attacks, not just a simple virus on your network – are likely to have multiple compromise points and affect a number of key systems or platforms.

So you should never ask how the vault team will know when to do a recovery, because it’s not their job to make that decision. At the point where the business realises it’s fighting a cyber attack, there needs to be a team responsible for coordinating and organising the response. This team will need an executive sponsor (e.g., the CIO or CISO), but it’ll likely also include members from security, risk, IT, legal, and possibly even facilities (e.g., in the case of a physical security breach). The vault team? They’re cogs in the engine. Important cogs, but they’re not the engine.
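If it helps to see that division of responsibility written down, here’s a minimal sketch of it in Python. Everything in it (the VaultState fields, the Action names, the function name) is hypothetical and made up for illustration; it isn’t drawn from any vault product or incident response framework. It only encodes the point above: a vault alert leads to escalation, and a recovery only happens once the incident response team has authorised it.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    """Hypothetical next steps for the vault team (illustrative names only)."""
    WAIT = auto()       # nothing to act on yet
    ESCALATE = auto()   # pass the alert to the cyber incident response team
    RECOVER = auto()    # run a recovery from the vault


@dataclass
class VaultState:
    """Assumed inputs: what the vault has reported and what has been decided."""
    alert_raised: bool          # the vault has flagged compromised data
    recovery_authorised: bool   # the incident response team has said "go"


def vault_team_next_action(state: VaultState) -> Action:
    """Return the vault team's next step; they never authorise a recovery themselves."""
    if state.recovery_authorised:
        return Action.RECOVER
    if state.alert_raised:
        return Action.ESCALATE
    return Action.WAIT
```

Note that an alert on its own never returns RECOVER. In this sketch the alert is an input to the response team’s decision, not the decision itself.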

In Conclusion

It turns out this question does have a simple answer, after all: your vault team will invoke recovery from the vault when your cyber incident response team tells them to. But like ’42’ as the answer to Life, the Universe and Everything, the real question is different to the one you’re asking. The real question covers the makeup of that team, your cyber incident processes, and your business continuity processes.

The role of your vault technical team, after all, is operational, not executive.

Footnotes

  1. Yes, I know business continuity plans wrapping around disaster recovery will have to consider availability of staff in these situations, but that’s not the prima facie reason for the recovery per se.
