Business continuity management program in Azure (2022)

Azure maintains one of the most mature and respected business continuity management programs in the industry. The goal of business continuity in Azure is to build and advance recoverability and resiliency for all independently recoverable services, whether a service is customer-facing (part of an Azure offering) or an internal supporting platform service.

To understand business continuity, it's important to note that many offerings are made up of multiple services. At Azure, each service is statically identified through tooling and is the unit of measure used for privacy, security, inventory, risk, business continuity management, and other functions. To properly measure the capabilities of a service, the three elements of people, process, and technology are included for each service, whatever the service type.

For example:

  • If there's a business process based on people, such as a help desk or team, the service delivery is what they do. The people use processes and technology to perform the service.
  • If there's technology as a service, such as Azure Virtual Machines, the service delivery is the technology along with the people and processes that support its operation.

Many of the offerings Azure provides require customers to set up disaster recovery in multiple regions themselves; that setup isn't the responsibility of Microsoft. Not all Azure services automatically replicate data or automatically fall back from a failed region to cross-replicate to another enabled region. In these cases, recovery and replication must be configured by the customer.

Microsoft does ensure that the baseline infrastructure and platform services are available. But in some scenarios, usage requires the customer to duplicate their deployments and storage in a multi-region capacity, if they opt to. These examples illustrate the shared responsibility model. It's a fundamental pillar in your business continuity and disaster recovery strategy.

Division of responsibility

In any on-premises datacenter, you own the whole stack. As you move assets to the cloud, some responsibilities transfer to Microsoft. The following diagram illustrates areas and division of responsibility between you and Microsoft according to the type of deployment.

[Diagram: division of responsibility between the customer and Microsoft, by type of deployment]

A good example of the shared responsibility model is the deployment of virtual machines. If a customer wants to set up cross-region replication for resiliency in case of a region failure, they must deploy a duplicate set of virtual machines in an alternate enabled region. Azure doesn't automatically replicate these services if there's a failure. It's the customer's responsibility to deploy the necessary assets. The customer must have a process to manually change primary regions, or they must use a traffic manager to detect the failure and automatically fail over.
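
The automatic failover choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the API of any Azure service: the `Endpoint` type, the region names, and the `pick_endpoint` helper are hypothetical, and endpoint health is passed in as a plain flag instead of being probed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    region: str      # hypothetical region name
    priority: int    # lower number = preferred
    healthy: bool    # result of a health probe, supplied by the caller


def pick_endpoint(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


primary = Endpoint("region-a", priority=1, healthy=False)   # simulated region failure
secondary = Endpoint("region-b", priority=2, healthy=True)  # customer-deployed duplicate
chosen = pick_endpoint([primary, secondary])
print(chosen.region if chosen else "no healthy endpoint")
```

The sketch only works because the secondary endpoint already exists. Deploying that duplicate set of virtual machines remains the customer's task.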

Customer-enabled disaster recovery services all have public-facing documentation to guide you. For an example of public-facing documentation for customer-enabled disaster recovery, see Azure Data Lake Analytics.

For more information on the shared responsibility model, see Microsoft Trust Center.

Business continuity compliance: Service-level responsibility

Each service is required to complete Business Continuity Disaster Recovery records in the Azure Business Continuity Manager Tool. Service owners can use the tool to work within a federated model to complete and incorporate requirements that include:

  • Service properties: Defines the service and how disaster recovery and resiliency are achieved and identifies the responsible party for disaster recovery (for technology). For details on recovery ownership, see the discussion on the shared responsibility model in the preceding section and diagram.

  • Business impact analysis: This analysis helps the service owner define the recovery time objective (RTO) and recovery point objective (RPO) based on the criticality of the service across a table of impacts. Operational, legal, regulatory, brand image, and financial impacts are used as target goals for recovery.

    Note

    Microsoft doesn't publish RTOs or RPOs for services because this data is for internal measurement only. All customer promises and measures are SLA-based because an SLA covers a wider range of scenarios than an RTO or RPO, which apply only to catastrophic loss.

  • Dependencies: Each service maps the dependencies (other services) it requires to operate, no matter how critical, and each dependency is mapped as needed at runtime, needed for recovery only, or both. If there are storage dependencies, additional data is mapped that defines what's stored and, for example, whether it requires point-in-time snapshots.

  • Workforce: As noted in the definition of a service, it's important to know the location and number of people able to support the service. This information helps ensure that there are no single points of failure and that critical employees are dispersed, so that an event at a single location doesn't affect all of them.

  • External suppliers: Microsoft keeps a comprehensive list of external suppliers, and the suppliers deemed critical are measured for capabilities. If identified by a service as a dependency, supplier capabilities are compared to the needs of the service to ensure a third-party outage doesn't disrupt Azure services.

  • Recovery rating: This rating is unique to the Azure Business Continuity Management program. This rating measures several key elements to create a resiliency score:

    • Willingness to fail over: Although there can be a process, it might not be the first choice for short-term outages.
    • Automation of failover.
    • Automation of the decision to fail over.

    The most reliable failover with the shortest time to recover comes from a service that's automated and requires no human decision. An automated service uses heartbeat monitoring or synthetic transactions to determine that a service is down and to start immediate remediation.

  • Recovery plan and test: Azure requires every service to have a detailed recovery plan and to test that plan as if the service has failed because of catastrophic outage. The recovery plans are required to be written so that someone with similar skills and access can complete the tasks. A written plan avoids relying on subject matter experts being available.

    Testing is done in several ways, including self-test in a production or near-production environment, and as part of Azure full-region down drills in canary region sets. These enabled regions are identical to production regions but can be disabled without affecting customers. Testing is considered integrated because all services are affected simultaneously.

  • Customer enablement: When the customer is responsible for setting up disaster recovery, Azure is required to have public-facing documentation guidance. For all such services, links are provided to documentation and details about the process.
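
The recovery rating item above favors failover that needs no human decision, driven by heartbeat monitoring. As a rough sketch of that idea, the following Python class triggers failover after a run of consecutive missed heartbeats. The `HeartbeatMonitor` name and the threshold of three misses are assumptions made for illustration and don't describe how Azure implements detection.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Declares a service down after `threshold` consecutive missed heartbeats."""
    threshold: int = 3          # assumed value, not an Azure setting
    missed: int = 0
    failed_over: bool = False

    def record(self, heartbeat_ok: bool) -> bool:
        """Record one probe result; return True when failover is triggered."""
        if self.failed_over:
            return False                      # already failed over, nothing to do
        self.missed = 0 if heartbeat_ok else self.missed + 1
        if self.missed >= self.threshold:
            self.failed_over = True           # automated decision, no human in the loop
            return True
        return False


monitor = HeartbeatMonitor(threshold=3)
probes = [True, False, True, False, False, False]
triggers = [monitor.record(ok) for ok in probes]
print(triggers)
```

Requiring several consecutive misses is one way to reflect the "willingness to fail over" element: a single missed heartbeat during a short-term blip doesn't start a failover.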

Verify your business continuity compliance

When a service has completed its business continuity management record, the record must be submitted for approval. It's assigned to an experienced business continuity management practitioner who reviews the entire record for completeness and quality. If the record meets all requirements, it's approved. If it doesn't, it's rejected with a request for reworking. This process ensures that both parties agree that business continuity compliance has been met and that the work isn't attested to only by the service owner. Azure internal audit and compliance teams also do periodic random sampling to ensure the best data is being submitted.

Testing of services

Microsoft and Azure do extensive testing for both disaster recovery and for availability zone readiness. Services are self-tested in a production or pre-production environment to demonstrate independent recoverability for services that aren't dependent on major platform failovers.

To ensure services can similarly recover in a true region-down scenario, "pull-the-plug"-type testing is done in canary environments that are fully deployed regions matching production. For example, the clusters, racks, and power units are literally turned off to simulate a total region failure.

During these tests, Azure uses the same production process for detection, notification, response, and recovery. No individuals are expecting a drill, and the engineers relied on for recovery are those in the normal on-call rotation. This approach avoids depending on subject matter experts who might not be available during an actual event.

Included in these tests are services where the customer is responsible for setting up disaster recovery following Microsoft public-facing documentation. Service teams create customer-like instances to show that customer-enabled disaster recovery works as expected and that the instructions provided are accurate.

For more information on certifications, see the Microsoft Trust Center and the section on compliance.

FAQs

What are the 3 main areas of business continuity management? ›

Three key components of a business continuity plan

A business continuity plan has three key elements: Resilience, recovery and contingency.

What is the RTO for Azure? ›

"Recovery Time Objective (RTO)" means the period of time beginning when Customer initiates a Failover of a Protected Instance for Azure-to-Azure replication to the time when the Protected Instance is running as a virtual machine in secondary Azure region, excluding any time associated with manual action or the ...

What is business continuity plan in cloud computing? ›

Cloud Computing, Business Continuity, and Disaster Recovery

Business continuity helps the entire business persist in a crisis. Disaster recovery is the first step in business continuity and ensures that IT and communications work. Disaster recovery may rely on cloud service models like IaaS and SaaS.

Does Office 365 have disaster recovery? ›

Microsoft's offerings for Microsoft 365 disaster recovery

Microsoft provides Microsoft 365 customers with various tools so they can perform a granular recovery on their own. For instance, OneDrive for Business includes a recycle bin where users can retrieve deleted items.

What are the 4 main components of the BCM Programme management? ›

The four main areas of business continuity management are 1) disaster prevention, 2) disaster preparedness, 3) disaster response and 4) disaster recovery.

What are the 5 components of a business continuity plan? ›

In order to achieve this, every business continuity plan needs to incorporate five key elements.
  • Risks and potential business impact. ...
  • Planning an effective response. ...
  • Roles and responsibilities. ...
  • Communication. ...
  • Testing and training.

What is difference between RTO and RPO in Azure? ›

Differences Between RTO and RPO

The recovery time objective (RTO) is the target maximum duration of downtime after an IT outage, while the recovery point objective (RPO) is the maximum acceptable length of time between the last data restoration point and the outage.

What's the difference between RTO and RPO? ›

RPO designates the variable amount of data that will be lost or will have to be re-entered during network downtime. RTO designates the amount of “real time” that can pass before the disruption begins to seriously and unacceptably impede the flow of normal business operations.

How does Azure ASR work? ›

Azure Site Recovery (ASR) is a disaster recovery as a service (DRaaS) offering from Azure for use in cloud and hybrid cloud architectures. A near-constant data replication process makes sure copies are in sync. The application-consistent snapshot feature of Azure Site Recovery ensures that the data is in a usable state after the failover.

What is an example of a business continuity plan? ›

A key component of a business continuity plan (BCP) is a disaster recovery plan that contains strategies for handling IT disruptions to networks, servers, personal computers and mobile devices. The plan should cover how to reestablish office productivity and enterprise software so that key business needs can be met.

What is the need of business continuity program? ›

The importance of a business continuity plan

Communication between employees and customers. Workflow operations essential to business activity. Customer service response, especially if you are a service provider. Business security, keeping your data and information secured wherever you and your team find yourself ...

How does cloud help business continuity? ›

Benefits of cloud business continuity

This enables mission-critical applications to run even if the organization experiences data center issues. The cloud also simplifies disaster recovery (DR) planning. On-premises continuous data protection offerings can often be configured to write a backup copy to the cloud.

What is RTO in Office 365? ›

RTO, or recovery time objective, refers to how long an application can be down without causing significant damage to the business, or the time between the loss and the recovery of the critical data that resides in that application.

What are the two types of process classification in BCM? ›

They are micro (individual), meso (group or organization) and macro (national or interorganizational). There are also two main types of resilience, which are proactive and post resilience.

What are the stages of business continuity? ›

The 4 phases of a business continuity plan
  • Initial response.
  • Relocation.
  • Recovery.
  • Restoration.

What is the first step of business continuity planning? ›

Step 1: Risk Assessment

Assessment of the potential impact of various business disruption scenarios. Determination of the most likely threat scenarios. Assessment of telecommunication recovery options and communication plans. Prioritization of findings and development of a roadmap.

What is RPO and RTO with examples? ›

They are strictly numeric time values. For example, an RTO for a fairly critical server might be one hour, whereas the RPO for less-than-critical data transaction files might be 24 hours, and might also support the use of backup tape storage equipment.
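
To make the numbers above concrete, here is a small Python sketch that compares one incident against a one-hour RTO and a 24-hour RPO. The timestamps and the `recovery_gaps` helper are hypothetical, and for simplicity the sketch applies both targets to the same incident.

```python
from datetime import datetime, timedelta


def recovery_gaps(last_backup: datetime, outage_start: datetime,
                  service_restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (data-loss window, downtime) for one incident."""
    data_loss = outage_start - last_backup
    downtime = service_restored - outage_start
    return data_loss, downtime


rto = timedelta(hours=1)    # example RTO from the answer above
rpo = timedelta(hours=24)   # example RPO from the answer above

# Hypothetical incident timestamps.
last_backup = datetime(2023, 1, 16, 22, 0)
outage_start = datetime(2023, 1, 17, 9, 30)
service_restored = datetime(2023, 1, 17, 10, 15)

data_loss, downtime = recovery_gaps(last_backup, outage_start, service_restored)
print("RPO met:", data_loss <= rpo)
print("RTO met:", downtime <= rto)
```

The RPO is measured backward from the outage to the last restorable point, and the RTO is measured forward from the outage to restoration.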

Which is more important RPO or RTO? ›

The shorter the RTO, the greater the resources required. RPO is used for determining the frequency of data backup to recover the needed data in case of a disaster.

What's the difference between BCP and DRP? ›

BCP: Business Continuity Planning deals with keeping business operations running — perhaps in another location or by using different tools and processes — after a disaster has struck. DRP: Disaster Recovery Planning deals with restoring normal business operations after the disaster takes place.

What is RPO and RTO in cloud? ›

As a quick refresher, RTO stands for Recovery Time Objective and is a measure of how quickly after an outage an application must be available again. RPO, or Recovery Point Objective, refers to how much data loss your application can tolerate.

How RPO is calculated? ›

If you want to calculate recovery point objectives for your business or organization, consider following these five steps:
  • Look at how often files update. ...
  • Review the goals of your BCP. ...
  • Consider industry standards. ...
  • Establish and approve each RPO. ...
  • Analyze your RPO settings consistently.

What is RTO RPO and MTD? ›

Here's what the acronyms RPO, RTO, WRT and MTD mean: Recovery Point Objective (RPO) Recovery Time Objective (RTO) Work Recovery Time (WRT) Maximum Tolerable Downtime (MTD)

How do I enable ASR on my Azure VM? ›

Configuring Azure
  1. Step 1: Create a Recovery Services Vault. ...
  2. Step 2: Choose your Protection Goal(s) ...
  3. Step 3: Setup the Source Environment. ...
  4. Step 4: Install and Configure the ASR Provider on Hyper-V Host. ...
  5. Step 5: Create a Replication Policy. ...
  6. Step 6: Associate Hyper-V Site(s) ...
  7. Step 7: Create a Storage Account + Virtual Network.

What is disaster recovery in Azure? ›

A business continuity and disaster recovery (BCDR) strategy helps organizations secure data, applications, and workloads during planned or unplanned outages. To help organizations implement BCDR, Azure provides Azure Site Recovery (ASR).

What is recovery plan in ASR? ›

A recovery plan defines how machines fail over, and the sequence in which they start after failover. Recovery plans can be used for both failover to and failback from Azure. Up to 100 protected instances can be added to one recovery plan.

What are the four P's of business continuity planning? ›

When devising a business continuity strategy, you should consider the 4 P's: people (staff and customers), processes (the technology and processes required), premises, and providers (suppliers and partners).

What are the three key outputs of the BIA process? ›

The BIA quantifies the impacts of disruptions on service delivery, risks to service delivery, and recovery time objectives (RTOs) and recovery point objectives (RPOs).

What are the 7 steps of continuity management? ›

7 Steps to Create a Business Continuity Plan + Webinar Replay
  • Step 1: Regulatory Review and Landscape. ...
  • Step 2: Risk Assessment. ...
  • Step 3: Perform a Business Impact Analysis. ...
  • Step 4: Strategy and Plan Development. ...
  • Step 5: Create an Incident Response Plan. ...
  • Step 6: Plan Testing, Training and Maintenance. ...
  • Step 7: Communication.

What is the first step in business continuity planning? ›

Steps to Creating a Business Continuity Plan
  1. Step 1: Assemble a Business Continuity Management Team. ...
  2. Step 2: Ensure the Safety and Wellbeing of Your Employees. ...
  3. Step 3: Understand the Risks to Your Company. ...
  4. Step 4: Implement Recovery Strategies. ...
  5. Step 5: Test, Test Again and Make Improvements.

Who is responsible for business continuity plan? ›

Business unit leaders (i.e. payroll, corporate travel, physical security, information security, HR) are responsible for creating their respective unit's business continuity plan under the guidance of the program manager.

What are the drivers of BCM? ›

What are the main drivers for Business Continuity Management?
  • Survival. The prime driver of business continuity management is survival. ...
  • Financial & Reputational. ...
  • Customers. ...
  • Employees. ...
  • Conclusion.

What is the main purpose of a BIA? ›

A business impact analysis (BIA) predicts the consequences of disruption of a business function and process and gathers information needed to develop recovery strategies. Potential loss scenarios should be identified during a risk assessment.

What type of approach does a BIA use? ›

A BIA uses a top-down approach in which critical business functions (CBFs) are examined first.

Why is BCM important? ›

Effective BCM ensures that organisations can provide an acceptable service in the event of a disaster, helping them preserve their reputation and keep revenue coming in.

What is risk assessment in BCP? ›

Completing a risk assessment is the first step in developing a business continuity plan (BCP) for your critical functions and services. The risk assessment identifies the probability of risks to an organization and evaluates the impacts if these risks develop into an emergency.

What is BCP call tree? ›

A call tree is a layered hierarchical communication model that is used to notify specific individuals of an event and coordinate recovery, if necessary. A call tree is also known as a phone tree, call list, phone chain or text chain.
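
A call tree of this kind can be modeled as a simple mapping and walked layer by layer. The roles and the `notification_order` helper below are invented for illustration. The sketch assumes each person appears under only one caller.

```python
from collections import deque

# Hypothetical call tree: each person notifies the people listed under them.
call_tree = {
    "incident lead": ["ops manager", "comms manager"],
    "ops manager": ["on-call engineer", "database admin"],
    "comms manager": ["support lead"],
}


def notification_order(tree: dict[str, list[str]], root: str) -> list[str]:
    """Walk the tree layer by layer (breadth-first) from the root."""
    order, queue = [], deque([root])
    while queue:
        person = queue.popleft()
        order.append(person)
        queue.extend(tree.get(person, []))
    return order


print(notification_order(call_tree, "incident lead"))
```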

Article information

Author: Jeremiah Abshire

Last Updated: 01/17/2023
