Disaster recovery is an organization’s method of regaining access and functionality to its IT infrastructure after events like a natural disaster, cyber attack, or even business disruptions related to the COVID-19 pandemic. A variety of disaster recovery (DR) methods can be part of a disaster recovery plan. DR is one aspect of business continuity.
Every organization should have a solid business recovery plan to stay competitive in the market. Otherwise, damage to the IT infrastructure can cause irreversible damage to the reputation of the business.
When developing a recovery plan, the following elements are the main areas to focus on.
- Disaster recovery team: This assigned group of specialists will be responsible for creating, implementing and managing the disaster recovery plan.
- Risk evaluation: Assess potential hazards that put the organization at risk.
- Business-critical asset identification: Documentation of which systems, applications, data, and other resources are most critical for business continuity, as well as the necessary steps to recover data.
- Backups: Determine what needs backup (or to be relocated), who should perform backups, and how backups will be implemented. Include a recovery point objective (RPO) that states the frequency of backups and a recovery time objective (RTO) that defines the maximum amount of downtime allowable after a disaster. These metrics create limits to guide the choice of IT strategy, processes and procedures that make up an organization’s disaster recovery plan. The amount of downtime an organization can handle and how frequently the organization backs up its data will inform the disaster recovery strategy.
- Testing and optimization: The recovery team should continually test and update its strategy to address ever-evolving threats and business needs.
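The relationship between the backup schedule, the RPO and the RTO can be sketched in a few lines of Python. This is only an illustration: the function name `meets_objectives` and all the figures are assumptions made for the example, not part of any AWS API. The idea is that the worst-case data loss equals the gap between two backups, so the backup interval must not exceed the RPO, and the estimated restore time must not exceed the RTO.

```python
from datetime import timedelta


def meets_objectives(backup_interval, estimated_restore_time, rpo, rto):
    """Return (rpo_ok, rto_ok) for a backup schedule.

    rpo_ok: the gap between two backups does not exceed the RPO.
    rto_ok: the estimated restore time does not exceed the RTO.
    """
    return backup_interval <= rpo, estimated_restore_time <= rto


# Hypothetical figures, chosen only for illustration.
rpo_ok, rto_ok = meets_objectives(
    backup_interval=timedelta(hours=6),
    estimated_restore_time=timedelta(hours=3),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
```

With these example figures the RTO is met, but the RPO is not, because backing up every six hours can lose more than four hours of data. The fix would be more frequent backups or a relaxed RPO.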
Now let’s focus on disaster recovery specific to AWS. The AWS infrastructure is mature in handling potential disasters. However, AWS follows a shared responsibility model in which part of the responsibility is delegated to the customers as well. The following diagram describes this.
Types of Disaster Recovery
The different types of disaster recovery methods are listed below. The implementation cost tends to increase as you move down the list.
- Backup: The simplest type of disaster recovery. However, backing up data provides only minimal business continuity help, as the IT infrastructure itself is not backed up.
- Cold site: In this type of disaster recovery, an organization sets up a basic infrastructure in a second, rarely used facility that provides a place for employees to work after a natural disaster or fire. It can help with business continuity because business operations can continue, but it does not provide a way to protect or recover important data, so a cold site must be combined with other methods of disaster recovery.
- Hot site: A hot site maintains up-to-date copies of data at all times. Hot sites are time-consuming to set up and more expensive than cold sites, but they dramatically reduce downtime.
- Disaster Recovery as a Service (DRaaS): In the event of a disaster or ransomware attack, a DRaaS provider moves an organization’s computer processing to its own cloud infrastructure, allowing a business to continue operations seamlessly from the vendor’s location, even if an organization’s servers are down. DRaaS plans are available through either subscription or pay-per-use models. There are pros and cons to choosing a local DRaaS provider: latency will be lower after transferring to DRaaS servers that are closer to an organization’s location, but in the event of a widespread natural disaster, a DRaaS that is nearby may be affected by the same disaster.
- Back Up as a Service: Similar to backing up data at a remote location, with Back Up as a Service, a third party provider backs up an organization’s data, but not its IT infrastructure.
- Datacenter disaster recovery: The physical elements of a data center can protect data and contribute to faster disaster recovery in certain types of disasters. For instance, fire suppression tools will help data and computer equipment survive a fire. A backup power source will help businesses sail through power outages without grinding operations to a halt. Of course, none of these physical disaster recovery tools will help in the event of a cyber attack.
- Virtualization: Organizations can back up certain operations and data or even a working replica of an organization’s entire computing environment on off-site virtual machines that are unaffected by physical disasters. Using virtualization as part of a disaster recovery plan also allows businesses to automate some disaster recovery processes, bringing everything back online faster. For virtualization to be an effective disaster recovery tool, frequent transfer of data and workloads is essential, as is good communication within the IT team about how many virtual machines are operating within an organization.
- Point-in-time copies: Point-in-time copies, also known as point-in-time snapshots, make a copy of the entire database at a given time.
- Instant recovery: Instant recovery is similar to point-in-time copies, except that instead of copying a database, instant recovery takes a snapshot of an entire virtual machine.
With so many disaster recovery types, it can be hard to decide which one to select for a specific component. The above graph is used to make that decision. Note that the “Cost of business impact” curve is relevant to the business, while the “Recovery cost” curve illustrates the implementation cost of the different strategies. Management should decide how much budget it can allocate to the infrastructure, which directly determines the cost of business impact it should be ready to face in case of a disaster.
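One way to read the two curves is to add them up and look for the strategy with the lowest combined cost. The Python sketch below does this under stated assumptions: the `Strategy` class, the cost figures, and the downtime estimates are all hypothetical and exist only to show the trade-off, not to describe real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    recovery_cost: float        # cost of implementing the strategy
    expected_downtime_h: float  # expected downtime after a disaster, in hours


def total_cost(strategy, impact_per_hour):
    """Recovery cost plus the business impact of the expected downtime."""
    return strategy.recovery_cost + strategy.expected_downtime_h * impact_per_hour


def cheapest(strategies, impact_per_hour):
    """Pick the strategy with the lowest combined cost."""
    return min(strategies, key=lambda s: total_cost(s, impact_per_hour))


# Hypothetical options, for illustration only.
OPTIONS = [
    Strategy("backup", recovery_cost=1_000, expected_downtime_h=48),
    Strategy("cold site", recovery_cost=10_000, expected_downtime_h=24),
    Strategy("hot site", recovery_cost=50_000, expected_downtime_h=1),
]
```

With these numbers, a component whose downtime costs little per hour is best served by plain backups, while a component whose downtime is very expensive justifies a hot site. Each component can therefore end up with a different strategy.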
Backing Up AWS Services
Different AWS services provide different strategies to manage disasters. We have listed the prominent AWS services and their backup methods below.
|AWS Service||Method to back up|
|DynamoDB||On-demand backup and restore; point-in-time recovery|
|S3||S3 cross-region replication|
|AppConfig||Adding the YAML to the corresponding repository|
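As a minimal sketch of the DynamoDB row, the two helpers below turn on point-in-time recovery and take an on-demand backup. They assume a boto3 DynamoDB client is passed in (for example one created with `boto3.client("dynamodb")`); the helper names, the table name, and the backup naming scheme are assumptions made for this example.

```python
from datetime import datetime, timezone


def enable_point_in_time_recovery(dynamodb, table_name):
    """Turn on continuous backups (point-in-time recovery) for one table."""
    return dynamodb.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )


def create_on_demand_backup(dynamodb, table_name, now=None):
    """Create an on-demand backup and return the backup name that was used."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming scheme: table name plus a UTC timestamp.
    backup_name = f"{table_name}-{now:%Y%m%dT%H%M%SZ}"
    dynamodb.create_backup(TableName=table_name, BackupName=backup_name)
    return backup_name
```

Passing the client in keeps the helpers easy to test and lets the same code target the primary region or a DR region.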
1. Backup & Restore
This strategy consists of maintaining a backup of your AWS Lambda functions so that you can restore them in another region in the event of a disaster. Keep the source code and settings of your functions under version control and use tags to baseline your code versions. During the restore process, you can install your functions in two ways: manually or through a CI/CD pipeline.
2. Pilot Light
For this strategy, the AWS Lambda functions that are critical to the minimal functioning of the DR region must be functional and up to date. At a minimum, manage those critical AWS Lambda functions through a CI/CD pipeline. When provisioning the production environment in the event of a disaster, provision the rest of the environment through the CI/CD pipelines as well or, failing that, manually.
3. Warm Standby & Active-Active
In this case, you will have two functional production environments, with one of them running at minimum capacity. Maintaining both manually is unfeasible, so a CI/CD pipeline is the option to use.
Possible ways of compromising an AWS account
1. Git repository misconfigurations
AWS keys are often shared with the developers and engineers in a company, which means that the keys are hosted in a shared location on an internal network, such as a private Git repository. This could give rogue users on the internal network access to AWS, and logging could not necessarily pinpoint the source of the compromise. An outside person who gains access to the network could also use those exposed credentials.
How to avoid disclosing the AWS credentials and API keys?
● Don’t store credentials in version control – It doesn’t matter that the repo is private. Manage them in a secret store that can be accessed at runtime (sometimes at deploy time) such as AWS Secrets Manager, Systems Manager Parameter Store, or HashiCorp Vault.
● Use a Git repository scanner, e.g. git-secrets, to detect sensitive information and security issues.
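To illustrate what such a scanner looks for, here is a minimal Python sketch that searches text for strings shaped like AWS access key IDs. It assumes only the common case of long-term IAM user keys, which start with `AKIA` followed by 16 uppercase letters or digits. The function name is hypothetical, and real scanners check many more patterns, so treat this as an illustration and not a replacement for a dedicated tool.

```python
import re

# Long-term IAM user access key IDs: "AKIA" followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_key_ids(text):
    """Return (line_number, key_id) pairs for every match in the text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# The key below is the placeholder used in AWS documentation, not a real credential.
SAMPLE = """\
region = us-east-1
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
"""
```

Running a check like this in a pre-commit hook or in the CI pipeline stops a key before it reaches the shared repository.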
2. Social Engineering of AWS users
Just like users of any other service that accepts usernames and passwords for logging in, AWS users are vulnerable to social engineering attacks. Fake emails, calls, or any other method of social engineering could end up with an AWS user’s credentials in the hands of an attacker.
AWS API keys can also be disclosed through phishing attacks. Most of the time, users are tricked with malicious links, which can lead to the installation of malware, the freezing of the system as part of a ransomware attack, or the revealing of sensitive information.
● Use MFA, so that a stolen password alone is not enough to access the AWS account.
● Be careful with spam emails and check the domain name of the site before entering credentials.
● Use a password for the AWS account that is different from your common password.
3. Vulnerabilities in AWS hosted applications.
Server-side request forgery – Attackers can make arbitrary web requests from a compromised server to a target of their choice. If an attacker finds this vulnerability in a web application and can make requests from an EC2 instance, they can target the internal EC2 metadata API, which can expose the credentials of the IAM role attached to the instance.
Local file read – AWS keys can often be found in configuration files, log files, or various other places on an operating system. If a user on the system uses the AWS CLI, their credentials are stored in their home directory, and keys might also be stored in an environment variable file. An attacker with the ability to read files on the operating system may be able to read these files and use those keys for further compromise.
How to avoid vulnerabilities in AWS hosted applications?
● Apply security patches to the hosted applications
● Implement a proactive internal vulnerability scanner
● Grant least privileges to the IAM roles.
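As one illustration of hardening an application against server-side request forgery, the Python sketch below refuses URLs that point at loopback, private, or link-local addresses. The EC2 metadata API at 169.254.169.254 falls in the link-local range. The function name and the injectable `resolve` parameter are assumptions made for this example. The check is also incomplete on its own: it does not follow redirects, and a hostname can resolve differently between the check and the actual request.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal_target(url, resolve=socket.gethostbyname):
    """Return True when the URL points at a loopback, private, or link-local address."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no usable host: refuse rather than guess
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address, so resolve the hostname first.
        address = ipaddress.ip_address(resolve(host))
    return address.is_loopback or address.is_private or address.is_link_local
```

An application that fetches user-supplied URLs would call this before making the request and reject the URL when it returns True.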
4. Breach in trusted 3rd parties.
If a trusted third party is compromised, that could mean the attacker may be able to pivot into your own environment as well with the access/data they gained from the 3rd party.
Another example would be if you use a 3rd party password manager for handling access to various services in your environment. If they were to be compromised, an attacker could potentially gain a very high level of access to your AWS environment.
Although these scenarios seem unlikely, they do happen. One example is the breach of the “unified access management platform” (in simple terms: password manager) OneLogin in 2017, where attackers gained access to customer databases that contained sensitive information. It is also interesting to note that this access was gained because the attackers were able to compromise a set of AWS keys belonging to the OneLogin AWS account, where the customer databases were being hosted.
What should we do when an AWS account is compromised?
If you suspect unauthorized activity in your AWS account, first verify if the activity was unauthorized by doing the following:
● Identify any unauthorized actions taken by the AWS Identity and Access Management (IAM) identities in your account.
● Identify any unauthorized access or changes to your account.
● Identify the creation of any unauthorized resources or IAM users
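A minimal sketch of the first check, assuming a boto3 CloudTrail client is passed in (for example `boto3.client("cloudtrail")`): it lists the events recorded for one access key in a time window. The helper name is hypothetical. CloudTrail event history only covers recent management events, so older activity needs a trail that delivers logs to S3.

```python
def events_for_access_key(cloudtrail, access_key_id, start_time, end_time):
    """Return (event_time, event_name) pairs recorded for one access key."""
    kwargs = {
        "LookupAttributes": [
            {"AttributeKey": "AccessKeyId", "AttributeValue": access_key_id}
        ],
        "StartTime": start_time,
        "EndTime": end_time,
    }
    events = []
    while True:
        page = cloudtrail.lookup_events(**kwargs)
        events.extend((e["EventTime"], e["EventName"]) for e in page.get("Events", []))
        token = page.get("NextToken")
        if not token:
            return events
        # More results are available: request the next page.
        kwargs["NextToken"] = token
```

Event names such as `RunInstances` or `CreateUser` that nobody on the team recognizes are a strong hint that the key was used by someone else.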
Follow these steps if you find unauthorized activity in the AWS account.
- Rotate and delete AWS access keys and passwords.
- Rotate any potentially unauthorized IAM user credentials
- Delete any unrecognized or unauthorized resources
- Recover backup resources.
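The first step can be sketched as follows, assuming a boto3 IAM client is passed in (for example `boto3.client("iam")`). The helper name is hypothetical. The sketch deactivates keys instead of deleting them right away, which is a cautious choice for this example: deactivation can be undone if a legitimate workload breaks, and the keys can be deleted afterwards with `delete_access_key`.

```python
def deactivate_access_keys(iam, user_name):
    """Mark every active access key of one IAM user as Inactive.

    Returns the IDs of the keys that were deactivated.
    """
    deactivated = []
    response = iam.list_access_keys(UserName=user_name)
    for key in response["AccessKeyMetadata"]:
        if key["Status"] == "Active":
            iam.update_access_key(
                UserName=user_name,
                AccessKeyId=key["AccessKeyId"],
                Status="Inactive",
            )
            deactivated.append(key["AccessKeyId"])
    return deactivated
```

After the old keys are confirmed unused, new keys can be issued and distributed through a secret store.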
Preventive actions taken by parties who have experienced compromised accounts
- Enabled Amazon GuardDuty and attached alerts to GuardDuty events (but these AWS services are expensive to use).
- Created alerts for resource creation in our account.
- Established clear nomenclature for AWS services.
- Scoped each key to only the resources it requires and nothing more, with separate keys for, say, S3 access, ECR access, etc.
- Enabled two-factor authentication for all users with console login and set strict password renewal and expiry policies.
- Created a vulnerability and security audit schedule to identify active and unresolved security loopholes in our applications.
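Scoping a key to only what it needs can be expressed as an IAM policy. The Python sketch below builds a read-only policy for a single S3 bucket. The function name and the bucket name are assumptions made for this example, and a real key may need a different set of actions.

```python
import json


def s3_read_only_policy(bucket_name):
    """Build an IAM policy document that allows read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


# Hypothetical bucket name, for illustration only.
policy_json = json.dumps(s3_read_only_policy("example-backups"), indent=2)
```

A key that carries only this policy cannot touch ECR, IAM, or any other bucket, which limits the damage if it leaks.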
Authors: Anushka Idamekorala, Supimi Piumika, Sharmila Ranasinghe