Data Security in the AWS Cloud

Key Keeper

© Lead Image © Kirill Makarov, 123rf, 123RF.com

As a cloud market leader, Amazon Web Services has had to put a great deal of thought into data security. Encryption options and key management play an important role.

You've probably seen T-shirts emblazoned with "There is no cloud; it's just someone else's computer." This skepticism results from the management policy of quickly outsourcing as many IT services as possible, with the sole focus on efficiency and cost savings. As a result, data security becomes a secondary feature that the shrinking IT department must somehow guarantee.

Admins who simply run their applications in the cloud run the financially significant risk of violating the General Data Protection Regulation (GDPR), for example, if they store unprotected personal data on servers outside the European Union. However, the online bank N26, which runs entirely on Amazon Web Services (AWS), has passed an audit by the German financial regulator BaFin, showing that it is feasible to operate cloud services in compliance with strict rules.

In addition to the choice of the run-time environment (configured as the "region" on AWS and other cloud providers), there are several options for encrypting data for cloud storage. At the last AWS Summit in Berlin, the CTO of AWS, Werner Vogels, wore a T-shirt that advocated "Encrypt Everything." If encryption is the answer, then who has access to the keys, and where are they kept?

Who Can Do What?

The first question for data security in the cloud concerns read and write permissions. This issue raises its head whenever you deploy any type of IT service and starts with user management. Weaving a complex structure of authorizations that define which user can access which data, servers, and other resources can be a Sisyphean task, with changes occurring constantly in IT operations.

The sheer number of possible permissions from which admins can assemble roles and services is far greater in a cloud like AWS. Finding the permissions that a particular cloud service needs to work, without allowing too much, is never trivial. The complexity of the task can drive admins to distraction, prompting them to press Allow everything and thus release confidential customer data in an openly accessible Amazon Simple Storage Service (S3) bucket (Amazon's object store). Although this is inexcusable, it is something you can at least empathize with from personal experience.

Data protection to and from the cloud, and on internal transfer paths between services, is another consideration. Many admins will suggest enabling TLS. But in practice, the success of the project often depends on where the certificates originate.

While a multitude of AWS services are affected by access controls, I have limited this article to two basic AWS services: the S3 object store and the Elastic Compute Cloud (EC2) virtual machine (VM) service. Additionally, I will look at AWS key management, as well as a few aspects of Identity and Access Management (IAM), which distributes users and their rights.

Trinity

The confidentiality, integrity, and availability (CIA) triad plays an important role in determining data security. Confidentiality (C) means that only authorized users see the data content; on a public web page, the authorized group is often everyone.

Integrity (I) means that only authorized users can modify the data. Where applicable, this means that some of the authorized users are only able to change a certain dataset within defined value ranges. A bank employee, for example, can only transfer money to accounts per customer request, instead of at will.

Availability (A) pertains to how data is maintained and stored. If all the important corporate data is on a single hard disk without a backup, and the disk bites the dust, then the data is no longer available.

Protection from Whom?

When it comes to protection against unauthorized read (C) and write (I) access to the data in the cloud, admins need to determine who has access to which data. There is public access via the Internet, plus a small group of users with different authorization levels (i.e., order processing does not need access to human resources' salary tables).

Since the whole thing runs on a third-party infrastructure, you also need to consider protection from the cloud provider's employees, as well as access controls for the in-house administrators who manage the systems. This is particularly relevant for personal data, such as salary tables.

Availability is something that AWS customers can typically assume to be a given. With S3, for example, the user would have to actively disable high availability to voluntarily suffer from data loss in the event of a crash. In addition, the object store supports versioning so that the customer can revert to older versions in the event of problems.

Users, Roles, and Rights

In the AWS cloud, the simplest hierarchy level is that of accounts containing users who are assigned authorizations within the account, such as the ability to create and start VMs or databases. The IAM configuration area is used to manage users or admins.

AWS recommends setting up accounts with sub-accounts. This allows the AWS customer to impose company-wide policies so that even an admin with full rights for a sub-account cannot violate the organizational rules.

When generating an account, AWS also creates the superuser's credentials for this account. By clicking on the user list, the admin will find this superuser. The user has full access to all functions offered by AWS (the exception would be a sub-account with an organizational policy). If the admin creates a second user here, the user is only granted explicitly assigned rights. When creating a new user, you are first prompted for the user name and details of how this user will log on, via CLI/API and/or the Web Console.

Next, the admin assigns rights by selecting from existing user groups (for example S3 Admins, Networkadmins, etc.), assigning roles (S3 Admin, Networkadmin), or as individual assignments at policy level. If you really want to make your life complicated, you can also define an arbitrary combination of these rights for the new user.

To avoid selecting overly liberal permissions by mistake, a permissions boundary can be defined within which, say, the security administrator responsible for AWS restricts permissions in a policy. If a conflict then arises between this limitation and the assigned rights, the limitation wins.
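In a much simplified model (ignoring Deny statements, conditions, and resource ARNs), a permissions boundary acts as a filter: the effective rights are the set intersection of the assigned identity policy and the boundary. The action names in this sketch are illustrative:

```python
def effective_permissions(identity_policy: set, boundary: set) -> set:
    """An action is only allowed if BOTH the identity policy and the
    permissions boundary allow it (simplified model, Allow-only)."""
    return identity_policy & boundary

# The admin assigned generous rights ...
assigned = {"s3:GetObject", "s3:PutObject", "ec2:TerminateInstances"}
# ... but the security administrator's boundary only allows S3 reads.
boundary = {"s3:GetObject", "s3:ListBucket"}

# The boundary wins: only s3:GetObject remains effective.
print(effective_permissions(assigned, boundary))  # {'s3:GetObject'}
```

Real IAM evaluation is more involved (explicit Deny always wins, and conditions apply), but the intersection captures why the limitation prevails over the assigned rights.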

Rights to Resources

AWS controls access through policies. A policy consists of a set of statements, each granting one or more rights on a resource (wildcards allowed) to a role or user; optional conditions can restrict a statement further. Listing 1 shows a section of a policy in JSON format [1].

Listing 1

JSON Policy Definition

01 {
02   "Version": "2012-10-17",
03   "Statement": [{
04       "Sid": "s3zugriff",
05       "Effect": "Allow",
06       "Action": [
07         "s3:List*",
08         "s3:Get*"
09       ],
10       "Resource": [
11         "arn:aws:s3:::confidential-data",
12         "arn:aws:s3:::confidential-data/*"
13       ],
14       "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
15     }
16   ]
17 }

The Sid field contains an optional name for the statement. Effect allows or denies access. The Action field lists the API operations covered; in the example, these are all listing and download operations in the S3 API. The Resource field lists the targets of the operations, formulated as Amazon Resource Names (ARNs); the example covers a bucket named confidential-data and its contents. If you omit the last field, Condition, the statement applies unconditionally.

The condition in line 14 ensures multi-factor authentication of the user for this rule to apply. Depending on the logic to be mapped, the admin either writes individual policies in this form and combines them or bundles several statements into a single policy.
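To see how the Action wildcards and the Condition interact, the toy evaluator below checks a single Allow statement modeled after Listing 1. It deliberately ignores most of real IAM evaluation (explicit Deny, multiple statements, condition operators other than Bool) and only illustrates the matching logic:

```python
from fnmatch import fnmatch  # shell-style wildcard matching

# A single Allow statement modeled after Listing 1.
statement = {
    "Effect": "Allow",
    "Action": ["s3:List*", "s3:Get*"],
    "Resource": ["arn:aws:s3:::confidential-data",
                 "arn:aws:s3:::confidential-data/*"],
    "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
}

def allows(stmt, action, resource, mfa_present):
    """Toy check: action and resource must match one of the patterns,
    and the MFA condition (if present) must be satisfied."""
    if stmt["Effect"] != "Allow":
        return False
    if not any(fnmatch(action, pat) for pat in stmt["Action"]):
        return False
    if not any(fnmatch(resource, pat) for pat in stmt["Resource"]):
        return False
    cond = stmt.get("Condition", {}).get("Bool", {})
    if cond.get("aws:MultiFactorAuthPresent") == "true" and not mfa_present:
        return False
    return True

print(allows(statement, "s3:GetObject",
             "arn:aws:s3:::confidential-data/report.csv", True))   # True
print(allows(statement, "s3:PutObject",
             "arn:aws:s3:::confidential-data/report.csv", True))   # False
```

A download (s3:GetObject) with MFA matches the statement; a write (s3:PutObject) does not, because no Action pattern covers it.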

The Right to Interact

Policies are not only used for user access controls, but also to govern the interaction between AWS entities. A Lambda function that wants to publish a Simple Notification Service (SNS) message [2], for example, needs a role that contains a policy with the corresponding rights in the SNS area.

Policies determine which operations are allowed on which objects, and the admin assigns them to users or functions. In terms of the CIA triad, a policy controls who can access the data within the created AWS objects; it does not, however, address the confidentiality and integrity objectives with respect to AWS's own administrators.

S3

S3 [3], one of the oldest services in AWS, is divided into buckets at the top level. A bucket contains folders and objects/files. S3 buckets are also the easiest way to launch a static website in AWS. You drop the files that make up the website into a bucket and then make them available via HTTP(S) with a few clicks. In the past, this occasionally went wrong, because confidential data was accidentally left in the clear on the Internet.

Figure 1 shows the settings for creating a bucket. If encryption is enabled, the user can choose whether AWS will use automatically generated keys or keys stored in the Key Management Service (KMS) [4]. Encryption applies to the objects in the bucket. By default, public access is also blocked (Figure 2).

Figure 1: Creating an S3 bucket.
Figure 2: Blocking public access in bucket settings.
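Programmatically, the same default-encryption choice appears as the ServerSideEncryptionConfiguration argument of boto3's put_bucket_encryption call. The snippet below only builds the configuration dictionary; the actual call, sketched in a comment, needs AWS credentials, and the key ARN is a made-up placeholder:

```python
import json

# Placeholder ARN -- substitute a real customer-managed key.
kms_key_id = "arn:aws:kms:eu-central-1:111122223333:key/EXAMPLE"

# Default encryption with a customer-managed KMS key (SSE-KMS).
# For S3-managed keys you would use {"SSEAlgorithm": "AES256"} instead.
sse_config = {
    "Rules": [
        {
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_id,
            }
        }
    ]
}

# With credentials configured, this would be applied as:
# boto3.client("s3").put_bucket_encryption(
#     Bucket="confidential-data",
#     ServerSideEncryptionConfiguration=sse_config)

print(json.dumps(sse_config, indent=2))
```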

The admin can also control what kind of encryption applies at the folder level (Figure 3). The last stage is the individual object (a file, for example); the user can enable encryption here and select the encryption method.

Figure 3: The folder encryption options in S3.

Alternatively, you can encrypt objects locally before uploading them to S3, so that no key stored in AWS is capable of decrypting them.
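A minimal illustration of that approach: encrypt locally, upload only the ciphertext, and keep the key outside AWS. The toy XOR keystream below merely stands in for a real cipher; in practice you would use AES, for example via the cryptography library or the AWS Encryption SDK:

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream -- illustration only,
    NOT a vetted cipher; do not use in production."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = secrets.token_bytes(32)           # stays on premises, never uploaded
plaintext = b"salary table 2019"
ciphertext = xor_crypt(key, plaintext)  # only this blob goes into the bucket

# Hypothetical upload with credentials configured:
# boto3.client("s3").put_object(Bucket="confidential-data",
#                               Key="salaries.bin", Body=ciphertext)

assert xor_crypt(key, ciphertext) == plaintext  # local decryption round trip
```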

KMS

AWS by default uses keys that are automatically generated on the fly to encrypt data quickly. However, this still does not satisfy the confidentiality and integrity objectives with respect to Amazon's admins.

If you want more control over the keys, you need KMS [4]. In the KMS console, AWS shows a list of the keys it generated implicitly for encrypting services such as databases. Customer-managed keys, in contrast, are generated by the admin in the console and have a policy attached to them: anyone who uses the key is allowed to do what the policy states.

When you create your own key and let KMS generate the key material, the dialog prompts you to define who can manage the key and who can use it for encryption and decryption. From this input, the wizard generates a policy and attaches it to the key.

Instead of having AWS generate a key, an admin can import externally generated key material. In this case, the console requires the admin to select a wrapping algorithm to pack the key before uploading it and an import token to unpack the key again (see [5] for an example).

Finally, Amazon offers the CloudHSM cluster [6]. If you choose this option, you bind several hardware security modules (HSMs) that store the keys in tamper-proof hardware; this achieves the highest level of control possible in the cloud.

All requests for encryption are answered by the HSMs; their hardware makes sure that nobody reads the keys. Designed as a cluster, CloudHSM is highly available (the A in CIA) – which is important for people who secure large sections of their infrastructure with HSM.

Thanks to interchangeable components and API compatibility, the big step towards CloudHSM does not have to happen at the very beginning of development. Even in this configuration, the remaining trust issue is that the KMS clients are under the control of AWS (i.e., they hold decrypted data in RAM).

VMs with Encrypted Hard Disks

Amazon's EC2 VM service stores its virtual hard disks as Elastic Block Store (EBS) volumes, which can be encrypted. Linux administrators have long been familiar with this kind of protection at the level of encrypted partitions or volumes. It either means that the admin has to enter a password when booting the machine or that the password has to be stored in the bootloader; in the latter case, the data remains unprotected in the event of theft.

The AWS Cloud uses KMS behind the scenes. When an admin creates a new VM, the storage configuration menu offers the option of encrypting the hard disk (Figure 4). The keys created in the selected AWS region appear in the selection list.

Figure 4: Selecting an encryption option in AWS EC2 when creating a VM.
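The same choice is available via the API: when launching an instance with boto3, the block device mapping carries the Encrypted flag and, optionally, a KMS key ID. The snippet only builds the mapping; the run_instances call, sketched in a comment, needs credentials, and the key ARN and AMI ID are placeholders:

```python
# Placeholder ARN -- substitute a real customer-managed key.
kms_key_id = "arn:aws:kms:eu-central-1:111122223333:key/EXAMPLE"

block_device_mappings = [
    {
        "DeviceName": "/dev/xvda",
        "Ebs": {
            "VolumeSize": 20,        # GiB
            "Encrypted": True,       # encrypt the root volume
            "KmsKeyId": kms_key_id,  # omit to use the AWS-managed default key
        },
    }
]

# Hypothetical launch with credentials configured:
# boto3.client("ec2").run_instances(
#     ImageId="ami-...", InstanceType="t3.micro",
#     MinCount=1, MaxCount=1,
#     BlockDeviceMappings=block_device_mappings)

print(block_device_mappings[0]["Ebs"]["Encrypted"])  # True
```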

When the VM starts, the hypervisor retrieves the key for decryption. Admins who were not granted access permissions to the key when it was created can still attach the hard disk to a VM, but are unable to read the data, just as with any other encrypted volume.

Because AWS unfortunately does not provide an interactive console for a Linux VM, working with Linux on-board encryption tools is not an option for the root volume. If the confidential data resides on a separate volume, however, the admin boots the VM in the normal way, manually mounts the volume, and enters the password; this means that the key does not reside in AWS.

Conclusions

Even AWS cannot protect private keys against every form of threat on third-party servers. Utilizing the CloudHSM services moves admins towards an acceptable level of protection for their corporate data. Regardless, users have to have a certain amount of trust in Amazon or – where possible – adapt the cloud architecture to avoid storing sensitive data.

KMS makes using encryption relatively simple, which hopefully mitigates some admins' tendency to avoid encryption altogether.

The Author

Konstantin Agouros works as Head of Open Source and AWS Projects at Matrix Technology AG, where he and his team advise customers on open source and cloud topics. His new book Software Defined Networking: Practice with Controllers and OpenFlow has been published by de Gruyter.