Figure 2-1. Sample application diagram with data assets
Tagging Cloud Resources
Most cloud providers, as well as container management systems such as Kubernetes, have the concept of tags. A tag is usually a combination of a name (or "key") and a value. These tags can be used for lots of purposes, from categorizing resources in an inventory, to making access decisions, to choosing what to alert on. For example, you might have a key of PII-data and a value of yes for anything that contains personally identifiable information, or you might use a key of datatype and a value of PII.
The catch is consistency: if everyone in your organization uses different tags, they won't be very useful! Create a list of tags with explanations for when they must be used, use these same tags across all of your cloud providers, and require automated tools to apply them when resources are created. Even if one of your cloud providers doesn't explicitly support tags, it often offers other description fields that can hold tags in an easy-to-parse format such as JSON.
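As a minimal sketch of the JSON approach, the Python below serializes a tag dictionary into a description string and parses it back. The tag keys and values are hypothetical examples, and these helper functions are not part of any provider's SDK; how you actually read and write a resource's description field depends on that provider's API.

```python
import json


def tags_to_description(tags: dict[str, str]) -> str:
    """Encode tags as compact JSON for a provider that only offers a free-text description field."""
    # sort_keys produces a stable string, so unchanged tags never show up as a diff
    return json.dumps(tags, sort_keys=True, separators=(",", ":"))


def tags_from_description(description: str) -> dict[str, str]:
    """Parse tags back out; anything that isn't a JSON object is treated as 'no tags'."""
    try:
        data = json.loads(description)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    return {str(key): str(value) for key, value in data.items()}


# Hypothetical example tags
desc = tags_to_description({"owner": "payments-team", "dataclass": "high"})
print(desc)
print(tags_from_description(desc))
print(tags_from_description("legacy web server"))  # free text, not JSON: no tags
```

Treating unparseable descriptions as "no tags" lets the same automated checks flag those resources as untagged rather than crashing on them.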
Tags are free to use, so there's really no concern with creating a lot of them, although cloud providers do impose limits on how many tags a resource can have (usually between 15 and 64 tags per resource). If you don't need to use them for categorizing or making decisions later, they're easily ignored.
18 | Chapter 2: Data Asset Management and Protection
Some cloud providers even offer automation to check whether tags are properly applied to resources, so that you can catch untagged or mistagged resources early and correct them. For example, if you have a rule that every asset must be tagged with the maximum data classification allowed on that asset, then you can run automated scans to find any resources where the tag is missing or where the value isn't one of the classification levels you have decided upon.
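The scan described above boils down to a simple check. This sketch assumes a hypothetical inventory format (a list of dicts with "id" and "tags" keys, as you might assemble from your providers' inventory APIs) and example classification levels of low, moderate, and high; substitute your own levels and data source.

```python
# Example classification levels; replace with the levels your organization has decided upon
ALLOWED_DATACLASS = frozenset({"low", "moderate", "high"})


def find_tag_violations(resources, key="dataclass", allowed=ALLOWED_DATACLASS):
    """Return (resource_id, problem) pairs for resources whose tag is missing or invalid.

    `resources` is a hypothetical inventory: a list of dicts with "id" and "tags" keys.
    """
    violations = []
    for resource in resources:
        tags = resource.get("tags") or {}
        if key not in tags:
            violations.append((resource["id"], "missing"))
        elif tags[key] not in allowed:
            violations.append((resource["id"], f"invalid value {tags[key]!r}"))
    return violations


# Hypothetical inventory snapshot
inventory = [
    {"id": "vm-web-01", "tags": {"dataclass": "low"}},
    {"id": "db-orders", "tags": {}},
    {"id": "bucket-logs", "tags": {"dataclass": "secret"}},
]
for resource_id, problem in find_tag_violations(inventory):
    print(resource_id, problem)
```

Running a check like this on a schedule, or as a gate in the automation that creates resources, is what turns a tag list into an enforced policy.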
Although all of the major providers support tags in some fashion, as of this writing they don't all support tagging on every service. For example, you may be able to tag the virtual machines you create, but not your databases. Where tags are not available, you'll need to do things the old-fashioned way, with a manual list of instances of those services.
Table 2-1 shows the different names given to tagging by different cloud providers.
Table 2-1. Tagging features
Infrastructure          Feature name
Amazon Web Services     Tags
Microsoft Azure         Tags
Google Cloud Platform   Labels and network tags
IBM Cloud               Tags
Kubernetes              Labels
We will talk more about tagging resources in Chapter 3, but for now, jot down some data-related tags that may apply to your different cloud resources, such as dataclass:low, dataclass:moderate, dataclass:high, or regulatory:gdpr.
Protecting Data in the Cloud
Several of the data protection techniques discussed in this section may also be applied on-premises, but many cloud providers give you easy, standardized, and less expensive ways to protect your data.
Tokenization
Why store the data when you can store something that functions similarly to the data but is useless to an attacker? Tokenization, which is most often used with credit card numbers, replaces a piece of sensitive data with a token (usually randomly generated). The token generally has the same characteristics as the original data (such as being 16 digits long), so underlying systems that are built to take that data don't need to be modified. Only one place (a "token service") knows the actual sensitive data. Tokenization can be used on its own or in conjunction with encryption, discussed next.
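To make the idea concrete, here is a minimal in-memory sketch of a token service for 16-digit card numbers. The TokenService class is hypothetical and ignores everything a production token service needs (durable encrypted storage, access control, auditing); it only shows that the token keeps the original's format while the real value is known in exactly one place.

```python
import secrets


class TokenService:
    """Hypothetical token vault: the only component that can map tokens back to real values."""

    def __init__(self):
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if not value.isdigit():
            raise ValueError("this sketch only tokenizes digit strings")
        # Reuse the existing token so the same card number always maps to the same token
        if value in self._value_to_token:
            return self._value_to_token[value]
        while True:
            # Random digits of the same length: downstream systems still see a "card number"
            token = "".join(secrets.choice("0123456789") for _ in range(len(value)))
            if token != value and token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for tokens this service never issued
        return self._token_to_value[token]


vault = TokenService()
token = vault.tokenize("4111111111111111")
print(token)                    # a random 16-digit string
print(vault.detokenize(token))  # the original card number
```

Because the token is random rather than derived from the card number, an attacker who steals only the tokenized data has nothing to decrypt or reverse.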
Cloud tokenization offerings include services that work with your browser to tokenize sensitive data before sending it, and services that sit between the browser and the application to tokenize sensitive data before it reaches the application.
Encryption
Encryption is the silver bullet of the data protection world; we want to "encrypt all the things." Unfortunately, it's a little more complicated than that. Data can be in three states:
• In motion (being transmitted across a network)
• In use (currently being processed in a computer's CPU or held in RAM)
• At rest (on persistent storage, such as a disk)
Encryption of data in motion is an essential control and is discussed in detail in Chapter 6. In this section, we'll discuss the other two states.
More bits are not always necessary (or even useful). For example, AES-128 meets US federal government standards as of this writing and is often faster than AES-256, although quantum computers may eventually pose a threat to AES-128. Also, a hash algorithm like SHA-512 offers no additional protection if the hash is later truncated to a shorter length: a SHA-512 hash cut down to 128 bits is only as collision-resistant as any other 128-bit hash.
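The truncation point can be demonstrated directly. In this sketch, truncated_sha512 (a helper written for illustration) keeps only the leading bits of a SHA-512 digest, and find_collision runs a simple birthday search. With 24 bits kept, a collision typically turns up after a few thousand inputs, because the work to find a collision depends on the output length you keep (roughly 2^(n/2) hashes for n bits), not on the algorithm's full 512-bit width.

```python
import hashlib


def truncated_sha512(data: bytes, bits: int) -> int:
    """Keep only the leading `bits` bits of a SHA-512 digest (illustrative helper)."""
    digest = int.from_bytes(hashlib.sha512(data).digest(), "big")
    return digest >> (512 - bits)


def find_collision(bits: int) -> tuple[bytes, bytes]:
    """Birthday search: hash distinct inputs until two truncated digests match."""
    seen: dict[int, bytes] = {}
    counter = 0
    while True:  # guaranteed to finish within 2**bits + 1 inputs (pigeonhole principle)
        message = str(counter).encode()
        h = truncated_sha512(message, bits)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1


first, second = find_collision(24)
print(first, second, "share the same 24-bit truncated SHA-512 value")
```

The same search against untruncated SHA-512 would be hopeless; the weakness comes entirely from discarding bits.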
Encryption of data in use
As of this writing, encryption of data "in use" is still relatively new and is targeted primarily at very high security environments. It requires support in the hardware platform, and the cloud provider must expose that support. The most common implementation is to encrypt process memory so that even a privileged user (or malware running as a privileged user) cannot read it, and the processor can read it only when that specific process is running.4 If you are in a very high security environment and your threat model includes protecting data in memory from a privileged user, you should seek out a platform that supports memory encryption; it goes by brand names such as Intel SGX, AMD SME, and IBM Z Pervasive Encryption.
Encryption of data at rest
Encryption of data at rest can be the most complicated to implement correctly. The problem is not in encrypting the data; there are many libraries to do this. The problem is that once you've encrypted the data, you now have an encryption key that can be used to access it. Where do many people put this key? Right next to the data! Imagine locking a door and then hanging the key on a hook next to it helpfully labeled "key." To have real security (instead of just ticking a checkbox indicating that you've encrypted data), you must have proper key management. Fortunately, there are cloud services to help.
4 Note that in-memory encryption protects data only from attacks from outside the process; if an attacker manages to trick the process itself into doing something it shouldn't, the process can read the memory and divulge the data.
Encrypted data can't be effectively compressed. If you want to make use of compression, compress the data before encrypting it.
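A quick way to see this is with Python's standard zlib module: repetitive plaintext compresses dramatically, while bytes that look random do not. To avoid depending on a cipher library, os.urandom stands in for ciphertext here; that substitution is an assumption, justified by the fact that a good cipher's output should be indistinguishable from random bytes.

```python
import os
import zlib

# Repetitive, CSV-like plaintext (made-up sample data)
plaintext = b"customer_id,region,status\n" + b"1042,us-east,active\n" * 500

# Stand-in for ciphertext: good ciphertext should look like random bytes
ciphertext_like = os.urandom(len(plaintext))

compressed_plain = zlib.compress(plaintext)
compressed_cipher = zlib.compress(ciphertext_like)

print(f"plaintext:       {len(plaintext)} -> {len(compressed_plain)} bytes")
print(f"ciphertext-like: {len(ciphertext_like)} -> {len(compressed_cipher)} bytes")
```

The random-looking input comes out no smaller (usually slightly larger, because of format overhead), which is why a storage pipeline should compress first and then encrypt the compressed bytes.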
In traditional on-premises environments with high security requirements, you would purchase a hardware security module (HSM) to hold your encryption keys, usually in the form of an expansion card or a module accessed over the network. An HSM has significant logical and physical protections against unauthorized access. With most systems, anyone with physical access to the hardware can easily get at the data on it, but an HSM has sensors to wipe out its data as soon as someone tries to take it apart, scan it with X-rays, fiddle with its power source, or look threateningly in its general direction.
HSMs are expensive, and so are not feasible for most on-premises deployments. However, in cloud environments, advanced technologies such as HSMs and encryption key management systems are now within reach of projects with modest budgets.
Some cloud providers have an option to rent a dedicated HSM for your environment. While this may be required for the highest-security environments, a dedicated HSM is still expensive in a cloud environment. Another option is a key management service (KMS), a multitenant service that uses an HSM on the backend to keep keys safe. You do have to trust both the HSM and the KMS (instead of just the HSM), which adds a little risk. However, compared to performing your own key management (often incorrectly), a KMS provides excellent security at zero or very low cost, so you can have the benefits of proper key management even in projects with modest security budgets.
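A KMS is commonly used through envelope encryption: a per-record data key encrypts the data, and a master key held by the KMS encrypts ("wraps") that data key. Only the wrapped key is stored next to the data, so the key on the hook is useless without access to the KMS. The sketch below assumes the third-party cryptography package (its AESGCM class); ToyKMS is a hypothetical in-process stand-in for a real KMS, whose API you would call instead, and whose master keys would never leave its HSM.

```python
import os
from dataclasses import dataclass

# Third-party dependency (assumption): pip install cryptography
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class ToyKMS:
    """Hypothetical stand-in for a cloud KMS. A real KMS keeps master keys inside its HSM
    and only exposes operations such as wrap/unwrap over an authenticated API."""

    def __init__(self):
        self._master_keys: dict[str, bytes] = {}

    def create_key(self, key_id: str) -> None:
        self._master_keys[key_id] = AESGCM.generate_key(bit_length=256)

    def wrap(self, key_id: str, data_key: bytes) -> bytes:
        nonce = os.urandom(12)
        # The key ID is passed as associated data, binding the wrapped key to its master key
        return nonce + AESGCM(self._master_keys[key_id]).encrypt(nonce, data_key, key_id.encode())

    def unwrap(self, key_id: str, wrapped: bytes) -> bytes:
        nonce, ciphertext = wrapped[:12], wrapped[12:]
        return AESGCM(self._master_keys[key_id]).decrypt(nonce, ciphertext, key_id.encode())


@dataclass
class EncryptedRecord:
    key_id: str         # which master key wrapped the data key
    wrapped_key: bytes  # safe to store next to the data: useless without the KMS
    nonce: bytes
    ciphertext: bytes


def encrypt_record(kms: ToyKMS, key_id: str, plaintext: bytes) -> EncryptedRecord:
    data_key = AESGCM.generate_key(bit_length=128)  # fresh data key for each record
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    # Only the wrapped copy of the data key is returned for storage
    return EncryptedRecord(key_id, kms.wrap(key_id, data_key), nonce, ciphertext)


def decrypt_record(kms: ToyKMS, record: EncryptedRecord) -> bytes:
    data_key = kms.unwrap(record.key_id, record.wrapped_key)
    return AESGCM(data_key).decrypt(record.nonce, record.ciphertext, None)


kms = ToyKMS()
kms.create_key("orders-db")
record = encrypt_record(kms, "orders-db", b"card=4111111111111111")
print(decrypt_record(kms, record))
```

In a real deployment, the access policy on the KMS key, not the storage permissions, becomes the control that decides who can read the data.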
Table 2-2 lists the key management options offered by the major cloud providers, as of this writing.