
Cloud security fundamentals Part 2: Isolation

This blog discusses isolation, a central control for the secure use of cloud services. It continues a multi-part series on cloud security fundamentals.

10/25/2024 · 6 min read

The public cloud differs from on-premises environments in ways that require rethinking at the architecture level. Organizations can focus on certain key fundamentals to navigate the cloud securely. A simplified approach is to imagine cloud security in layers. At a high level, the fundamental controls are at the core, followed by use case specific considerations and then individual cloud resource hardening. This multi-part blog focuses on key fundamentals that can help you get started with strong and consistent guardrails in the public cloud. Part 2 discusses isolation as one of the fundamental controls and how organizations can achieve it in the public cloud. It examines isolation across three components, or building blocks: compute, network and storage.

Compute

Compute can be defined simply as a service that allows a user to execute code and process data. Compute services in the cloud can be broadly classified into virtual machines (VMs), containers and functions (serverless). The variations differ in how much of the compute configuration the user controls, ranging from a complete operating system (OS) to just the code in a given runtime environment. There are sub-variations, but they can be ignored for the purposes of this dialogue. There is also the option of dedicated hardware, which is not much different from a standard data center setup but adds the convenience of hardware management by the cloud provider.

Virtual machines share the hardware and are logically separated from one another. The logical separation is managed by the hypervisor, which is the intermediary layer between the VM guest OS and the bare metal server. If an adversary in one of the VMs can successfully target either the bare metal resources (CPU, memory, storage) or the hypervisor itself, they can potentially compromise neighboring VMs. Examples of attacks that can impact neighboring VMs are Rowhammer, Meltdown, Spectre, hypervisor vulnerabilities and VM escapes.

Though vulnerabilities and exploits may surface time and again, cloud providers have a strong interest and incentive to discover and promptly patch them. The driver is the magnitude of impact, since a single flaw can affect many cloud users. As users move away from VMs to containers and functions, responsibility for managing the OS, orchestration and runtime also shifts to the cloud provider. The change in the responsibility model has also influenced the development and maturity of hypervisors over time. Hypervisors have evolved to cater to the isolation considerations of each compute variation, i.e. VM, container and function (see the references below for details).

In summary, hypervisors provide a unique ability to share the same hardware across several cloud users while ensuring security and privacy. At the same time, organizations have the opportunity to reduce their operational and management overhead by considering containers and serverless. As a simple exercise, compare the average number of patches required for the OS and installed components on a VM vs. a container image vs. serverless for the same application. Another measure is the number of hardening items involved in securing the application and its supporting stack.
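The exercise above can be sketched in a few lines. This is a deliberately simplified illustration: the layer names and the split of who maintains what are assumptions made for the example, and real services differ, so treat the counts as a way of framing the comparison rather than as measurements.

```python
# Illustrative only: a simplified view of which layers the customer still
# patches and hardens under each compute model. Layer names are assumptions.
CUSTOMER_MANAGED = {
    "vm": ["guest_os", "language_runtime", "libraries", "application_code"],
    "container": ["image_packages", "libraries", "application_code"],
    "function": ["libraries", "application_code"],
}


def patch_surface(model: str) -> int:
    """Return the number of layers the customer has to patch and harden."""
    return len(CUSTOMER_MANAGED[model])


for model in ("vm", "container", "function"):
    print(f"{model}: {patch_surface(model)} customer-managed layers")
```

Replacing the layer names with an actual inventory of packages and hardening items for one application gives a more honest comparison than the layer count alone.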

Network

A shared network in the cloud is perhaps difficult to imagine given the complexity and nature of the interactions involved. Broadly, on one side, there are network interactions that users can visualize, such as network flows within a VPC or network connectivity to the VPC. On the other side, there is network connectivity that moves user data in ways that may not be apparent. Examples are availability zone redundancy, regional failover and global data replication (e.g. IAM). I am ignoring the cloud provider's network management for the cloud environment itself, as it does not fall under the user's responsibility.

Where users do have the responsibility, isolation follows the same principles as in an on-premises network, but it is more a matter of configuration and less a classic infrastructure deployment. Users have the option of using cloud native network controls in the form of DNS, routing and firewalls to implement isolation and create network boundaries.
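To make the idea of a configured network boundary concrete, here is a minimal sketch of default-deny rule evaluation, which is how cloud firewall controls generally behave. The `AllowRule` class, the `is_allowed` function and the address ranges are all hypothetical and do not correspond to any provider's API.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AllowRule:
    """A hypothetical allow rule: source range, destination port, protocol."""
    source_cidr: str
    port: int
    protocol: str = "tcp"


def is_allowed(rules, source_ip, port, protocol="tcp"):
    """Default deny: traffic passes only if at least one allow rule matches."""
    addr = ip_address(source_ip)
    return any(
        addr in ip_network(rule.source_cidr)
        and rule.port == port
        and rule.protocol == protocol
        for rule in rules
    )


# Hypothetical boundary: only the application subnet may reach the database port.
DB_TIER_RULES = [AllowRule("10.0.1.0/24", 5432)]

print(is_allowed(DB_TIER_RULES, "10.0.1.15", 5432))  # True: inside the app subnet
print(is_allowed(DB_TIER_RULES, "10.0.2.15", 5432))  # False: different subnet
```

The point of the sketch is that the boundary is data, not hardware. It can be reviewed, versioned and tested like any other configuration.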

An adversary can still target the network devices and protocols, but that is something the cloud provider needs to address as part of its responsibility, similar to any other service management. Do note that the responsibility model changes if you deploy a third-party network solution, let's say a vendor firewall in the cloud. The vendor firewall will in turn use some cloud compute, network and storage services.

In a shared network you also have to think about exclusivity, which can be achieved through encryption in transit (e.g. TLS, SSH). Exclusivity strengthens isolation through cryptography. If the data in the network is accessed by an adversary, it is not useful without the encryption key. The important part here is to use industry standard encryption algorithms and vetted implementations to avoid vulnerabilities along with mature key management. In the cloud, depending on the service, it can be a simple configuration (e.g. TLS on a private/internal load balancer) or you may also require digital certificate management (e.g. for internet properties or websites).
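As a small illustration of relying on a vetted implementation rather than custom cryptography, the sketch below builds a client-side TLS configuration with Python's standard `ssl` module. The function name `strict_client_context` is an assumption for the example, and the TLS 1.2 floor is one reasonable choice, not a recommendation from this blog.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client TLS context that verifies the server and refuses old protocols."""
    # create_default_context() enables certificate and hostname verification
    # and loads the system's trusted CA certificates.
    ctx = ssl.create_default_context()
    # Assumed policy for this sketch: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A client would then wrap its socket with `ctx.wrap_socket(sock, server_hostname=...)`, passing the name the certificate is expected to carry.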

In short, organizations can leverage fundamental network controls and encryption to ensure that the traffic in a shared environment is isolated and exclusive for their use.

Storage

A cloud storage service (e.g. block, object, file) leverages exclusivity for isolation through encryption, similar to the network. Data at rest encryption ensures that any access (e.g. out of band access through a disk mount) does not work without the encryption key. There is also an additional consideration for isolation, which is access management control. It determines how the data is read, updated or deleted (e.g. rwx), including permissions to access the key. Data access and encryption requirements are perhaps the most talked about topics when it comes to data security in the public cloud.
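The interplay between data permissions and key permissions can be sketched as follows. The `Grants` class, the action names and the mapping between data actions and key actions are all hypothetical; actual providers model this with their own policy languages.

```python
from dataclasses import dataclass, field


@dataclass
class Grants:
    """Hypothetical permissions held by one principal."""
    data_actions: set = field(default_factory=set)  # e.g. {"read", "write", "delete"}
    key_actions: set = field(default_factory=set)   # e.g. {"encrypt", "decrypt"}


# Assumed mapping for this sketch: the key permission each data action needs.
# Deleting an encrypted object does not require access to the plaintext.
REQUIRED_KEY_ACTION = {"read": "decrypt", "write": "encrypt", "delete": None}


def can_perform(grants: Grants, action: str) -> bool:
    """Allow an action only if both the data and the key permissions are held."""
    if action not in REQUIRED_KEY_ACTION or action not in grants.data_actions:
        return False
    needed = REQUIRED_KEY_ACTION[action]
    return needed is None or needed in grants.key_actions


app = Grants({"read", "write"}, {"encrypt", "decrypt"})
storage_admin = Grants({"read", "delete"}, set())

print(can_perform(app, "read"))             # True: data and key access
print(can_perform(storage_admin, "read"))   # False: no decrypt permission on the key
print(can_perform(storage_admin, "delete")) # True: no key access needed
```

Separating the two permission sets is what lets an administrator manage storage without being able to read its contents.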

If you use a storage platform (PaaS), you also have to choose the engine and version (e.g. DB engine and version) and manage version updates as and when needed. If you use a third-party storage solution (similar to a vendor firewall), you have to implement isolation controls for the cloud native compute, network and storage services it uses. As with compute and network, a given SaaS storage solution can be expected to have lower management overhead than PaaS and IaaS.

As with encryption in transit, the use of industry standards, vetted implementations and mature key management is important for data at rest encryption.

A quick note on access management and components

I have not talked about access management in general here, only for storage, because of its key role in establishing isolation. I discussed access management in part 1 of this multi-part blog.

Any cloud service you choose will always have compute, storage and network components that work behind the scenes to host and manage the service. As a cloud user, you have to understand what the cloud service gives you to use and manage, which could be compute, network or storage, or a combination of some or all of them. The distinction is important for understanding your responsibilities as a user and the cloud provider's responsibilities. You may find deficiencies on either side because certain controls are absent. I highlight this because categorizing the gaps correctly is important for risk management. I will leave you with a fundamental question: should all such risks be categorized as part of vendor risk management, only the gaps in the cloud provider's responsibilities, or some or none of them? The choice has implications for the following:

Teams involved in cloud assessments and their expertise
The rigor applied in cloud assessments and frequency of those assessments
The differences owing to complexity and diversity between cloud service providers, cloud services and potentially business use cases.

The question requires a dedicated dialogue, which I intend to cover in a future blog.

Key takeaways

Examining cloud services at the fundamental layers of compute, network and storage allows us to functionally categorize the controls for those services in a structured and logical fashion. Additionally, it allows us to break down a cloud service into its fundamental components with a clear understanding of customer responsibilities and, therefore, of the control requirements to safeguard the service as a whole.

In this blog, I used the opportunity to discuss isolation through the fundamental components; however, the benefits of this approach extend beyond this dialogue. As an example, any type of cloud service risk assessment or threat modeling can benefit from a component-based analysis. Further, the approach can help create a standard set of requirements that apply to the standard components that form the building blocks of any cloud service. Analyzing cloud services at such granularity can empower an organization to structure, standardize and simplify its control requirements into a finite set. As for isolation, the story continues to improve, with less for the customer to manage and secure.

Part 3

I will discuss integrity, which plays an important role in establishing trust and confidence in cloud operations. While isolation helps to achieve exclusive and private use, integrity provides assurance and validation.

References for additional details:

VMs

Container and Functions: