The infrastructure lifecycle management in the cloud - challenges and opportunities
Infrastructure lifecycle management for the cloud requires a re-examination through the lenses of federated and unitary styles and how organizations can leverage them to increase infrastructure lifecycle management maturity.
Anurag Jain
9/11/2024 · 4 min read
The cloud has changed the nature of computing and brought technology out of the shadows, from a support function to a business advantage. Moving fast is not just critical but a fundamental need for businesses. In this blog, I will focus on one often-overlooked domain, infrastructure lifecycle management, and its impact on organizations. I will also discuss the opportunities and how best to move forward.
The traditional infrastructure setup and what changed in the cloud
The traditional infrastructure setup in an organization (especially in data centers) had central infrastructure teams that focused on infrastructure service provisioning and management. In the traditional setup, application provisioning started with the central infrastructure teams, which handled hardware procurement, network and account provisioning, access management, OS installation and management of the infrastructure services. An application team would then deploy its application once the infrastructure setup was complete. In the cloud, the deployment model changed with the coupling of infrastructure and application provisioning. The application developer can now deploy a complete application along with the infrastructure services without requiring support from different infrastructure teams.
The change has greatly increased the speed of deployment. The use of infrastructure as code has also standardized the provisioning of different infrastructure services (at least within a given cloud provider). In the traditional infrastructure setup, by contrast, provisioning involved different configuration constructs depending on technology and vendor (e.g. Cisco IOS for network, ESX for virtualization, Kerberos for authentication).
While the change in deployment model improved speed, it also resulted in federated infrastructure change management, with changes now spread across many application teams. At the same time, ownership of infrastructure changes for a given application pivoted from federated (different infrastructure teams) to unitary (the application team). These changes have implications for infrastructure services, and some of the important ones include:
Federation may result in a lack of standards (due to custom deployments) as efforts are spread, diverse and varied across different application teams in the organization.
Given that application teams would naturally focus on application delivery, efforts toward infrastructure maturity may get deprioritized.
The application teams are less likely to have the required infrastructure expertise, resulting in less-than-optimal configuration and implementation.
The infrastructure architecture and implementation can lack comprehensive consideration of domains such as resiliency, availability, fault tolerance and, more generally, security best practices.
Over time, a lack of consistent upkeep can result in infrastructure vulnerabilities.
Application iterations in different environments can generate several instances of infrastructure services that may be left unused resulting in unnecessary costs while creating a potential for security vulnerabilities over time.
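The last point above, unused infrastructure left behind by application iterations, can be sketched as a simple inventory check. This is a minimal illustration rather than a definitive implementation: the `Resource` record, its fields, the sample inventory and the 30-day idle threshold are all assumptions made for the example, and a real inventory would come from the cloud provider's own tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Resource:
    """One inventory record; every field here is a hypothetical illustration."""
    resource_id: str
    environment: str  # e.g. "dev", "test", "prod"
    last_used: date   # last day any activity was observed


def find_stale_resources(inventory, today, max_idle_days=30):
    """Return resources idle for more than max_idle_days, oldest first."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = [r for r in inventory if r.last_used < cutoff]
    return sorted(stale, key=lambda r: r.last_used)


# Hypothetical sample data, not taken from any real account.
inventory = [
    Resource("vm-dev-001", "dev", date(2024, 5, 2)),
    Resource("bucket-test-007", "test", date(2024, 8, 30)),
    Resource("vm-prod-010", "prod", date(2024, 9, 10)),
]

for r in find_stale_resources(inventory, today=date(2024, 9, 11)):
    print(r.resource_id, r.environment, r.last_used)
```

A report like this only flags candidates for review. Whether a flagged resource is actually safe to remove remains a decision for the team that owns it.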
These points highlight the need for a mature governance process for infrastructure lifecycle management. This is where the central infrastructure teams were effective in the past. In the cloud, these teams may be less involved or, at times, not engaged at all in infrastructure provisioning and management for applications.
As an example, infrastructure teams in the cloud (using AWS) would typically focus on specific platforms such as authentication (SSO), authorization (IAM policies at the organization level), OS image management (golden images) and networks (overall cloud connectivity). Additionally, the infrastructure teams may manage organizational guardrails for IAM (e.g. SCPs to restrict access to certain services) or at the network layer (preventing unauthorized access to/from the internet). However, the infrastructure teams may not be engaged in resource provisioning (VPCs, EC2 instances in a VPC), permissions (IAM roles) and resource configuration (S3 bucket policies) for applications at an account level.
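To make the SCP guardrail mentioned above more concrete, here is a rough sketch of what such a policy document can look like, built as a Python dictionary. The choice of denied services and the `Sid` are assumptions for illustration only. The `is_denied` helper is a hypothetical, heavily simplified check and does not reproduce the full AWS policy evaluation logic (it ignores conditions, `NotAction` and resource matching).

```python
import json
from fnmatch import fnmatchcase

# Hypothetical guardrail: which services to block is an assumption for illustration.
DENY_SERVICES_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnapprovedServices",
            "Effect": "Deny",
            "Action": ["redshift:*", "sagemaker:*"],
            "Resource": "*",
        }
    ],
}


def is_denied(policy, action):
    """Simplified check: does any Deny statement's Action pattern match?

    Only illustrates the guardrail idea; it is not AWS's evaluation logic.
    """
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Deny":
            continue
        # Action names are compared case-insensitively.
        if any(fnmatchcase(action.lower(), p.lower()) for p in stmt["Action"]):
            return True
    return False


print(json.dumps(DENY_SERVICES_SCP, indent=2))
print(is_denied(DENY_SERVICES_SCP, "sagemaker:CreateNotebookInstance"))
print(is_denied(DENY_SERVICES_SCP, "s3:GetObject"))
```

In this sketch the central team owns the guardrail, while everything the policy does not deny (for example, the S3 call in the last line) stays under the application team's account-level control.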
The infrastructure lifecycle management options in the cloud
In the cloud, the speed of deployment is one of the significant advantages for businesses. At the same time, there is a need and an opportunity to mature infrastructure lifecycle management. Therefore, any consideration needs to naturally align with the deployment speed without significantly impacting developer experience and productivity. At a fundamental level, there are at least a couple of ways to think about it:
Federated model: The infrastructure management is still federated but has infrastructure expertise embedded in the application teams to provide the necessary focus and support. The model augments the individual application teams.
Central model: The infrastructure management is centralized (like the traditional setup) and the central infrastructure teams work with application teams. The model steps up the responsibilities of the central infrastructure teams to support different application teams.
In both options, application developers are no longer stretched thin: they do not have to specialize in infrastructure domains and can focus on their main deliverable, which is the application. Between the two options, the fundamental difference is the number of personnel. In the federated model, the personnel requirement for infrastructure expertise grows as the number of application teams or applications grows. In the central model, by contrast, the infrastructure teams can avoid linear growth by leveraging reusable infrastructure artifacts and templates for common services. The central model is also better positioned to achieve standardization faster, which in turn benefits security because it avoids the distributed effort across diverse implementations that the federated model entails.
While the central model offers benefits in cost, standardization and security, it does create a prioritization challenge (and therefore impacts speed). The shared pool of personnel that supports different application teams is limited in the central model and therefore requires an optimal resource utilization plan. Even if such a plan exists, I expect it to be impacted by variable demands. Some relevant examples include:
Change in technology strategy or prioritization as a result of unforeseen business demand changes, which in turn may have been influenced by fast-paced technology advancements (e.g. AI/ML in current times).
Technology debt that requires unplanned work, e.g. components reaching end of life.
Technology maintenance in response to security vulnerabilities and audit findings.
Though I mentioned an optimal resource utilization plan, achieving it in reality is hard because any project usually involves unknowns, including unforeseen roadblocks and dependencies that result in delays.
Looking ahead
There are pros and cons to both the federated and central models. While the federated model provides agility, it may not provide the best outcome for cost, standardization and security. While the central model provides those benefits, it can impact speed. In my opinion, both models are relevant for an organization. The important question that the organization needs to ask is which model works best in what scenario and, importantly, for how long. As an example, an organization can consider the federated model for development and internal applications while using the central model for production and critical applications. Organizations tend to operate in either the federated or the central model but usually not both, and if they do use both, it may be accidental or ineffective. I believe that is where the opportunity lies for organizations to improve infrastructure lifecycle management by re-imagining and leveraging strategic portfolio management.
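The example of mixing both models can be sketched as an explicit routing rule over an application portfolio. This is only one possible reading of that example: the `Application` record, its fields and the sample portfolio are hypothetical, and the tie-break (an internal, non-critical application stays federated even in production) is an assumption that an organization would need to decide for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Hypothetical application record used only for this illustration."""
    name: str
    environment: str     # "development" or "production"
    internal_only: bool  # True if the application has no external users
    critical: bool       # True if the business treats it as critical


def choose_model(app):
    """Pick "central" or "federated" for one application.

    Assumption: critical applications always go central, and internal,
    non-critical applications stay federated even in production.
    """
    if app.critical:
        return "central"
    if app.environment == "production" and not app.internal_only:
        return "central"
    return "federated"


# Hypothetical portfolio entries.
portfolio = [
    Application("payments-api", "production", internal_only=False, critical=True),
    Application("team-wiki", "production", internal_only=True, critical=False),
    Application("search-prototype", "development", internal_only=False, critical=False),
]

for app in portfolio:
    print(app.name, "->", choose_model(app))
```

Writing the rule down, even this crudely, turns the choice of model into a deliberate portfolio decision that can be revisited as an application moves from development to production.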
Disclosures
NO AI TRAINING: Without in any way limiting the author’s exclusive rights under copyright, any use of this publication to “train” generative artificial intelligence (AI) technologies is expressly prohibited. The author reserves all rights to license uses of this work for generative AI training and development of machine learning language models.
© 2024. All rights reserved.