This post is the first in a three-part series on the challenges enterprises face in managing the cost of their cloud infrastructure. It covers the major challenges and makes some key recommendations. Subsequent parts propose a comprehensive cost management framework and take a deep dive into some of these recommendations.
Cost Management Key Questions
Cloud adoption is no longer an “if” but rather a “what, when and how.” More and more enterprises are asking the questions, “What (to move to the cloud)?” “When (to move it)?” and “How (to choose the right architecture and services)?”
As enterprises move more and more workloads to the cloud, the first pain our customers feel is the sting of cost overruns. So what has happened? The budgets were planned. Some initial sizing was done. But almost immediately after a migration, cost is the first factor that starts causing headaches for IT managers. In this blog, we talk about some of the pitfalls and lay out a comprehensive framework for managing the costs of cloud workloads.
From a cost perspective, there are four phases of a typical lifecycle that a workload goes through:
Let’s start with the key questions that should be asked during each of these phases:
Cost Management Challenges
When starting their cloud adoption journey, enterprises sometimes do not consider the above questions and fail to put a cost management framework in place. This usually results in a situation commonly called “cloud sprawl”: the enterprise has lost visibility into, and control of, its cloud landscape and costs. These situations lead to cost overruns that are often substantial. Some of the common reasons are listed below.
Unclear cost ownership is a key challenge. To realize the full benefits of the speed and agility that cloud provides, modern IT usually provides a common services framework in which the business teams are allowed to manage the cloud resources for their applications themselves. While this is the recommended practice, cost ownership often falls through the cracks. We’ve seen customer situations where IT creates accounts and projects, hands them over to the business teams to use, but still owns the costing and billing.
What results from this arrangement is that the business teams get free rein to create resources, which they do, often well outside of their allocated budgets. They are often neither aware of nor concerned about the mounting spend, since they are not the ones footing the bill.
This is usually made worse by the fact that IT does not have strong cost reporting mechanisms to bring visibility into the who and what of the budget overruns.
Budgets and TCO
An initial cloud total cost of ownership (TCO) analysis is essential to arrive at a budget for your cloud landscape. When this is not done, stakeholders have no visibility into what their infrastructure is going to cost. Cost savings are often one of the biggest reasons for cloud adoption, but skipping this exercise results in bill shock for the enterprise and often stalls the momentum of adoption.
Even when enterprises do a TCO exercise, they often do it only for the final production landscape. They sometimes fail to account for the migration plan, DevOps processes, and go-live dates, and do not size sufficiently for them. This causes situations where costs skyrocket even before the application is fully migrated. Dev/Test environments tend to bloat severely and eat into the overall budget.
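To make the gap concrete, here is a minimal sketch of a phased budget estimate that counts Dev/Test and migration-period costs alongside production. The `Environment` class, the environment names, and all cost figures are hypothetical placeholders chosen for illustration, not numbers from any customer landscape.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    monthly_cost: float  # hypothetical estimate, in your billing currency
    start_month: int     # first month the environment runs (1-based)
    end_month: int       # last month the environment runs (inclusive)


def monthly_spend(environments, horizon_months):
    """Return the total estimated spend for each month in the planning horizon."""
    totals = []
    for month in range(1, horizon_months + 1):
        totals.append(
            sum(
                env.monthly_cost
                for env in environments
                if env.start_month <= month <= env.end_month
            )
        )
    return totals


# Illustrative plan: production goes live in month 7, but Dev/Test and
# migration tooling start costing money from month 1.
plan = [
    Environment("dev/test", 2000, 1, 12),
    Environment("migration tooling", 500, 1, 6),
    Environment("production", 5000, 7, 12),
]

per_month = monthly_spend(plan, 12)
full_budget = sum(per_month)
production_only = sum(monthly_spend([plan[2]], 12))
```

In this made-up plan, sizing for production alone would leave almost half of the first-year spend unbudgeted, which is the kind of gap that surfaces before go-live.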
Even when enterprises have done initial sizing and defined cost ownership, day-to-day visibility into costs remains important. Because it takes only minutes to create resources in the cloud, waste becomes a concern. Resources may be created for temporary use but never shut down. We have also seen situations where hackers have obtained access to customers’ cloud accounts and created hundreds of servers. The problems caused by a lack of visibility can be summarized as follows:
- Stakeholders do not have granular visibility and actionable insights into their cloud landscape
- Continuous monitoring is not in place, resulting in month-end bill shocks
- No projections are available for cloud utilization trends
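As an illustration of what continuous monitoring with a simple projection could look like, the sketch below extends the average daily spend so far to the end of the month and compares it with a budget. It assumes you already export one cost figure per day from your billing data; the function names and the run-rate method are our own simplifications, not a feature of any particular cloud platform.

```python
import calendar
from datetime import date


def project_month_end_spend(daily_costs, today):
    """Project month-end spend by extending the average daily cost to date.

    daily_costs holds one entry per elapsed day of the current month.
    """
    if not daily_costs:
        raise ValueError("need at least one day of cost data")
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    spend_to_date = sum(daily_costs)
    average_daily = spend_to_date / len(daily_costs)
    remaining_days = days_in_month - len(daily_costs)
    return spend_to_date + average_daily * remaining_days


def budget_alert(daily_costs, today, monthly_budget):
    """Return (is_over_budget, projected_spend) for the current month."""
    projected = project_month_end_spend(daily_costs, today)
    return projected > monthly_budget, projected


# Hypothetical data: ten days of spend at 100 per day in a 30-day month.
over, projected = budget_alert([100.0] * 10, date(2024, 4, 10), monthly_budget=2500.0)
```

Running a check like this daily would flag the overrun on day 10 rather than at the month-end invoice. A straight run rate ignores seasonality and planned changes, so treat it as a starting point only.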
Building the correct cost governance is a key pillar of the overall cloud governance framework. Problems occur when some of the following governance structures are not put into place:
- A tagging and labeling strategy is required for both automation and chargeback/showback. When a comprehensive tagging strategy is missing, it becomes very difficult to dig into billing data and identify which resources belong to which application or group, and who created them.
- Enterprise-level access control and provisioning policies, when not clearly defined and enforced for cloud, result in unauthorized actors creating resources. Controlling who can create which resources is essential to managing cloud sprawl.
- Cloud governance requires behavioral change across organizations. Enterprises that try to retrofit processes that work on-premises onto the cloud will lose the advantages the cloud offers. On the other hand, moving to cloud without training the various stakeholders on the new governance models will also result in lapses and a corresponding loss of visibility and tracking.
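A tagging policy is only useful if it can be checked. The following is a rough sketch of such a check, assuming resources have already been exported from an inventory as dictionaries with an `id` and a `tags` mapping. The required tag keys and the resource shape are hypothetical; substitute whatever your own policy defines.

```python
# Hypothetical policy: every resource must carry these tag keys.
REQUIRED_TAGS = {"owner", "application", "cost-center", "environment"}


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the sorted list of required tag keys that are absent or empty."""
    present = {
        key.lower()
        for key, value in resource.get("tags", {}).items()
        if value
    }
    return sorted(required - present)


def untagged_report(resources, required=REQUIRED_TAGS):
    """Map each non-compliant resource id to the tags it is missing."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource, required)
        if gaps:
            report[resource["id"]] = gaps
    return report


# Illustrative inventory export.
inventory = [
    {"id": "vm-1", "tags": {"owner": "team-a", "application": "billing",
                            "cost-center": "cc-42", "environment": "prod"}},
    {"id": "vm-2", "tags": {"owner": "team-b", "environment": ""}},
    {"id": "vm-3"},
]

report = untagged_report(inventory)
```

A report like this can feed a showback dashboard or a remediation job, so that non-compliant resources are traced back to an owner instead of surfacing as unattributed line items on the bill.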
Even when governance models are defined, for large landscapes, enforcing governance manually comes close to not enforcing it at all (for example, imagine tagging 1,000 VMs manually). When tools and automation strategies are not used and applied across the entire cloud landscape, IT teams always play catch-up and endure a lot of manual work to keep the landscape in shape.
Similarly, when cost management and remediation tools are not used, manual compliance, cost reporting, and optimization become simply untenable and are often abandoned.
Public clouds are evolving fast. They already provide innovative features like autoscaling that are not available in on-premises environments. In addition, they provide innovative costing models and multiple discount options.
Lastly, they come up with new managed services that not only allow the customer to pay for just what they use, but also lift the management overhead for these services. Enterprises miss out on these benefits when:
- Apps don’t utilize cloud features to optimize cost (e.g., autoscaling)
- Enterprises don’t use cloud platform discounts
- Enterprises don’t do periodic reviews for validating evolving application architectures
Approach and Key Recommendations
Based on our experiences with customer landscapes and cloud best practices, we have come up with an approach that can help enterprises control and optimize costs effectively.
- Define and implement a clear cloud governance model
- Provide deeper visibility and actionable insights for cost management
- Enforce governance via automation
- Enable behavioral change and discipline using a combination of the above
While the cost management framework covers a lot of ground in the following sections, here are some of the key recommendations that enterprises can get started with immediately:
- Define and implement governance and cloud provisioning methodology
- Define and implement access control and ownership of cloud resources
- Enforce resource tagging and labeling
- Build lightweight inventory management for cloud resources
- Build reporting and recommendations on:
  - Cost and projections
- Clean up non-conformant resources automatically (3 stage process)
- Use reservations for production instances
- Use spot or preemptible instances for Dev/Test combined with instance scheduling
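To get a feel for what instance scheduling alone can save on Dev/Test, here is a back-of-the-envelope sketch comparing an always-on instance with one that runs only during working hours. The schedule and the hourly rate are hypothetical, and any additional spot or preemptible discount is deliberately left out because it varies by provider and over time.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours for an always-on instance


def scheduled_hours_per_week(hours_per_day, days_per_week):
    """Hours an instance runs each week under a fixed on/off schedule."""
    return hours_per_day * days_per_week


def scheduling_savings_fraction(hours_per_day, days_per_week):
    """Fraction of always-on cost avoided by running only on the schedule."""
    running = scheduled_hours_per_week(hours_per_day, days_per_week)
    return 1 - running / HOURS_PER_WEEK


def weekly_cost(hourly_rate, hours):
    """Cost of running one instance for the given number of hours."""
    return hourly_rate * hours


# Hypothetical Dev/Test schedule: 12 hours a day, 5 days a week.
running_hours = scheduled_hours_per_week(12, 5)
savings = scheduling_savings_fraction(12, 5)

# Hypothetical on-demand rate of 0.50 per hour.
always_on = weekly_cost(0.50, HOURS_PER_WEEK)
scheduled = weekly_cost(0.50, running_hours)
```

Under this assumed schedule the instance runs 60 of 168 hours, so roughly 64% of the always-on cost is avoided before any instance-type discount is applied.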