May 27, 2019, by Kevin Goldberg

Key Metrics to Baseline Cloud Migration

Cloud computing is well past the emerging stage. It’s no longer a radical idea for businesses to rely on cloud platforms and services as their technology backbone, and the numbers show it.

In 2018, Forrester reported that nearly 60% of North American enterprises rely on public cloud platforms.

Gartner projects that the public cloud services market will grow 17.5% this year, from $182.4 billion in 2018 to $214.3 billion in 2019.

Stakeholders across fields and industries are quickly seeing the benefits of cloud migration and taking action.

Why businesses are moving to the cloud

The main challenge for technology leaders is to solve business problems while keeping costs down. Cloud computing addresses both, which makes it the practical choice for the majority of business cases.

Reduced costs

Cloud migration’s cost reduction benefit is usually what initially grabs the attention of business leaders, both tech and non-tech.

Moving to the cloud allows businesses to reduce capital outlay and operating costs, since resources are typically bought only as needed and paid for only when used. Businesses get the reliability and scalability of powerful machines at a fraction of what equivalent on-premise solutions would cost.

Those who choose to stay on-premise are often forced to make unplanned outlays, especially given how fast the business and tech landscape changes. They commonly fall into the trap of not accounting for added ownership costs, which include maintenance, support, and the additional hardware and software needed to address new requirements and avoid hardware obsolescence.

Additionally, businesses that migrate to the cloud reap full-time equivalent (FTE) savings in two ways: less internal labor is needed for deployment, implementation, and maintenance, and expedited workflows free up a percentage of staff time. Both translate to reduced costs.

Innovation

In the cloud, infrastructure and collaboration requirements are quickly met compared to the slow process of building on-premise IT infrastructure. The speed of deployment allows engineers to focus on building, iterating and delivering code, resulting in more deployments per year.

Because fixes and updates can be released quickly, being on the cloud makes it easier to keep applications performing reliably and meeting their service-level agreement (SLA) requirements while continuing to innovate.

Measuring the success of cloud migration

Moving to the cloud is a major undertaking, whether you’re rehosting, replatforming, or refactoring. The main goals are to make sure that everything is working and that there is a clear, measurable improvement over pre-migration. To measure the success of a migration against these goals, key performance indicators (KPIs) must be established.

Your KPIs must outline the improvements you expect and aim to achieve with cloud migration.

Here are several KPIs that operationalize the cloud migration goals mentioned above:

  • Steady-state and peak server utilization, expressed as a percentage of pre-migration levels
  • Application availability levels (availability SLAs), expressed as a percentage of pre-migration levels
  • New metrics compared against benchmarks documented before migration. For applications that experience usage peaks and valleys, multiple and/or seasonal baselines must be documented to serve as post-migration benchmarks.
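As a rough illustration of the KPIs above, the sketch below expresses post-migration values as a percentage of their pre-migration baselines, with one baseline per season or usage pattern. The function name, the baseline labels, and all figures are hypothetical and only show the arithmetic.

```python
def percent_of_baseline(post_value: float, baseline_value: float) -> float:
    """Return a post-migration value as a percentage of its pre-migration baseline."""
    if baseline_value == 0:
        raise ValueError("baseline must be non-zero")
    return post_value * 100.0 / baseline_value


# Hypothetical peak server utilization figures (in %), one baseline per usage pattern.
baselines = {"steady_state": 40.0, "holiday_peak": 80.0}
post_migration = {"steady_state": 30.0, "holiday_peak": 72.0}

# Compare each post-migration figure with the baseline for the same usage pattern.
report = {
    name: percent_of_baseline(post_migration[name], baselines[name])
    for name in baselines
}

for name, pct in report.items():
    print(f"{name}: {pct:.1f}% of pre-migration level")
```

Keeping a separate baseline for each usage pattern avoids comparing a post-migration peak against a pre-migration quiet period.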

Your cloud migration KPIs can be broken down into more specific metrics that you can track to ensure that you’re on your way to hitting your goals. That said, tracking metrics without establishing a baseline can and will lead you to make subjective assumptions.

To hit your cloud migration KPIs and measure the success of your migration to the cloud, here are the essential baseline metrics to track both pre- and post-migration.

Performance metrics

Response time

Response time is the amount of time it takes to perform an individual transaction or query. It is measured from the time a user’s request is sent until the moment the application indicates that the request has been completed. This metric reflects the application’s speed and performance from the perspective of users.

  • Average Response Time (ART)

The ART is the arithmetic mean of the duration of every round-trip request/response over a certain monitoring period.

  • Peak Response Time (PRT)

The PRT measures the longest request/response cycle over a certain monitoring period.
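A minimal sketch of both response time metrics, assuming you already have a list of request durations collected over one monitoring period. The sample durations and the function name are made up for illustration.

```python
from statistics import mean


def response_time_summary(durations_ms):
    """Return the average (ART) and peak (PRT) response time for one monitoring period."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    return {"art_ms": mean(durations_ms), "prt_ms": max(durations_ms)}


# Hypothetical round-trip durations in milliseconds.
samples = [120.0, 95.0, 310.0, 101.0, 124.0]
summary = response_time_summary(samples)
print(summary)
```

A single slow request (310 ms here) moves the PRT far more than the ART, which is why it helps to baseline both.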

Server performance metrics

  • CPU utilization

CPU utilization measures the amount of CPU time used by an application to process a request. It is usually denoted as a percentage of CPU usage, indicating how much processor capacity is used by an application. CPU usage that stays close to 100% requires immediate attention and action, as it could reflect an application bug or an under-provisioned system.

  • Memory utilization

Memory utilization measures the amount of memory used by an application to process a request. It is usually denoted as a percentage reflecting the portion of memory used by a process, and can also be expressed as the ratio of the process’s Resident Set Size (RSS) to physical memory.

  • Load average

Load average measures demand on the CPU: the average number of processes that are running or waiting to run over a certain time period.
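The two ratios described above can be sketched as follows. The helper names and the sample numbers are assumptions for illustration, and dividing load average by core count is a common rule of thumb rather than something the metric itself defines.

```python
def memory_utilization_pct(rss_bytes: int, physical_bytes: int) -> float:
    """Memory utilization as the ratio of a process's RSS to physical memory, in percent."""
    if physical_bytes <= 0:
        raise ValueError("physical memory must be positive")
    return rss_bytes * 100.0 / physical_bytes


def load_per_core(load_average: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Hypothetical host: a process using 2 GiB RSS on a machine with 8 GiB of RAM,
# and a load average of 6.0 on 4 cores.
mem_pct = memory_utilization_pct(2 * 1024**3, 8 * 1024**3)
normalized_load = load_per_core(6.0, 4)
print(f"memory: {mem_pct:.1f}%, load per core: {normalized_load:.2f}")
```

A normalized load above 1.0 suggests that, on average, more processes wanted CPU time than there were cores to run them. On Unix-like systems, Python's `os.getloadavg()` and `os.cpu_count()` can supply the live inputs.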

Application and service availability, overall uptime

Uptime measures the amount of time a server or application is online, running properly, and accessible to end users. It is usually expressed as a percentage, where higher is better. To achieve the coveted “four nines” (i.e., 99.99% availability), your application can be down for only about 52 minutes per year.
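The downtime budget behind an availability target is simple arithmetic. This sketch assumes a 365-day year; the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime that still meets the given availability target."""
    return period_minutes * (100.0 - availability_pct) / 100.0


def uptime_pct(downtime_minutes: float,
               period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Observed uptime as a percentage of the monitoring period."""
    return (period_minutes - downtime_minutes) * 100.0 / period_minutes


four_nines_budget = allowed_downtime_minutes(99.99)   # roughly 52.56 minutes per year
three_nines_budget = allowed_downtime_minutes(99.9)   # roughly 525.6 minutes per year
print(f"{four_nines_budget:.2f} min, {three_nines_budget:.1f} min")
```

Running the same calculation for your pre-migration SLA gives a concrete downtime figure to compare against after the move.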

End-user experience

Error rates

Error rates measure the number of requests that fail or return HTTP status codes indicating an error. The error rate is usually expressed as a percentage: the number of failed requests divided by the total number of requests.
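A minimal sketch of an error rate calculation over a batch of HTTP status codes. It assumes that any 4xx or 5xx response counts as an error; some teams count only 5xx responses, so treat the threshold as an assumption to adjust.

```python
def error_rate_pct(status_codes, error_threshold: int = 400) -> float:
    """Percentage of requests whose HTTP status code is at or above the threshold."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= error_threshold)
    return errors * 100.0 / len(status_codes)


# Hypothetical status codes from one monitoring window.
codes = [200, 200, 404, 500, 200, 301, 200, 200, 503, 200]
all_errors = error_rate_pct(codes)            # counts 404, 500, 503
server_errors = error_rate_pct(codes, 500)    # counts 500, 503 only
print(f"{all_errors:.1f}% overall, {server_errors:.1f}% server-side")
```

Whichever definition you pick, use the same one before and after migration so the two figures stay comparable.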

Error types

  • HTTP error percentage - Tracks the percentage of web requests that ended in an error, including timeouts
  • Logged exceptions - Tracks the number of logged and unhandled application errors
  • Thrown exceptions - Tracks the total number of exceptions that have been thrown

Latency

Latency measures the delay between a user request and the application response. The higher the latency, the longer the delay.

Customer satisfaction scores (CSAT) and Net Promoter Score (NPS)

Customer satisfaction scores are used to measure user experience and satisfaction. When migrating to the cloud, this metric will give insight into how the migration impacted individual workflow, what friction points were added or eliminated, as well as the overall sentiment of your end users toward the migration. There are several ways to track CSAT including usability tests, feedback surveys, polls, interviews, and focus groups. NPS follows a more rigid survey process about whether your users would recommend your service to their friends and family.
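For reference, NPS is conventionally computed from 0-10 survey ratings as the percentage of promoters (9 or 10) minus the percentage of detractors (0 through 6). The sketch below applies that convention to made-up survey responses.

```python
def net_promoter_score(ratings) -> float:
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return (promoters - detractors) * 100.0 / len(ratings)


# Hypothetical responses gathered before and after a migration.
pre_migration = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
post_migration = [10, 9, 9, 9, 8, 7, 6, 10, 9, 8]

nps_before = net_promoter_score(pre_migration)
nps_after = net_promoter_score(post_migration)
print(f"NPS before: {nps_before:.0f}, after: {nps_after:.0f}")
```

The score ranges from -100 to 100, so a change between the two surveys is easier to interpret than either number alone.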

Security metrics

Data exposures

Data exposures are tracked to measure the server or application’s data protection weaknesses and inadequacies.

Network I/O

Network input/output, or network I/O, measures the average utilization of network bandwidth across all monitored network devices. Monitoring network traffic helps in detecting security issues like unauthorized access or potentially malicious traffic.
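One way to derive bandwidth utilization, sketched here under the assumption that each device exposes a cumulative byte counter and a known link speed. The device readings are hypothetical, and counter resets or wraparound are not handled.

```python
from statistics import mean


def bandwidth_utilization_pct(bytes_start: int, bytes_end: int,
                              interval_s: float, link_bps: float) -> float:
    """Utilization of one link over an interval, from two byte-counter readings."""
    bits_transferred = (bytes_end - bytes_start) * 8
    return bits_transferred * 100.0 / (interval_s * link_bps)


# Hypothetical readings: (start counter, end counter) over 60 s on 100 Mbps links.
readings = {
    "device-a": (0, 75_000_000),
    "device-b": (0, 225_000_000),
}

per_device = {
    name: bandwidth_utilization_pct(start, end, 60, 100_000_000)
    for name, (start, end) in readings.items()
}
average_utilization = mean(per_device.values())
print(per_device, average_utilization)
```

A sudden jump in one device's utilization relative to its baseline is the kind of change worth investigating as a possible security issue.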

User audit

A user audit tracks the number of users accessing your server or application, when they logged in and logged out, and what information and resources they accessed during each session.

S3 accessibility

Migrating to the cloud also comes with risks around managing digital permissions. By default, your S3 bucket is accessible only to its owner, but you’ll need to extend S3 permissions to your wider team and, in some instances, to third-party services. Keeping a careful watch over these permissions is key to maintaining data integrity.
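As one small piece of that watch, the sketch below scans a bucket policy document for `Allow` statements whose principal is the wildcard `*`, meaning anyone. It is a simplified check written for illustration: it ignores `Condition` blocks, ACLs, and account-level public access settings, and the bucket name and account ID are placeholders.

```python
import json


def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def _principal_is_public(principal) -> bool:
    """True if the principal is the wildcard, given as "*" or inside a mapping."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def public_allow_statements(policy_json: str):
    """Return the Allow statements in a bucket policy that apply to any principal."""
    policy = json.loads(policy_json)
    statements = _as_list(policy.get("Statement", []))
    return [
        s for s in statements
        if s.get("Effect") == "Allow" and _principal_is_public(s.get("Principal"))
    ]


# Hypothetical policy: one public-read statement, one scoped to a single account.
example_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TeamAccess", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
         "Action": "s3:*", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})

flagged = public_allow_statements(example_policy)
print([s["Sid"] for s in flagged])
```

Tracking the number of flagged statements before and after migration gives a simple baseline for how widely bucket access has been opened up.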

Indicators of compromise (IOCs)

IOCs are unusual activities that raise red flags for potentially malicious activity on a network, server, or application. Security analysts track IOCs to detect unwanted activity early on in case of an attack or security breach.

External services and collaborators

Tracking external services and outside collaborators helps security analysts evaluate security policies, detect IOCs, and improve controls that cover collaboration and outside access to sensitive and protected content.

Kevin Goldberg

Kevin is the senior technical content manager at Sumo Logic. He has nearly a decade of experience working at high-growth SaaS companies with a focus on IT software, having previously worked for AppDynamics and SolarWinds. He is interested in all things tech and sports, and you can follow him on Twitter @kevin_goldberg.
