How to choose IT operations metrics that deliver real value

Many IT organizations focus on activity metrics, such as the total number of messages sent and received between a user and each application component, or the total data capacity used. However, this information is of limited value because it does not relate well to the experiences of the users themselves.

The purpose of IT is to provide a satisfying experience in which the user—whether employee, customer, prospect, or partner—plays the role that the business expects. The user’s experience is the perceived quality of the interaction, including any technical element or service that contributes to it.

With this in mind, it is not surprising that optimizing IT operations also means optimizing quality of experience (QoE). This, in turn, means measuring the factors that affect user experience.

Quality of experience vs. quality of service

The original definition of QoE grew out of the network concept of Quality of Service (QoS).

QoS measures factors such as the rate of information delivery, packet loss, and delay, including jitter (variation in delay). While QoS metrics are still important when measuring QoE, they are less important than application-aware metrics, which are more difficult to identify and collect.
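As a rough illustration of these QoS factors, the sketch below derives loss, mean delay, and jitter from a list of per-packet delays. The function name `summarize_qos`, the use of `None` to mark a lost packet, and the definition of jitter as the mean absolute change between consecutive delays are assumptions made for this example, not definitions taken from the article.

```python
from statistics import mean

def summarize_qos(delays_ms):
    """Summarize per-packet one-way delays; None marks a lost packet."""
    received = [d for d in delays_ms if d is not None]
    lost = len(delays_ms) - len(received)
    # Jitter (assumed definition): mean absolute change between
    # consecutive received delays.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_ratio": lost / len(delays_ms) if delays_ms else 0.0,
        "mean_delay_ms": mean(received) if received else None,
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }

# Hypothetical sample: five packets, one of them lost.
sample = [20.0, 22.0, None, 21.0, 25.0]
qos = summarize_qos(sample)
```

Numbers like these describe the network path only; they say nothing yet about what the user was trying to do.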

Visibility and observability are important when collecting metrics. But much of the data available in IT and network environments is of little to no use when it comes to assessing the actual quality of a user’s experience with an application.

The central guiding principle of IT operational metrics is that applications should support their business case, meaning they support the parts of the business that rely on them. In turn, the teams using the applications set the quality requirements, and QoE metrics are key to objectively assessing whether those requirements are being met.

Start capturing user interactions

The first step in understanding QoE is to catalog user interactions. In order to successfully optimize IT operations, it is essential to identify every interaction a user has with an application.

When a user wants to complete a task, they start a series of interactions that give them what they need. For example, a user can log in, make a request, make some secondary requests and updates, and then log out.

Once the user initiates this series of actions, they are committed to it until it completes. Anything that causes delays, confusion, or errors negatively impacts their experience, as well as the business justification for the IT resources used.
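One way to picture such a catalog is as a small data structure: an interaction made up of ordered steps, each with a duration and a success flag. The sketch below is only illustrative; the class names `Step` and `Interaction`, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    duration_s: float
    ok: bool = True  # False if the step ended in an error

@dataclass
class Interaction:
    user: str
    task: str
    steps: list = field(default_factory=list)

    def total_duration_s(self):
        # Time the user is committed to the task from start to finish.
        return sum(s.duration_s for s in self.steps)

    def failed_steps(self):
        return [s.name for s in self.steps if not s.ok]

# Hypothetical session following the log in / request / update / log out pattern.
session = Interaction("user-1", "update order", [
    Step("log in", 1.2),
    Step("primary request", 0.8),
    Step("secondary update", 2.5, ok=False),
    Step("log out", 0.3),
])
```

Here `session.failed_steps()` returns `["secondary update"]`, pointing at the step that hurt this user's experience.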

Select metrics that illustrate the user’s experience

Because improving user interactions is the foundation for optimizing IT operations, organizations should start collecting the metrics that describe those interactions at the user level.

How long does it take from the time a user tries to log in until they see the screen asking for an ID and password? How long does it take to get access after entering the credentials?

Capture a time series for each measurement, ideally as close as possible to the user's connection point. In some cases, however, the earliest data visible to IT is the time a message was received from the user and the time a reply was sent, along with the number of messages in each category.
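The login questions above come down to subtracting timestamps. A minimal sketch, assuming events arrive as (name, ISO 8601 timestamp) pairs; the event names, the timestamps, and the helper `elapsed_s` are invented for illustration:

```python
from datetime import datetime

# Hypothetical event log for one login attempt.
events = [
    ("login_requested", "2024-01-01T09:00:00.000"),
    ("login_screen_shown", "2024-01-01T09:00:01.500"),
    ("credentials_submitted", "2024-01-01T09:00:09.500"),
    ("access_granted", "2024-01-01T09:00:11.750"),
]

def elapsed_s(events, start, end):
    """Seconds between two named events in the same log."""
    times = {name: datetime.fromisoformat(ts) for name, ts in events}
    return (times[end] - times[start]).total_seconds()

time_to_login_screen = elapsed_s(events, "login_requested", "login_screen_shown")
time_to_access = elapsed_s(events, "credentials_submitted", "access_granted")
```

The eight seconds the user spends typing credentials is deliberately left out of both figures, since it is not time the system is responsible for.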

When choosing specific metrics to collect, you should aim to measure application and network performance between the user and the application. Think of the interaction as a diagram with steps, each step representing measurable work done to support the user.

Useful IT operational metrics characterize how each step performs in terms of volume (for example, message count and data volume) and processing time. The goal is to track how long each interaction takes and identify any points where information is lost or corrupted.
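A per-step summary of volume and processing time could look like the sketch below. The record layout (step name, bytes, processing time in milliseconds), the sample values, and the function name `summarize_steps` are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical records: (step name, message size in bytes, processing time in ms).
records = [
    ("login", 512, 120.0),
    ("query", 2048, 300.0),
    ("query", 1024, 500.0),
    ("logout", 256, 40.0),
]

def summarize_steps(records):
    """Aggregate message count, data volume, and mean processing time per step."""
    totals = defaultdict(lambda: {"messages": 0, "bytes": 0, "time_ms": 0.0})
    for step, size, ms in records:
        t = totals[step]
        t["messages"] += 1
        t["bytes"] += size
        t["time_ms"] += ms
    return {
        step: {
            "messages": t["messages"],
            "bytes": t["bytes"],
            "mean_time_ms": t["time_ms"] / t["messages"],
        }
        for step, t in totals.items()
    }

summary = summarize_steps(records)
```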

To collect the necessary data, the next step is to track the workflows associated with the interactions. Think of an application as a set of components connected at the logical level by workflows and at the physical level by network connections. Since each interaction generates a set of workflows, measuring QoE means collecting workflow data.

Whenever possible, collect metrics at both the network and application levels to enable correlation. Network-level metrics should include message count, utilization, and loss and delay information at as fine a granularity as possible. The application-level data should also include message counts to help with correlation, but should focus on overall processing times.
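Message counts make a simple correlation key. The sketch below compares counts seen at the network level with counts seen by the application for the same time intervals and reports any difference, which may hint at lost messages. The interval labels, the counts, and the function name `find_count_gaps` are hypothetical.

```python
# Hypothetical per-minute message counts from two vantage points.
network_counts = {"09:00": 100, "09:01": 120, "09:02": 110}
app_counts = {"09:00": 100, "09:01": 112, "09:02": 110}

def find_count_gaps(network, app):
    """Return intervals where network and application message counts differ."""
    gaps = {}
    # Only compare intervals that both sources report.
    for interval in sorted(network.keys() & app.keys()):
        missing = network[interval] - app[interval]
        if missing != 0:
            gaps[interval] = missing
    return gaps

gaps = find_count_gaps(network_counts, app_counts)
```

In this made-up data, eight messages seen on the network at 09:01 never reached the application, which is the kind of discrepancy worth investigating alongside processing times.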

Collect and analyze IT operational metrics

Gathering the metrics required for QoE assessment can be challenging.

To collect metrics, IT teams can use network management systems, public cloud platforms, platform software tools – including container management – and logs of all kinds. However, variations in APIs and data formats can make it difficult to correlate data.

After tracing workflows and identifying associated components, identify available APIs, data formats, and any available tools to harmonize the information into a common database for analysis. Expect this process to take a lot of time and testing.
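Harmonizing usually means writing one small adapter per source that maps its records onto a shared schema. The sketch below assumes two invented source formats (a network tool reporting epoch milliseconds, and an application log with ISO 8601 timestamps and durations in seconds); the field names and the common schema are illustrative, not those of any particular product.

```python
from datetime import datetime, timezone

def from_network_tool(rec):
    # Hypothetical format: epoch milliseconds and latency in milliseconds.
    return {
        "time": datetime.fromtimestamp(rec["ts_ms"] / 1000, tz=timezone.utc),
        "component": rec["device"],
        "latency_ms": float(rec["latency_ms"]),
    }

def from_app_log(rec):
    # Hypothetical format: ISO 8601 timestamp and duration in seconds.
    return {
        "time": datetime.fromisoformat(rec["timestamp"]),
        "component": rec["service"],
        "latency_ms": rec["duration_s"] * 1000.0,
    }

# Both sample records describe the same instant, in different encodings.
rows = [
    from_network_tool({"ts_ms": 1704099600000, "device": "edge-router", "latency_ms": 12}),
    from_app_log({"timestamp": "2024-01-01T09:00:00+00:00", "service": "orders-api", "duration_s": 0.25}),
]
rows.sort(key=lambda r: r["time"])
```

Normalizing timestamps to one time zone and latencies to one unit is what makes rows from different tools comparable in a single table.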

Tools to manage IT operational metrics

The market’s answer to the need for an organized way to manage QoE metrics is the full-stack visibility or observability product set. Many vendors, including AppDynamics and Dynatrace, offer tools that collect and harmonize QoE-related metrics, at least to some extent.

To alleviate the challenges of connecting a full-stack system to every source of metrics in your IT environment, consider using the OpenTelemetry framework, a Cloud Native Computing Foundation project. OpenTelemetry is an emerging standard related to, but not specific to, cloud computing.

The OpenTelemetry framework focuses on time-based metrics, which are critical for linking specific metrics back to QoE. The framework's broad vendor support doesn't guarantee that metrics can be collected effectively from every element in your IT environment, but it at least improves the odds.
