Service Availability: Calculations and Metrics, Five 9s, and Best Practices BMC Software Blogs

Additionally, some 3rd party vendors write their ‘best guess’ of what Dell Technologies encodes into our branded optics in an attempt to trick the Dell Technologies firmware. Note that Dell Technologies support will recognize these fake C&O and will not support them. Dell Technologies takes Cable and Transceiver (C&O) support with the level of focus and severity warranted as a key part of the Dell solution. If a cable or optical solution refuses to enable or fails during operation – that is an event that can have significant consequences. All Dell Technologies labelled offerings from the Networking catalog are recognized and reported as “Supported”.

Duration of warranty covering repair or replacement of defective hardware. Includes problem diagnosis with Dell Technologies technical support professionals by phone, email or chat during normal business hours to help resolve hardware issues covered under the Basic Hardware Service contract. Service does not include software set up or configuration assistance, or other advanced technical service and support provided by Dell Technologies Services ProSupport Enterprise Suite. The official definition of Availability in ITIL is the ability of a configuration item or IT service to perform its agreed function when required.

«Powers of 10» trick

Most business continuity strategies include high-availability, fault-tolerance, and disaster-recovery measures. These strategies help the organization maintain essential operations and support users when facing any type of critical IT failure, small or large. Unlike high availability, delivering high-quality performance is not a priority for fault tolerance. The purpose of fault-tolerance design in IT infrastructure is to prevent a mission-critical application from experiencing downtime. Both high availability and disaster recovery are important for enhancing business continuity.


This measure extends the definition of availability to elements controlled by the logisticians and mission planners such as quantity and proximity of spares, tools and manpower to the hardware item. A holistic view is required as there are countless availability risks in the ITSM domain, such as expired certificates, poorly planned configuration changes, human error, and vendor-related failures, among others. The official definition of the Availability in ITIL is the ability of a configuration item or IT service to perform its agreed function when required. For instance, expecting a search service to scan entire databases and return results in 100 milliseconds, or waiting to get an output from a service for a predetermined input are examples of functionalities of services.

The objectives of the Availability Management process

I used google search built in calculator for these quick calculations so perhaps there’s a chance that there’s some precision/floating point/numerical issues messing with these figures. I double checked the calculations using wolfram alpha and it appears to be the same. They just seem like two different ways of quantifying the same concept of «availability». Stack Exchange network consists of 182 Q&A communities including Stack Overflow, the largest, most trusted online community for developers to learn, share their knowledge, and build their careers. «The acronym RAS came into widespread acceptance at IBM as the replacement for the subset notion of recovery management.» Virtual machines to decrease the severity of operating system software faults.

  • However using the second formula its based on AGREED uptime is a simple percentage of uptime versus downtime.
  • The following table shows the translation from a given availability percentage to the corresponding amount of time a system would be unavailable.
  • They just seem like two different ways of quantifying the same concept of «availability».
  • The High Availability describes systems that are dependable enough to operate continuously without fail.
  • Failure is only significant if this occurs during a mission critical period.

Known as five nines reliability, the system is essentially always on. High-availability software solutions typically provide load balancing and redirecting, automatic application failover, real-time file replication, and automatic failback capabilities. High-availability clusters are servers grouped together to operate as a single, unified system. Also known as failover clusters, they share the same storage but use different networks.

Application Availability

However, restarts can take several minutes resulting in lower availability. Additionally, cloud services cannot detect software failures within the virtual machines. High Availability Software running inside the cloud virtual machines can detect software failures in seconds and can use checkpointing to ensure that standby virtual machines are ready to take over service. Service Availability involves all aspects of service availability or unavailability.


Since the availability management plan aims to meet the agreed service level targets from an availability point of view for all delivered services, availability or unavailability of services are covered. Also, the potential impact of component unavailability on service availability is covered in the scope of the Availability Management process. In a service chain, if there are components that are used to deliver a service to a customer, any impacts on the regarding components will affect the service delivery quality. Thus, component unavailability impact is assessed under availability management process scope as well. The last objective of the Availability Management is assessing the impact of all changes on the Availability Management Plan.

Measurement and interpretation

Availability refers to the ability of the user community to obtain a service or good, access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is – from the user’s point of view – unavailable. Generally, the term downtime is used to refer to periods when a system is unavailable. Another factor that impacts system availability is maintainability, which refers to how quickly technicians detect, locate, and restore asset functionality after downtime.


Note that, it is Availability Management Best Practice to calculate availability using measurements of the business output of the IT Service. This means that the measurements on the customer or business should be taken into account when making calculations. This is because it is the end of the tunnel where the end user uses the service and where IT Service provider aims to meet the agreed service quality.

What’s An App Owner? Application Owner Roles & Responsibilities

In redundant systems, the crossover point itself tends to become a single point of failure. The timely, reliable access to data and information services for authorized users. As defined in FISMA, the term ‘availability’ means ensuring timely and reliable access to and use of information. If unplanned downtime makes up the lion’s share of total downtime, you can start to analyze what is causing this unplanned downtime.

Each part of the term reliability, availability and serviceability describes a specific type of performance for computer components and software. Availability refers to the percentage of time that the infrastructure, system, or solution remains operational under normal circumstances in order to serve its intended purpose. For cloud infrastructure solutions, availability relates to the time that the data center is accessible or delivers the intend IT service as a proportion of the duration for which the service is purchased.

Identify Problems for Incident Management Using AI/ML

Because availability is so tied to the financial health of a company, it is commonly used as a key business metric in production-heavy organizations. However, it’s also heavily connected to what several other departments do, including maintenance. Availability is impacted by reliability and maintainability, which are influenced by the processes and tools of the maintenance team. Therefore, availability is used to measure and investigate the effectiveness of these processes and tools, and how they can be improved.

Deja un comentario