Use your Scheduled Downtime (pt 1)

Part 1 (part 2 is here)

Service Level Agreement

Some things can never be avoided in a real-world production environment. Downtime (an outage) is one of them.

Even if you are running a high-availability architecture, it is only highly available, not continuously available. An outage is something you should expect and be ready for.

When you provide services, you have a set of Service Level Agreements (SLAs). In general, these are contracts between you and the users of the system or service. In effect, they state the obligations of the parties: what the user is paying for in terms of service quality, and what the service provider is obliged to do to ensure the required level of service.

Besides, SLAs exist not only between you and your users; they also exist (I hope) between your services. More on this later.

In business terms, one of the most critical SLAs is the one for uptime (or availability). This is the percentage of time in a year (quarter, month, etc.) during which the service is guaranteed to be available to the user.

In some cases uptime and availability are not synonymous. For example, a service can be up but, for a number of reasons, not available to the user or to the user's target audience. It is most reasonable to define the availability or uptime of a service (or of a system as a whole) in terms of its ability to perform its functional tasks. Therefore, the user and the operator must understand unambiguously what these terms mean. For simplicity, the two terms are used as synonyms from here on.

Downtime is the percentage of time during which a service may be unavailable. In the vast majority of cases, this is an upper bound rather than a value that must be reached. It is usually read as: "The service will be inaccessible for at most this long, but we will do everything possible to keep downtime lower."
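To make these percentages concrete, here is a minimal sketch in Python that converts an uptime target into a maximum downtime for a period. The function name `allowed_downtime`, the sample targets, and the 30-day month are assumptions made for illustration; they are not taken from any particular contract.

```python
from datetime import timedelta


def allowed_downtime(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted by an uptime SLA over a period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    # Downtime is simply the share of the period not covered by the uptime target.
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return period * downtime_fraction


# Illustrative targets only; a 30-day month is assumed.
month = timedelta(days=30)
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime(target, month)
    print(f"{target}% uptime -> at most {budget} of downtime per month")
```

Under these assumptions, a 99.9% target over a 30-day month leaves roughly 43 minutes of downtime, which shows how quickly the budget shrinks with every extra "nine".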

An important question is what will and will not be considered downtime, and how downtime will be measured. For example, you have a solution that consists of a core around which 5 services are “spinning”. Is it downtime if one of the services fails? And what if 2 fail (or 3…)?

The key to the answer lies in the dependencies between the services and in how the downtime of one of them affects the solution as a whole. In particular, you need to monitor the availability of each service and define, for each of them separately, what counts as downtime.

For example, one way to define downtime for a solution is to identify its set of major business flows. Which flows count as major depends on the context: they can be the most commonly used flows, flows that are critical from a functional point of view, flows that are critical for "white gloves customers", and so on. When the major flows fail, it is downtime. A situation in which one or more services are unavailable without affecting the solution as a whole can be treated as a Severity 1 incident, which nevertheless requires immediate action. Once again, it all depends on context.
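The flow-based definition above can be sketched in Python as follows. This is only one possible reading: it assumes that a flow fails as soon as any service it requires is down, and that the failure of any single major flow already counts as downtime. The names `Flow`, `Status`, and `classify`, and the sample flows and services, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    SEVERITY_1 = "severity 1 incident"
    DOWNTIME = "downtime"


@dataclass(frozen=True)
class Flow:
    name: str
    required_services: frozenset[str]  # services the flow cannot run without
    major: bool                        # whether the flow is treated as "major"


def classify(flows: list[Flow], failed_services: set[str]) -> Status:
    """Classify the state of the solution from the set of failed services."""
    # Assumption: a flow is broken if any of its required services has failed.
    broken_major = [
        f for f in flows if f.major and f.required_services & failed_services
    ]
    if broken_major:
        return Status.DOWNTIME
    # Something failed, but no major flow is affected.
    if failed_services:
        return Status.SEVERITY_1
    return Status.OK


# Hypothetical flows and services, for illustration only.
flows = [
    Flow("checkout", frozenset({"core", "payments"}), major=True),
    Flow("monthly report", frozenset({"core", "reporting"}), major=False),
]
print(classify(flows, {"reporting"}))
print(classify(flows, {"payments"}))
```

Keeping the "major" flag on the flow rather than on the service reflects the point that the same service failure can matter differently depending on which flows rely on it.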

Accordingly, Service Level Agreements must be defined for each service (their values may vary from service to service). There should also be SLAs between services. Think of this as teamwork: if your team members do not fulfill the agreements among themselves, how can you meet the contract with your customer? SLAs between services must correlate with the customer-facing SLAs and with the SLA for the solution as a whole.
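One rough way to check that correlation is to combine the availability targets of the services a customer-facing flow depends on. The sketch below assumes that the flow needs every listed service to be up and that the services fail independently of each other, which is a simplification; the function names and numbers are illustrative assumptions.

```python
import math


def composite_availability(dependency_slas: list[float]) -> float:
    """Availability (%) of a flow that needs every listed service to be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return 100.0 * math.prod(sla / 100.0 for sla in dependency_slas)


def supports_customer_sla(dependency_slas: list[float], customer_sla: float) -> bool:
    """Check whether the internal SLAs are sufficient for the customer-facing SLA."""
    return composite_availability(dependency_slas) >= customer_sla


# Illustrative numbers: two internal services, each with a 99.9% target.
internal = [99.9, 99.9]
print(round(composite_availability(internal), 4))
print(supports_customer_sla(internal, customer_sla=99.9))
```

With these numbers the two services together offer only about 99.8%, so each internal SLA has to be stricter than the customer-facing one it supports.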
