Incidents are defined as “an event which is not part of the standard operation of any of the Services and which causes, or may cause, an interruption to, or a reduction in, the quality of that Service”.
An incident can be raised:
SilverStripe has numerous obligations around Incidents, and must be available 24 hours a day to quickly assess and resolve Incidents as written below.
The Stack Managers and Release Managers for each agency are emailed.
An automated notification is sent by the CWP monitoring system when it detects an outage. Manually written notifications are sent by the CWP team to acknowledge outages, provide updates, and deliver incident reports.
Following a P1 incident, affected agencies shall receive:
Agencies can request an incident report for a P2 incident.
Incidents are given one of four Priority Levels (P1 to P4) based on the impact each has on an agency's websites. Refer also to Service Levels for a complete description of commitments around resolution.
|Priority Level||Definition||Service Level Summary|
|P1||(a) one or more of the Participating Agency’s Sites is severely impacted by the Incident, whether through availability or affected functionality; (b) no acceptable work-around is available; and (c) immediate attention is required to prevent damage to the Participating Agency’s reputation and mitigate User impact; Note: Agencies must phone when reporting a P1.||Workaround in 30 minutes (24/7). Restoration in 120 minutes (24/7).|
|P2||(a) the functionality of one or more of the Participating Agency’s Sites is substantially impacted by the Incident; (b) no acceptable work-around is available; and (c) prompt attention is required to prevent damage to the Participating Agency’s reputation and mitigate User impact; Note: Agencies must phone when reporting a P2.||Workaround in 120 minutes (24/7). Restoration in 360 minutes (business hours).|
|P3||(a) the functionality of one or more of the Participating Agency’s Sites is impacted by the Incident; and (b) timely attention is required to manage impact to the Participating Agency’s reputation and mitigate loss of User satisfaction;||Estimated resolution time to be communicated within 8 business hours.|
|P4||(a) one or more of the Participating Agency’s Sites are impacted in a minor way by the Incident although it or they can still operate while the Incident exists; (b) timely attention is required to manage impact to Users; and (c) the Incident has minor impact to the Participating Agency’s systems, operations or Users;||Estimated resolution time to be communicated within 8 business hours.|
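The 24/7 targets in the table can be turned into concrete deadlines with simple arithmetic. The following is a minimal sketch only, not part of any CWP tooling: the names `ServiceLevel`, `SERVICE_LEVELS`, `workaround_deadline` and `restoration_deadline` are hypothetical, and it assumes the clock starts when the incident is reported, which this page does not state. Targets measured in business hours are deliberately not computed, because business hours are not defined here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ServiceLevel:
    """Hypothetical container for the targets listed in the table above."""
    workaround_minutes: Optional[int]   # None: no workaround target listed
    restoration_minutes: Optional[int]  # None: no restoration target listed
    restoration_is_24x7: bool           # False: target is in business hours
    must_phone: bool                    # agencies must phone to report


# Values copied from the Service Level Summary column.
SERVICE_LEVELS = {
    "P1": ServiceLevel(30, 120, True, True),
    "P2": ServiceLevel(120, 360, False, True),
    "P3": ServiceLevel(None, None, False, False),
    "P4": ServiceLevel(None, None, False, False),
}


def workaround_deadline(priority: str, reported_at: datetime) -> Optional[datetime]:
    """Workaround targets for P1 and P2 are both 24/7, so plain addition works.

    Assumption: the clock starts at report time.
    """
    level = SERVICE_LEVELS[priority]
    if level.workaround_minutes is None:
        return None
    return reported_at + timedelta(minutes=level.workaround_minutes)


def restoration_deadline(priority: str, reported_at: datetime) -> Optional[datetime]:
    """Only 24/7 restoration targets are computed.

    Business-hours targets (P2) return None because the business calendar
    is not specified on this page.
    """
    level = SERVICE_LEVELS[priority]
    if level.restoration_minutes is None or not level.restoration_is_24x7:
        return None
    return reported_at + timedelta(minutes=level.restoration_minutes)
```

For example, under these assumptions a P1 reported at 09:00 would have a workaround target of 09:30 and a restoration target of 11:00 on the same day.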
If a P1 Incident is widespread (e.g. all sites down), severe (e.g. security breach), extended (e.g. sites down for days), or likely to cause reputational damage (e.g. data leak), the major incident process will be followed. The first step in any major incident is for SilverStripe to assess the situation before contacting the lead agency and any affected agencies.
The major incident process is documented in full in the CWP Workspace, in the folder 'shared documents'.
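The trigger for the major incident process can be read as a simple rule: the Incident is a P1 and at least one of the four criteria applies. The sketch below only illustrates that reading. `IncidentAssessment` and `follows_major_incident_process` are hypothetical names, and in practice each criterion is a judgment made by SilverStripe when assessing the situation, not an automated flag.

```python
from dataclasses import dataclass


@dataclass
class IncidentAssessment:
    """Hypothetical record of an assessment against the four criteria."""
    priority: str                    # "P1" to "P4"
    widespread: bool = False         # e.g. all sites down
    severe: bool = False             # e.g. security breach
    extended: bool = False           # e.g. sites down for days
    reputational_risk: bool = False  # e.g. data leak


def follows_major_incident_process(assessment: IncidentAssessment) -> bool:
    """True when a P1 Incident meets at least one of the four criteria."""
    if assessment.priority != "P1":
        return False
    return any((
        assessment.widespread,
        assessment.severe,
        assessment.extended,
        assessment.reputational_risk,
    ))
```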
An incident is known as a Security Incident when it relates to a security threat or risk, or to an actively exploited security vulnerability. Security Incidents accordingly receive prompter attention and follow a different communication process. A Security Incident is always reported to the lead agency.
SilverStripe may take affected or vulnerable sites offline, in order to contain a security incident on the platform.
Where the fix involves upgrading software other than SilverStripe CMS (operating systems, for example), SilverStripe will promptly and proactively perform the upgrade. Where urgent, the Incident will be treated as an Emergency Change. (Emergency Changes can be performed at any time with no prior notice given to agencies; a notification is sent afterwards.)
Where the fix involves creating a new version of SilverStripe CMS code, SilverStripe will produce this new version and notify agencies to quickly upgrade their website. (Note that we expect agencies to upgrade their site, or contract a service provider to do this.)
If solving an incident requires some or all websites to be shifted to another datacentre, the incident is noted as requiring a Partial Datacentre Migration or a Complete Datacentre Migration respectively. This could be caused by a major natural disaster or catastrophic software failure. Such a migration makes use of the Disaster Recovery capabilities in the platform (e.g. Active DR, Passive DR, and daily backups).