Resilience Planning & Management Essentials

Programme Code: D28
Domain: Applications Management, Software Engineering
Level: Intermediate
Learning Partner(s): NUS-ISS
Duration: 2 Days
Format: In-person
Competencies: Application Management, Technical Operations Management
Job Roles: Project Manager (Agile), ICT&SS Professional, Digital Service Manager

Overview

This 2-day programme (formerly Operations Management Essentials for Application Services) covers the essentials of real-world resilience planning and operations management needed to keep production applications running smoothly. It includes proactive best practices for avoiding application incidents, big or small, and best practices for resolving incidents quickly and minimising their impact when they do occur.

With organisations increasingly dependent on IT applications for critical business functions, effective resilience planning and management matters more than ever, because incidents can have a major and widespread adverse impact (e.g. the highly publicised system issues at some Singapore organisations, including banks and even government agencies, in recent years).

The programme will be helpful to all ICT and SS staff whose work affects BAU SLAs (e.g. availability, online response time) and who are affected when incidents occur, e.g. product/project managers, business analysts, solution architects, development and ops managers, development teams, operations teams and DevOps teams.

The programme starts with real-world best practices for managing and resolving incidents quickly and effectively, including difficult incidents such as the Circle Line signal system incident, which was finally resolved by one party after unsuccessful attempts by many other experts from diverse organisations over a 2-month period.

More importantly, the programme then progresses to proactive resilience management methods that help reduce the occurrence of incidents in the first place. For example, it covers how to prepare systems and processes to avoid high-impact issues such as:

  • poorly performing systems due to high transaction volumes, e.g. during popular launches (such as highly anticipated new government programmes that many citizens rush to sign up for at launch, similar to concert bookings for popular international stars and the publicised system failures associated with them, which we wish to avoid).
  • unavailability of systems, e.g. due to poor design or processes that cause the system to go down.
  • high-impact functional errors, e.g. due to poor system change management that allows such errors to get into production.

These are the key resilience issues most commonly publicised, and hence are critical to avoid.

The resilience management practices covered include international best practices, such as those adopted by Google and Netflix, as well as lessons learnt from high-impact failures at commercial organisations and other governments.

The programme will also touch on how to properly use the latest practices, such as DevSecOps, to enable some of these proactive practices, i.e. to make them more automated, streamlined, complete, thorough and efficient.

This includes key pointers on how to properly and optimally use tools already available in the government (such as Ship-Hats, StackOps, Elastic Agent, Service Maps and Selenium) to greatly improve resilience and to make resilience management more automated, streamlined and effective.

(Note: for those who have taken ITIL programmes, this programme is a good complement, as it extends the ITIL learning by linking it to current real-world industry best practices.)

Some comments from past participants for this programme:

  • “All the content is applicable to my day to day work. It helps to ground my knowledge and understanding of practical ops management.”
  • “Chance to learn many new operations-related concepts for future projects.”
  • “The contents were great - so much covered in a short time.”

This programme will cover the following topics:

  • Reactive and proactive management of application services
  • Practical methods to manage Incidents, Events and Problems
  • Practical methods for Change and Release/Deployment Management so that system changes go smoothly and incidents are avoided
  • Practical methods for System Configuration Information Management that will facilitate speedy recovery from incidents as well as proactive prevention of incidents
  • Practical methods to optimise Availability and Capacity to proactively reduce or avoid system unavailability and system performance problems (which usually cause negative publicity for the system)
  • Practical methods to optimise Service Level Management and Supplier Management to proactively keep service levels up and customers/users satisfied

Fees


Full Fee

Full programme fee: S$1200
9% GST on nett programme fee: S$108
Total nett programme fee payable, including GST: S$1308

With effect from 1 Jan 2024

NOTE
Payment for this programme is to NUS-ISS, National University of Singapore.

Upcoming Classes

Class 1
07 Jul 2025 to 08 Jul 2025 (Full Time)
Duration: 2 days
When: Jul - 07, 08
Time: 9:00 AM to 5:00 PM

Class 2
12 Jan 2026 to 13 Jan 2026 (Full Time)
Duration: 2 days
When: Jan - 12, 13
Time: 9:00 AM to 5:00 PM

How To Register


Agency-sponsored

Step 1 Apply through your organisation's training request system.

Step 2 Your organisation's training request system (or relevant HR staff) confirms your organisation's approval for you to take the programme. Your organisation will then send registration information to the academy. Organisation HR L&D or equivalent staff can click here for details of the registration submission process.

Step 3 GovTech Digital Academy will inform you whether your enrolment has been successful.



Testimonials

The programme provided me with a better structure to organise and articulate practices in operation management.

The different incident and change management techniques and principles learnt were useful.