Risk Analysis and Risk Response
Introduction
The goal of the risk analysis and risk response activity is to provide the consortium with guidelines and instruments for managing the actual and potential risks that can occur during the project's lifetime. The full risk management procedure and methodology is available at DNA1.1.
Risk Management Methodology
Overview
A risk management methodology is a set of methods and procedures used to identify the risks the project is subject to, together with the actions to take in order to detect their occurrence and react so as to minimize their effect.
The methodology consists of four steps: risk identification, classification, monitoring and resolution.
- Risk identification: It identifies the risks that the different parts of the project (i.e. its assets like the developed or deployed system) are exposed to and evaluates each risk by attaching qualitative and quantitative attributes to the risk, leading to subsequent quantification of the impact that the risk will have, the probability of occurrence, and the value of the assets
- Risk classification: It identifies the most important risks and promotes in subsequent steps the actions to be taken to safeguard the assets. The prioritization of risks attempts to handle first the risks with greatest impact on the project outcomes and greatest probability of occurrence, and last the risks with lowest impact on the project outcomes and lowest probability of occurrence.
- Risk monitoring: It identifies the procedures to monitor the risks according to the priorities identified in the risk classification phase.
- Risk resolution: It identifies the strategies to reduce the probability of occurrence of a risk or the countermeasure needed to limit its effects.
Risk Evaluation
The evaluation of a risk is performed by identifying the probability of occurrence and the impact for each risk. The probability for a particular source/problem to occur is not a strictly mathematical probability factor. For the majority of the risks there are no formulas or there is not enough experimental data to calculate the probability of occurrence. Thus it is not easily quantifiable. The impact measures the damage that will be caused to the object element in case of occurrence of the risk.
This case of difficult evaluation of the basic metrics of risk evaluation is actually typical in IT projects where probabilities are estimated by indirect methods such as “expert” opinions, offers, negotiations etc. In the iMarine case, this activity relies on “expert” opinions that evaluate the risks. Moreover, the terms “probability rank” and “impact rank”, which are more appropriate for the iMarine case, have been adopted. Probability rank liberates the analysis from the strict mathematical terms, which in any case is not objectively useful in this
Probability Rank: Description | Probability Rank: Value | Impact Rank: Description | Impact Rank: Value |
---|---|---|---|
None | 0 | None | 0 |
Very Low | 1 | Doesn’t affect the activity | 1 |
Low | 2 | Affects the activity but a workaround is not needed | 2 |
Medium | 3 | Affects the activity and it's recommended to put in place a workaround | 3 |
High | 4 | Affects the activity and it's mandatory to put in place a workaround | 4 |
Very High | 5 | Affects the activity that has to be completely rethought | 5 |
Certain | 10 | Blocks the activity | 10 |
Risk Identification
The first part of the risk analysis phase is about identifying and labeling the risks the project is exposed to. The identification of a risk is based on the usage of the terms source/problem and is further explained by the terms object/impact.
- The source/problem can be anything external or internal to the project that behaves outside its expected margins. Typically these margins are set by the specifications; however, the margins of interest to our risk management are those on which the project plan has been based. Multiple sources, logically combined, can form one risk.
- The object/impact is always internal to the project. Object can be any element that is affected by a source/problem while the impact is the problem raised in the object, expressed in qualitative and/or quantitative manner.
Because of their knowledge of the domain, work package leaders and project managers, in cooperation with task leaders when required, are the best candidates to identify possible risks affecting the activity of the project. Work package leaders identify low-level risks and their impact, while project managers identify high-level risks that are not directly conceivable at lower layers.
In order to identify risks they should:
- Evaluate the applicability of common risks proposed by various methodologies. Subsequently perform a fine-grained extension of the common risks to the elements of the project that comply with the risk definition
- Analyse the methodology commonly used by the target communities and evaluate the distance between their approaches and the ones proposed by the project
- Enumerate all the dependencies of software components and work plans at the task level and enhance this information with the effects caused by the event of failure, delay, misbehaviour (lack of features, performance, etc.)
After the identification of the risks, a number of additional steps have to be performed:
- Identification and removal of duplicated risks
- Homogenisation of the terminology
- Sorting of risks according to the source/problem. Multiple effect source/problem can be grouped in one element with multiple objects/impacts
- The identified risks are grouped into 3 different categories:
- Networking Activities Risks
- Service Activities Risks
- Research and Technological Development Activities Risks
- As an initial indicator, the likelihood of appearance of the risk can also be attached.
The risk identification is a continuous task. When work package leaders or project managers decide to declare a new risk this should be done either immediately or during the next monthly activity report. If no risks are identified this should be communicated to the QATF when the Quarterly Report is produced.
- The declaration of new risks is carried out by creating a new ticket with type “risk” using the TRAC web interface available at:
https://issue.imarine.research-infrastructures.eu/newticket
Risk Classification
Risk classification is the main task of the risk analysis phase. Having:
- the value of the asset associated to the risk
- the indicator of the probability of the risk being triggered
- the impact this will have on the particular asset
it is possible to estimate the importance of the risk.
The measurement of risks is typically called risk exposure. Since the risk exposure is mathematically calculated as the product of probability and impact, and strict probabilities are not available in this context, it will not be used.
Instead, the “risk exposure ranking” will be measured to classify risks.
The value of the asset can be defined as follows:
- All assets that do not depend on other assets
- Value of asset = 1;
- All assets that depend on other assets
- Value of asset = C-Value
- C-Value = K * Value of lower level asset (K <= 1; e.g. K = 0.9 and so on to reduce the value of the assets that depend on a chain of assets)
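
The asset-value rule above can be sketched in Python. The sketch assumes a simple dependency chain, so that an asset sitting `depth` levels above an independent asset has value K raised to `depth`. It also assumes that the risk exposure rank is the product of probability rank, impact rank and asset value; this formula is not stated explicitly in the text, but it agrees with entries in the Top Identified Risks table such as 10 × 10 = 100 and 3 × 4 = 12. The function names are illustrative only.

```python
def asset_value(depth, k=0.9):
    """Value of an asset `depth` dependency levels above an independent asset.

    depth = 0 means the asset depends on nothing (value 1); each further
    level multiplies the value of the lower-level asset by K (K <= 1).
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if not 0 < k <= 1:
        raise ValueError("K must satisfy 0 < K <= 1")
    value = 1.0
    for _ in range(depth):
        value = k * value  # C-Value = K * value of the lower-level asset
    return value


def exposure_rank(probability_rank, impact_rank, value_of_asset=1.0):
    """Assumed formula: probability rank x impact rank x value of asset."""
    return probability_rank * impact_rank * value_of_asset
```

With K = 0.9, a risk on an asset one level above an independent asset is ranked slightly lower than the same risk on the independent asset itself, which is the intended effect of the chain discount.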
Two approaches are recommended to sort the classified risks:
- Sort the risks by Probability Rank. This allows focusing on the risks most likely to happen and then investigating the chains in which they take place.
- Sort the risks by Risk Exposure Rank. This captures the most serious problems that can affect the asset and then allows investigating the related events.
Some of the identified risks could belong to different categories, having different probability and impact values on each category. For those risks we keep the maximum exposure rank.
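
The two sort orders and the keep-the-maximum rule can be sketched as follows. This is a minimal illustration, not the project's tooling: the class and function names are hypothetical, the exposure rank is assumed to be probability rank × impact rank with an asset value of 1, and the sample entries borrow rank values from the tables on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedRisk:
    name: str
    category: str          # NA, SA or RTD
    probability_rank: int
    impact_rank: int

    @property
    def exposure_rank(self):
        # Assumption: asset value is 1, so the rank is probability x impact.
        return self.probability_rank * self.impact_rank


def keep_max_exposure(risks):
    """For a risk listed in several categories, keep the highest-exposure entry."""
    best = {}
    for risk in risks:
        current = best.get(risk.name)
        if current is None or risk.exposure_rank > current.exposure_rank:
            best[risk.name] = risk
    return list(best.values())


def sort_by_probability(risks):
    return sorted(risks, key=lambda r: r.probability_rank, reverse=True)


def sort_by_exposure(risks):
    return sorted(risks, key=lambda r: r.exposure_rank, reverse=True)


risks = [
    ClassifiedRisk("Software is not released on time", "NA", 3, 3),
    ClassifiedRisk("Software is not released on time", "RTD", 3, 4),
    ClassifiedRisk("Foundation Technology becomes obsolete", "RTD", 5, 3),
    ClassifiedRisk("EA-CoP technologies become obsolete", "NA", 4, 2),
]
unique = keep_max_exposure(risks)
by_probability = sort_by_probability(unique)
by_exposure = sort_by_exposure(unique)
```

In this sample the two orders differ: the probability sort places the EA-CoP technology risk second, while the exposure sort places the software release risk second because of its higher impact.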
Risk Monitoring
The most effective way of monitoring risks is the continuous update of the top-ranked risks as defined in the risk plan. Among the top-ranked risks it is important to implement the actions of the risks that entered the list since the last evaluation. It is also required to update the status on all other risks. For this purpose, for each risk the following information is also maintained:
- Current position in the top-n ranking
- Previous position in the top-n ranking
- Risk description
- Progress towards resolution
The risk monitoring is a continuous task. However, it is mandatory to follow the risk plan status every time the QATF re-sorts the existing risks, to understand which risks are now top-ranked and therefore require close monitoring. If a risk increases its exposure rank, the project managers should introduce more accurate countermeasures and inform the PMB to allow a high-level discussion. Conversely, risks that gradually diminish have their countermeasures relaxed.
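
The bookkeeping behind the current and previous positions can be sketched in a few lines of Python. The helper names and the sample risks ("risk A", "risk B", "risk C") are hypothetical; the sketch only assumes that each evaluation yields an exposure rank per risk.

```python
def top_ranked(exposure_by_risk, n=10):
    """Names of the n risks with the highest exposure rank, highest first."""
    ordered = sorted(exposure_by_risk.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered[:n]]


def monitoring_rows(current_top, previous_top):
    """(current position, previous position or None, risk) for each top risk."""
    previous_position = {name: pos for pos, name in enumerate(previous_top, start=1)}
    return [
        (pos, previous_position.get(name), name)
        for pos, name in enumerate(current_top, start=1)
    ]


def newly_entered(rows):
    """Risks without a previous position: they entered the list since the last evaluation."""
    return [name for _, previous, name in rows if previous is None]


previous_top = top_ranked({"risk A": 12, "risk B": 9, "risk C": 6}, n=2)
current_top = top_ranked({"risk A": 12, "risk B": 9, "risk C": 40}, n=2)
rows = monitoring_rows(current_top, previous_top)
entered = newly_entered(rows)
```

Risks returned by `newly_entered` are the ones whose actions should be implemented first, while the remaining rows only need a status update.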
Risk Resolution
For the resolution of risks, a number of methods can be applied:
- Avoid the occurrence of the risk by reducing the probability of its triggering events
- Avoid the risk by removing its connection with project activities
- Transfer the danger to another party or asset to reduce the probability of occurrence
- Acceptance of the risk and implementation of its countermeasure
- Acceptance of the risk with late reaction
- Exploitation of risk side effects to balance their impact
Identified Risks
The following tables contain all the identified risks. These risks are identified either by the QATF or by the Work Package and Task leaders. They may have been reported using the TRAC system or the monthly activity report, or they can be reported directly through this page.
When adding a new risk the following information should be provided:
- Risk: The name of the risk
- Description: A brief description of the risk
- Risk Probability: The probability of the risk using one of the values described at the Risk Evaluation section
- Risk Impact: The impact of the risk using one of the values described at the Risk Evaluation section
- Work Package: The Work Package(s) that the risk belongs to
- Related Ticket: The TRAC ticket number pointing to the identified risk
- Countermeasures: The possible countermeasures to be taken if the risk occurs
- Applied Countermeasures: A brief description of the applied countermeasures and/or the related ticket numbers
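
As an illustration, the fields listed above can be modelled as a small record that rejects rank values outside the Risk Evaluation scale and renders itself as a row in the layout of the tables on this page. This is a sketch under stated assumptions: the class name, field names and the `to_table_row` helper are hypothetical and not part of any project tool.

```python
from dataclasses import dataclass

# Values allowed by the Probability Rank and Impact Rank scales.
ALLOWED_RANKS = {0, 1, 2, 3, 4, 5, 10}


@dataclass
class RiskEntry:
    risk: str
    description: str
    risk_probability: int
    risk_impact: int
    work_package: str
    related_ticket: str = "-"
    countermeasures: str = ""
    applied_countermeasures: str = ""

    def __post_init__(self):
        # Reject ranks that are not on the evaluation scale (e.g. 7).
        for label, value in (
            ("Risk Probability", self.risk_probability),
            ("Risk Impact", self.risk_impact),
        ):
            if value not in ALLOWED_RANKS:
                raise ValueError(f"{label} must be one of {sorted(ALLOWED_RANKS)}, got {value}")

    def to_table_row(self):
        """Render the entry as one pipe-separated table row."""
        cells = [
            self.risk,
            self.description,
            str(self.risk_probability),
            str(self.risk_impact),
            self.work_package,
            self.related_ticket,
            self.countermeasures,
            self.applied_countermeasures,
        ]
        return " | ".join(cells) + " |"


entry = RiskEntry(
    risk="Unavailability of data resources",
    description="The data resources planned for the different use cases are not delivered",
    risk_probability=3,
    risk_impact=4,
    work_package="6",
)
row = entry.to_table_row()
```

Validating at construction time keeps malformed ranks out of the tables before any exposure ranking is computed from them.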
Networking Activities
Risk | Description | Risk Probability | Risk Impact | Work Package | Related Ticket | Countermeasures | Applied Countermeasures |
---|---|---|---|---|---|---|---|
High staff turnover | Given the complexity of the iMarine environment, skilled staff may leave the project for longer-term and higher-paying positions within industry. The virtual nature of the organisation may increase the probability of this risk | 2 | 3 | All | - | - Project management should verify training plans for the younger researchers/developers to ensure that they continuously evolve within the project. - Plan two or three occasions per year where all project staff can get together. - Consider a staff satisfaction review across the project | |
Lack of organisational coherence | Could be caused by one or more specific underlying reasons, ranging from communication difficulties at one or another of the member organisations | 2 | 3 | All | - | - Minimise the risk occurring by implementing a project structure in which the key stakeholders are identified at this point in time. - Ensure consensual values within the project, the effective use of communications technology, and frequent planning, control and review. | |
Serious disputes between consortium members | | 2 | 3 | All | - | - Aim to minimise the chances of disputes occurring by ensuring regular and clear communication between consortium members. - Work package leaders should aim to follow an attitude of openness and trust, wherever possible | |
Multi-disciplinary nature of the consortium may lead to disciplines working in silos | Lack of communication, limited understanding of the needs and difficulties in testing/feedback | 2 | 3 | All | - | - Work package leaders must ensure regular presence at the quarterly face-to-face meetings of the PEB in order to prevent "silo" work packages. - Regular communication through virtual means will prevent isolation | |
EA-CoP technologies become obsolete | The CoP employs a wide variety of technologies, often released many years ago. They may need to change them to become a viable partner. | 4 | 2 | All | - | - The gCube services do not depend on these external facilities. If software needs an upgrade, this is beyond the project scope. However, the data infrastructure may offer solutions. | |
EA-CoP Software is not released on time | This risk is very common in any project with a collaborative plan of development activities. Late delivery only affects EA-CoP activities. | 3 | 3 | All | - | - The Agile development approach provides opportunities to assess the direction of the ongoing activities and mitigate the impact. - Appropriate boards within the project will continuously monitor this risk and take corrective actions | |
Lack of motivation and/or participation in the Virtual and the Project Events | Recruiting both appropriate speakers and participants to attend these events involves skilled staff with experience in developing marketing and promotional strategies, as well as a vast network database of competent names, which focus on the benefits for the individual attending. | 3 | 3 | All | - | - The iMarine partners boast high-level networks and specific contacts, which will be contacted according to the specific event organised. The partners have proved over the last 2 years that they are able to obtain active participation from key experts in the field. The involvement of the existing CoP community and its related contact network will also help mitigate this risk | |
Project software is not delivered on time or misses specifications | NA3 translates the EA-CoP into development goals that cannot be achieved by RTD for any reason | 3 | 3 | All | - | - Representatives of RTD will be included in the NA3 work package to assure the feasibility of the goals according to RTD requirements and effort. | |
Difficulty increasing users’ participation to engage in iMarine training activities | To accelerate the adoption of the e-infrastructure governance model, to create awareness of the services provided, etc., users’ participation must increase. | 3 | 3 | All | - | - The introduction of virtual tools is a proven mechanism for engaging and involving users in training activities, reducing costs and the learning-implementation process, as users may have their individual timeframe to complete courses | |
Expected co-funding / collaboration is not achieved | The projects supporting external applications development are delayed or cancelled, and the in-kind inputs from partners are not delivered as promised. The risk is medium since the core parts of the use cases are designed to be tackled with project funds or external funding sources already approved. | 3 | 2 | All | - | - The iMarine Board will mostly have to face priority settings, so efforts will be reported on areas where resources are guaranteed. - The Steering Board should make decisions in the best interest of the EA-CoP | |
EA-CoP Policy expectations are too diverse for being consistently developed for iMarine | Too many expectations are formulated, and the opinions of different members diverge too strongly on the solutions. The risk is real but has been minimized by attempting to bring together actors of the EA-CoP which share common interests and standards | 3 | 4 | All | - | - The iMarine Board will create clusters of common interest in order to facilitate negotiation; it will also have flexible strategies for the development of policies, bottom-up, top-down or a mixture of both, depending on the complexity. It will be aware that progress on policies will be uneven depending on the topics | |
Service Activities
Risk | Description | Risk Probability | Risk Impact | Work Package | Related Ticket | Countermeasures | Applied Countermeasures |
---|---|---|---|---|---|---|---|
Unavailability of dedicated computing and storage resources | The computing and storage resources provided by the project partners represented in WP5 are not made available | 1 | 3 | 5 | - | The required amount of resources can be acquired from external cloud providers like MS Azure, thanks to gCube cloud extension and the collaboration with Microsoft which will give us enough computing and storage capacity if needed. | |
Unavailability of on-demand computing and storage resources | The computing and storage resources provided on-demand as cloud resources in WP5 are not made available. Alternatively, the gCube extension to access clouds resources is not operational | 3 | 3 | 5 | - | The required amount of resources will be discussed with the project partners to understand their availability to provide more dedicated nodes | |
Impossible to access resources from other infrastructures | The resources provided by other (data) infrastructures (from the D4Science Infrastructure and others) are not reachable and cannot be consumed from the iMarine Data e-Infrastructure | 3 | 3 | 5 | - | The Service Activity teams of both infrastructures establish direct communication to analyse the problem. If required, the defined interoperability solutions are updated and the development teams involved | |
Useless procedures | The identified procedures to manage the Data e-Infrastructure, the VREs, and the software release are useless and introduce delays | 2 | 3 | 5-6-7 | - | - The reasons for the misalignment between procedures and daily practices are analysed, and improvements to the current procedures are proposed and tested. - Advice from external experts may be sought | |
Unclear or unstable requirements | The requirements and concrete use cases identified by the EA-CoP are unclear or unstable and do not allow the definition of appropriate Virtual Research Environments | 3 | 4 | 6 | - | More regular and face-to-face meetings between the EA-CoP members and the technical teams are established to promote clear communication between the two teams and a detailed discussion of the requirements | |
Low quality of the delivered community tools and gCube common tools | The applications and tools developed by the user communities and/or the gCube common tools are not deployable or are of low quality | 3 | 3 | 6 | - | The effort on the development support task (T6.4) is intensified to allow a better communication and support to the developers of these applications and tools | |
Unavailability of data resources | The data resources planned for the different use cases are not delivered and made available by the user communities | 3 | 4 | 6 | - | Direct contact with the user communities and increased support is put in place to understand the reasons for the unavailability of the data resources. The policies for data provision may be revised | |
Limited or unavailable VRE functionality | The identified VRE functionality is not provided or does not satisfy the initial requirements | 2 | 4 | 6 | - | Define very clearly the planned VRE functionality. Push developers to deliver early prototypes and users to provide early feedback | |
Unavailability of build and testing tools | The tools defined to integrate and test the project source code are not made available | 10 | 10 | 7 | https://issue.imarine.research-infrastructures.eu/ticket/1040 | - Other instances of the same tools are exploited (e.g. an ETICS instance is hosted at CERN and run by the EMI project; however, Engineering (E-IIS) will also install an instance on its premises). - The usage of other tools is considered | The ETICS infrastructure hosted at CERN is planned to be shut down at the end of April 2013. Therefore members of WP7 have planned to upgrade the ETICS instance at Engineering and migrate the integration activity to that instance during Q1 2013. The start of the migration activity is planned for January 2013, which will give enough time to complete it on time. |
Possible security issues on the production infrastructure. | End of public updates of Java SE version 6 | 10 | 4 | 5-7 | https://issue.imarine.research-infrastructures.eu/ticket/1041 | The public updates for Java SE 6, which is the base of our infrastructure nodes, will stop in February 2013. The exploitation of Java SE 7 should be planned ASAP in order to avoid possible issues in the production environment. | The WP5 and WP7 teams have discussed the issue and decided to start integrating the gCube software using Java SE 7 from January 2013, in order to understand possible incompatibility issues. The WP5 members have also planned the migration of all the infrastructure nodes to Java 7 during Q1 2013. |
Research and Technological Development
Risk | Description | Risk Probability | Risk Impact | Work Package | Related Ticket | Countermeasures | Applied Countermeasures |
---|---|---|---|---|---|---|---|
Foundation Technology becomes obsolete | The gCube system is built on technologies released a few years ago. It may be necessary to change them to maintain the state-of-the-art status of gCube | 5 | 3 | All RTD | - | gCube services do not deal directly with these underlying facilities. The gCore framework was implemented to isolate the services from the underlying layers. This layer can be evolved to minimise the impact of the risk on the services | Due to the high probability of this risk, a new component named "FeatherWeight stack" has been introduced to mitigate it. Related ticket is: #853 |
Software is not released on time | This risk is very common in any project with a consistent plan of development activities. Instances of the risk highly affect all the other work packages' activity | 3 | 4 | All RTD | - | - The Agile development approach adopted in RTD will provide many opportunities to assess the direction of the project through incremental and iterative work cadences and short integration cycles. - Appropriate boards within the project will continuously monitor clues of this risk and take corrective actions | |
Community of Practices cannot be implemented | NA3 translates the CoP into development goals that cannot be achieved by RTD for any reason | 2 | 3 | | - | Representatives of RTD will be included in the NA3 work package to assure the feasibility of the goals according to RTD requirements and effort | |
EMI fails in its goals or to deliver software suitable for the project’s purposes | The European Middleware Initiative may fail to maintain the gLite and ARGUS software currently exploited by the gCube system. Due to the past experience of the participants, the probability of the risk is very low. | 1 | 2 | | - | - There are very few and well-identified points of contact between the gCube system and the gLite software. They can be changed with a low impact to interface with other systems offering computing and storage capabilities. - Hadoop clusters will be available to compensate | |
Top Identified Risks
This section will contain the top 10 of the identified risks based on their Risk Exposure Ranking.
- Last update: 12/20/2012
# | Risk | Risk Probability | Risk Impact | Risk Exposure Ranking | Previous position in the top list | Category |
---|---|---|---|---|---|---|
1 | Unavailability of build and testing tools | 10 | 10 | 100 | - | SA |
2 | Possible Security issues on the production infrastructure | 10 | 4 | 40 | - | SA |
3 | Foundation Technology becomes obsolete | 5 | 3 | 12 | 3 | RTD |
4 | EA-CoP Policy expectations are too diverse for being consistently developed for iMarine | 3 | 4 | 12 | - | NA |
5 | Unclear or unstable requirements | 3 | 4 | 12 | - | SA |
6 | Unavailability of data resources | 3 | 4 | 12 | - | SA |
7 | Software is not released on time | 3 | 4 | 12 | - | RTD - NA |
8 | Lack of motivation and/or participation in the Virtual and the Project Events | 3 | 3 | 9 | - | NA |
9 | Project software is not delivered on time or misses specifications | 3 | 3 | 9 | - | NA |
10 | Difficulty increasing users’ participation to engage in iMarine training activities | 3 | 3 | 9 | - | SA |
NOTE: In the category section the allowed values are:
- Networking Activities -> NA
- Service Activities -> SA
- Research and Technological Development -> RTD
Risk Resolution
This section describes the corrective actions put in place when a risk from the top-ranked list occurred during the project lifetime. Resolution actions are specific to each risk and thus a per-risk description is provided. It is important to note that these corrective actions have a cost and that diverse corrective actions/procedures with different costs can be put in place to address the same risk. The one reported below has been identified by carefully evaluating various aspects, including the impact of the risk, the cost of the actions and the characteristics of the context, i.e. the iMarine project. Moreover, these procedures may only partially resolve/mitigate the risk, or may generate another risk.
The section will be updated periodically based on the above list and the risks' occurrences.
- Foundation Technology becomes obsolete
Due to the high probability of this risk a new component, named "FeatherWeight stack" has been introduced to mitigate this risk. For more information about the new component check the related ticket: #853 and its wiki page at: FeatherWeight Stack wiki