Procedure Infrastructure Certification

From D4Science Wiki

Different certification procedures apply to gCube, gLite, and Hadoop nodes. The certification process refers to nodes because it is applied to the nodes hosting the resources.

gCube Nodes

gCube Nodes are locally managed by a gCube service named gHN Manager. This service is part of the gHN distribution and is automatically made available when the gHN distribution is deployed. The gHN Manager is the active part of the gHN and is responsible for the quality of service delivered by the node. It includes a gHN monitoring component that periodically performs a local certification of the node. This local certification consists of a number of tests that verify the correct functioning of the gHN. The following gHN characteristics are evaluated:

  1. correctness of gHN profile
  2. correctness of gHN configuration
  3. existence of host certificates
  4. correctness of the connectivity with the Information System
  5. correctness of the deployment, initialization, activation, and upgrade of the services' instances hosted on the gHN.

A gHN, and consequently a gCube node, can be considered as:

  • Started: when the initialisation phase of the gHN is started
  • Ready: when conditions 1 to 3 are all verified
  • Failed: when at least one of conditions 1 to 3 is not verified
  • Certified: when condition 5 is verified, meaning that all services are ready
  • Down: when the gHN is under upgrade, reboot or shutdown
  • Unreachable: when the connection with the Information System is temporarily or permanently broken (condition 4 is not verified)
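The mapping from the five tests to a status can be sketched as follows. This is only an illustration, not the gHN Manager's actual code: the field names, the `in_maintenance` flag and the order in which the rules are checked are all assumptions. The "Started" status is listed but never returned, since it describes the initialisation phase rather than a test result.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    STARTED = "Started"
    READY = "Ready"
    FAILED = "Failed"
    CERTIFIED = "Certified"
    DOWN = "Down"
    UNREACHABLE = "Unreachable"


@dataclass
class Checks:
    """Results of the five local tests (hypothetical field names)."""
    profile_ok: bool            # condition 1
    configuration_ok: bool      # condition 2
    host_certificates_ok: bool  # condition 3
    is_connected: bool          # condition 4
    services_ok: bool           # condition 5


def classify(checks: Checks, in_maintenance: bool = False) -> Status:
    """Derive a status from the test results (rule order is an assumption)."""
    if in_maintenance:
        # Upgrade, reboot or shutdown in progress.
        return Status.DOWN
    if not checks.is_connected:
        return Status.UNREACHABLE
    if not (checks.profile_ok and checks.configuration_ok
            and checks.host_certificates_ok):
        return Status.FAILED
    if checks.services_ok:
        return Status.CERTIFIED
    return Status.READY
```

For example, a node that passes conditions 1 to 4 but has a failing service instance would be classified as "Ready" under these assumptions.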

Whenever a gHN is upgraded, its certification is suspended by putting the gHN in the "Down" status, since it is impossible to predict the status after the upgrade. When the upgrade is completed, the certification status changes to "Ready", "Failed" or "Certified". The status of a node can also return to "Ready" because of a failure of a local service instance that is not related to an upgrade operation. For example, in a secure infrastructure, the proxy certificate associated with a service may expire without it being possible to renew it automatically.

The possible transitions among the gHN statuses are depicted in the following picture. In addition, from any status and at any time, it is possible to move to the "Unreachable" status and vice versa, because this status is usually associated with (hopefully temporary) network failures.

gHN status transition
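As a rough sketch, the transitions that the text states explicitly can be encoded as a lookup table. The table below is not a transcription of the picture: it covers only the moves described in the prose, and treating "Ready", "Failed" and "Certified" as the statuses from which an upgrade starts is an assumption. The function name is hypothetical.

```python
from enum import Enum


class Status(Enum):
    STARTED = "Started"
    READY = "Ready"
    FAILED = "Failed"
    CERTIFIED = "Certified"
    DOWN = "Down"
    UNREACHABLE = "Unreachable"


# Transitions described in the text only (assumed, incomplete).
STATED_TRANSITIONS = {
    # An upgrade suspends the certification.
    (Status.READY, Status.DOWN),
    (Status.FAILED, Status.DOWN),
    (Status.CERTIFIED, Status.DOWN),
    # The status is re-evaluated once the upgrade is completed.
    (Status.DOWN, Status.READY),
    (Status.DOWN, Status.FAILED),
    (Status.DOWN, Status.CERTIFIED),
    # A local service instance fails outside of an upgrade.
    (Status.CERTIFIED, Status.READY),
}


def is_stated_transition(src: Status, dst: Status) -> bool:
    """Return True if the text describes a move from src to dst."""
    if src == dst:
        return False
    # Unreachable can be entered from, and left for, any status.
    if Status.UNREACHABLE in (src, dst):
        return True
    return (src, dst) in STATED_TRANSITIONS
```

A transition missing from the table is simply not mentioned in the text; it may still be allowed by the picture.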

Besides the normal monitoring activities, the certification information is also used by the gCube VREManager, which deploys gCube services only on gHNs marked as "Certified" or "Ready". Moreover, the same information is used to measure the reliability of a node. A node that has frequently been in the "Down" status hosts unreliable services that require frequent software upgrades, so it is not an appropriate target for services that require a dependable node. Lastly, the status information can lead to the reallocation of service instances in accordance with the node history.
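The two uses of the certification information described above can be illustrated with a small sketch. It does not reflect the VREManager API: the function names and the data shapes (a mapping from node name to current status, and a list of past statuses per node) are hypothetical.

```python
from collections import Counter

# Statuses on which services may be deployed, as stated in the text.
DEPLOYABLE = {"Certified", "Ready"}


def deployable_nodes(current_status: dict[str, str]) -> list[str]:
    """Return the names of nodes eligible for deployment, sorted by name."""
    return sorted(name for name, status in current_status.items()
                  if status in DEPLOYABLE)


def down_count(history: list[str]) -> int:
    """Count how often a node was recorded as "Down" (a crude reliability hint)."""
    return Counter(history)["Down"]
```

A scheduler could, for instance, prefer the eligible node with the lowest `down_count` when a service requires a dependable host; that selection policy is an assumption rather than documented behaviour.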

The information about the certification status is accessible through the infrastructure Monitoring tool. Certification incidents are managed using Support TRAC tickets, which must be created according to the Incident Management procedure. If a ticket is not closed within 5 working days, the affected node can be removed from the infrastructure. The monitoring of the gHN certification is carried out by the Infrastructure Manager.
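The 5-working-day limit can be checked with a short calculation. The sketch below assumes that working days are Monday to Friday, that public holidays are ignored, and that the count starts on the day after the ticket is opened. None of this is specified by the procedure, and the function names are hypothetical.

```python
from datetime import date, timedelta

TICKET_LIMIT_WORKING_DAYS = 5  # limit stated in the procedure


def add_working_days(start: date, days: int) -> date:
    """Return the date reached after the given number of Mon-Fri days."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def is_overdue(opened: date, today: date) -> bool:
    """True if the ticket has been open longer than the limit."""
    return today > add_working_days(opened, TICKET_LIMIT_WORKING_DAYS)
```

Under these assumptions, a ticket opened on a Monday becomes overdue if it is still open after the following Monday.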

Hadoop Nodes

No certification procedure is applied to Hadoop nodes.

Runtime Resources Nodes

No certification procedure is applied to Runtime Resources nodes.