Difference between revisions of "Procedure Infrastructure Monitoring"

Revision as of 17:22, 7 December 2011

The monitoring of the D4Science infrastructure (an e-Infrastructure operated by the D4Science.org initiative) is carried out by Infrastructure Managers, Site Managers, VRE Managers, VO Admins, and Data Managers. This activity is performed on a regular basis using the tools provided to monitor the status of gCube nodes, gLite nodes, and Hadoop nodes (see below).

If a new problem is identified, an incident should be reported immediately, following the Incident Management procedure.


gCube Nodes

The monitoring of the gCube nodes of the infrastructure is based on three systems:

  • IS Monitoring: Based on information published in the gCube Information System. This information is accessible from:
  • Messaging System: Based on the information published by probes local to each node. This information is used to send emails to Site Managers when problems are found.
  • Nagios: Based on the information gathered by Nagios about the availability of each gHN. When problems occur, Nagios notifies the Infrastructure Managers by email.
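As an illustration of the kind of availability check Nagios can run against a gHN, here is a minimal sketch of a TCP reachability probe that follows the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The page does not say which checks the D4Science Nagios instance actually uses, so the function name, the host, and the port below are assumptions made for illustration only.

```python
import socket

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Return (exit_code, message) for a TCP reachability check.

    Hypothetical helper: it only verifies that something accepts
    connections on host:port, not that the gHN container is healthy.
    """
    try:
        # create_connection resolves the host and attempts a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution failures.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


# Example call (hypothetical host and port):
#   code, message = check_tcp("ghn.example.org", 8080)
#   print(message)  # a real plugin would then exit with `code`
```

A real plugin would print the message and exit with the returned code, which is how Nagios decides whether to send a notification.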


gLite Nodes

There are several tools to monitor the EGI production infrastructure nodes and, consequently, the gLite nodes. Many of these tools share the same information source and differ only in the views they provide over it. Together, they cover a wide range of monitoring needs.

The table below provides direct links to the status of the gLite sites of the infrastructure provided by iMarine members:

        Site            Service Availability
CNR     gocdb, gstat    MyEGI Service Availability
NKUA    gocdb, gstat    MyEGI Service Availability

Hadoop Nodes

The Hadoop clusters are monitored through Hadoop's internal monitoring and tracking systems. These tools provide monitoring for MapReduce jobs and for HDFS file systems.

CNR
  • MapReduce: http://node1.hadoop.research-infrastructures.eu:50030/jobtracker.jsp
  • HDFS: http://node1.hadoop.research-infrastructures.eu:50070/dfshealth.jsp
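For routine checks, the two CNR pages listed above could be polled from a script rather than opened by hand. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and treating an HTTP 200 response as "up" is an assumption: it confirms that the web interface answers, not that the cluster itself is healthy.

```python
import urllib.request

# URLs taken from the list above.
HADOOP_PAGES = {
    "MapReduce": "http://node1.hadoop.research-infrastructures.eu:50030/jobtracker.jsp",
    "HDFS": "http://node1.hadoop.research-infrastructures.eu:50070/dfshealth.jsp",
}


def page_is_up(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if the page answers with HTTP 200, False otherwise.

    `opener` is injectable so the check can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError and HTTPError are subclasses of OSError,
        # as are socket timeouts and refused connections.
        return False


def check_all(pages, opener=urllib.request.urlopen):
    """Map each page name to its up/down status."""
    return {name: page_is_up(url, opener=opener) for name, url in pages.items()}


# Example call:
#   for name, up in check_all(HADOOP_PAGES).items():
#       print(name, "up" if up else "DOWN")
```

Any page reported as down would then be handled through the Incident Management procedure, as with problems found by the other tools.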