Difference between revisions of "Procedure Infrastructure Monitoring"
Andrea.manzi (Talk | contribs) (→gCube Nodes) |
Andrea.manzi (Talk | contribs) |
||
Line 1: | Line 1: | ||
__NOTOC__ | __NOTOC__ | ||
− | The monitoring of the D4Science infrastructure is carried out by [[Role Infrastructure Manager|Infrastructure Managers]], [[Role Site Manager|Site Managers]], [[Role VRE Manager|VRE Managers]], [[Role VO Admin|VO Admins]], and [[Role Data Manager|Data Managers]]. Such activity is done on a regular basis using the different tools provided to monitor the status of gCube , gLite | + | The monitoring of the D4Science infrastructure is carried out by [[Role Infrastructure Manager|Infrastructure Managers]], [[Role Site Manager|Site Managers]], [[Role VRE Manager|VRE Managers]], [[Role VO Admin|VO Admins]], and [[Role Data Manager|Data Managers]]. Such activity is done on a regular basis using the different tools provided to monitor the status of gCube , gLite, Hadoop and Other Resoruces (check below). |
In case a new problem is identified an incident should be reported immediately following the [[Procedure Infrastructure Incident Management|Incident Management]] procedure. | In case a new problem is identified an incident should be reported immediately following the [[Procedure Infrastructure Incident Management|Incident Management]] procedure. | ||
− | == gCube | + | == gCube Resources == |
+ | |||
+ | The monitoring of the gCube Resources of the infrastructure is based on three systems: | ||
− | |||
* IS Monitoring: Based on information published in the gCube Information System. This information is accessible from: | * IS Monitoring: Based on information published in the gCube Information System. This information is accessible from: | ||
** [http://monitor.d4science.research-infrastructures.eu/iv Infrastructure Viewer] | ** [http://monitor.d4science.research-infrastructures.eu/iv Infrastructure Viewer] | ||
Line 15: | Line 16: | ||
** [https://pcd4science3.cern.ch/nagios3/ Nagios Server] | ** [https://pcd4science3.cern.ch/nagios3/ Nagios Server] | ||
− | == gLite | + | == gLite Resources == |
− | There are several tools to monitor the EGI production infrastructure | + | There are several tools to monitor the EGI production infrastructure resources and consequently the gLite resources. Many of these tools share the same information source providing only different views over it. Such large number of tools cover many monitoring possibilities. |
* Services Availability: [http://gridview.cern.ch/GRIDVIEW/dt_index.php GridView], [https://grid-monitoring.egi.eu/myegi MyEGI] | * Services Availability: [http://gridview.cern.ch/GRIDVIEW/dt_index.php GridView], [https://grid-monitoring.egi.eu/myegi MyEGI] | ||
Line 34: | Line 35: | ||
|} | |} | ||
− | == Hadoop | + | == Hadoop Resources == |
The Hadoop clusters are monitored trough the Hadoop internal monitoring and tracking systems. These tools provide monitoring for [http://hadoop.apache.org/mapreduce/ MapReduce] jobs and for [http://hadoop.apache.org/hdfs/ HDFS] filesystems. | The Hadoop clusters are monitored trough the Hadoop internal monitoring and tracking systems. These tools provide monitoring for [http://hadoop.apache.org/mapreduce/ MapReduce] jobs and for [http://hadoop.apache.org/hdfs/ HDFS] filesystems. | ||
Line 44: | Line 45: | ||
| bgcolor="lightgrey" align="center"|CNR || <center>http://node1.hadoop.research-infrastructures.eu:50030/jobtracker.jsp</center> || <center>http://node1.hadoop.research-infrastructures.eu:50070/dfshealth.jsp</center> | | bgcolor="lightgrey" align="center"|CNR || <center>http://node1.hadoop.research-infrastructures.eu:50030/jobtracker.jsp</center> || <center>http://node1.hadoop.research-infrastructures.eu:50070/dfshealth.jsp</center> | ||
|} | |} | ||
+ | |||
+ | == Runtime Resources == | ||
+ | |||
+ | The monitoring of the Runtime Resources of the infrastructure is based on 2 systems: | ||
+ | |||
+ | * IS Monitoring: Based on information published in the gCube Information System. This information is accessible from: | ||
+ | ** [http://monitor.d4science.research-infrastructures.eu/iv Infrastructure Viewer] | ||
+ | ** [http://monitor.d4science.research-infrastructures.eu/rm Advanced Monitoring] | ||
+ | * Nagios: Based on the information gathered by [http://www.nagios.org/ Nagios] about the availability of each Runtime Resource. For some of the Runtime Resources additional checks are going to be instrumunted ( e.g. Mysql or PSQL DB DB sizes, or Index Usages) in Nagios. | ||
+ | In case of problems Nagios notifies by mail the [[Role Infrastructure Manager|Infrastructure Managers]]. The D4Science Ecosystem Nagios server is available at ( rescricted access to Infrastructure Managers): | ||
+ | ** [https://pcd4science3.cern.ch/nagios3/ Nagios Server] |
Revision as of 12:19, 8 December 2011
The monitoring of the D4Science infrastructure is carried out by Infrastructure Managers, Site Managers, VRE Managers, VO Admins, and Data Managers. This activity is performed on a regular basis using the tools provided to monitor the status of gCube, gLite, Hadoop, and other resources (see below).
When a new problem is identified, an incident should be reported immediately, following the Incident Management procedure.
gCube Resources
The monitoring of the gCube Resources of the infrastructure is based on three systems:
- IS Monitoring: Based on information published in the gCube Information System. This information is accessible from tools such as the Infrastructure Viewer (http://monitor.d4science.research-infrastructures.eu/iv).
- Messaging System: Based on the information published by probes local to each node. This information is used to send emails to Site Managers when problems are found.
- Nagios: Based on the information gathered by Nagios about the availability of each gHN. In case of problems, Nagios notifies the Infrastructure Managers by mail. The D4Science Ecosystem Nagios server (access restricted to Infrastructure Managers) is available at https://pcd4science3.cern.ch/nagios3/.
gLite Resources
There are several tools to monitor the EGI production infrastructure resources and, consequently, the gLite resources. Many of these tools share the same information source and provide only different views over it. This large number of tools covers many monitoring possibilities.
The table below provides direct links to the status of the different gLite sites of the infrastructure provided by iMarine members:
| | Site | Service Availability |
|---|---|---|
| CNR | gocdb gstat | MyEGI Service Availability |
| NKUA | gocdb gstat | MyEGI Service Availability |
Hadoop Resources
The Hadoop clusters are monitored through Hadoop's internal monitoring and tracking systems. These tools provide monitoring for MapReduce jobs and for HDFS filesystems.
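| | MapReduce | HDFS |
|---|---|---|
| CNR | http://node1.hadoop.research-infrastructures.eu:50030/jobtracker.jsp | http://node1.hadoop.research-infrastructures.eu:50070/dfshealth.jsp |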
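
As an illustration only, the following Python sketch shows how the reachability of the two Hadoop status pages for the CNR cluster could be checked from a script. The URLs are the ones recorded in this page's wiki source. The function names `fetch_status` and `check_pages` are hypothetical, and treating an HTTP 200 answer as "reachable" is an assumption, not part of the documented procedure.

```python
from urllib.request import urlopen
from urllib.error import URLError

# Status pages recorded in the wiki source for the CNR cluster.
STATUS_PAGES = {
    "MapReduce": "http://node1.hadoop.research-infrastructures.eu:50030/jobtracker.jsp",
    "HDFS": "http://node1.hadoop.research-infrastructures.eu:50070/dfshealth.jsp",
}


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except (URLError, OSError):
        # Covers HTTP errors, DNS failures, refused connections and timeouts.
        return None


def check_pages(pages, fetch=fetch_status):
    """Map each page name to True when its status page answers with HTTP 200."""
    return {name: fetch(url) == 200 for name, url in pages.items()}
```

Calling `check_pages(STATUS_PAGES)` returns a dictionary such as `{"MapReduce": True, "HDFS": False}`. The `fetch` parameter exists so the logic can be exercised without network access. A page that answers does not by itself mean the cluster is healthy; the pages still need to be read for job and filesystem status.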
Runtime Resources
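The monitoring of the Runtime Resources of the infrastructure is based on two systems:
- IS Monitoring: Based on information published in the gCube Information System. This information is accessible from:
  - Infrastructure Viewer (http://monitor.d4science.research-infrastructures.eu/iv)
  - Advanced Monitoring (http://monitor.d4science.research-infrastructures.eu/rm)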
- Nagios: Based on the information gathered by Nagios about the availability of each Runtime Resource. For some of the Runtime Resources, additional checks (e.g. MySQL or PostgreSQL database sizes, or index usage) are going to be instrumented in Nagios. In case of problems, Nagios notifies the Infrastructure Managers by mail. The D4Science Ecosystem Nagios server (access restricted to Infrastructure Managers) is available at https://pcd4science3.cern.ch/nagios3/.
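
The additional database size checks mentioned above are not specified on this page. The following Python sketch shows how such a check might map a measured size to a Nagios result. It relies on the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN) and on the usual performance-data format after the `|` character. The function name `evaluate_db_size`, the `DBSIZE` label and the thresholds are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_db_size(size_bytes, warn_bytes, crit_bytes):
    """Return (exit_code, status_line) for a database size measurement."""
    if size_bytes is None or size_bytes < 0:
        return UNKNOWN, "DBSIZE UNKNOWN - size could not be determined"
    if size_bytes >= crit_bytes:
        code = CRITICAL
    elif size_bytes >= warn_bytes:
        code = WARNING
    else:
        code = OK
    size_mb = size_bytes / (1024 * 1024)
    # Text after '|' is performance data: label=value[unit];warn;crit
    line = (
        f"DBSIZE {LABELS[code]} - {size_mb:.1f} MB"
        f" | size={size_bytes}B;{warn_bytes};{crit_bytes}"
    )
    return code, line
```

A real plugin would obtain `size_bytes` from the database itself (for PostgreSQL, for example, via the `pg_database_size()` function), print the status line, and terminate with `sys.exit(code)` so that Nagios can pick up the state.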