Difference between revisions of "Procedure Infrastructure Monitoring"
Andrea.manzi (Talk | contribs) (Created page with "__NOTOC__ The monitoring of the D4Science infrastructure is carried out by Infrastructure Managers, Site Managers, [[Role VR...") |
|||
(31 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | + | __TOC__ | |
− | The monitoring of the D4Science infrastructure is carried out by [[Role Infrastructure Manager|Infrastructure Managers]], [[Role Site Manager|Site Managers]], [[Role VRE Manager|VRE Managers]], [[Role VO Admin|VO Admins]], and [[Role Data Manager|Data Managers]]. Such activity is done on a regular basis using the different tools provided to monitor the status of | + | |
+ | == D4Science Infrastructure Monitoring == | ||
+ | |||
+ | The monitoring of the D4Science infrastructure is carried out by [[Role Infrastructure Manager|Infrastructure Managers]], [[Role Site Manager|Site Managers]], [[Role VRE Manager|VRE Managers]], [[Role VO Admin|VO Admins]], and [[Role Data Manager|Data Managers]]. Such activity is done on a regular basis using the different tools provided to monitor the status of the Resources (check below). | ||
In case a new problem is identified an incident should be reported immediately following the [[Procedure Infrastructure Incident Management|Incident Management]] procedure. | In case a new problem is identified an incident should be reported immediately following the [[Procedure Infrastructure Incident Management|Incident Management]] procedure. | ||
− | == | + | == D4Science Resources == |
+ | |||
+ | The monitoring of the D4Science Resources of the infrastructure is based on several systems: | ||
− | |||
* IS Monitoring: Based on information published in the gCube Information System. This information is accessible from: | * IS Monitoring: Based on information published in the gCube Information System. This information is accessible from: | ||
− | ** [ | + | ** [https://services.d4science.org/group/d4science.research-infrastructures.eu/accounting Advanced Monitoring] |
− | + | ||
* Messaging System: Based on the information published by probes local to each node. This information is used to send emails to [[Role Site Manager|Site Managers]] when problems are found. | * Messaging System: Based on the information published by probes local to each node. This information is used to send emails to [[Role Site Manager|Site Managers]] when problems are found. | ||
− | * Nagios: Based on the information gathered by [http://www.nagios.org/ Nagios] about the availability of each gHN. In case of problems Nagios notifies by mail the [[Role Infrastructure Manager|Infrastructure Managers]]. | + | * Nagios: Based on the information gathered by [http://www.nagios.org/ Nagios] about the availability of each gHN. In case of problems Nagios notifies by mail the [[Role Infrastructure Manager|Infrastructure Managers]]. The D4Science Nagios server is available at [http://nagios.d4science.org/nagios3/ Nagios Server] |
− | == | + | == Hadoop Resources == |
− | + | The Hadoop clusters are monitored through the Hadoop internal monitoring and tracking systems. These tools provide monitoring for [http://hadoop.apache.org/mapreduce/ MapReduce] jobs and for [http://hadoop.apache.org/hdfs/ HDFS] filesystems. | |
{| border="1" cellpadding="4" cellspacing="0" | {| border="1" cellpadding="4" cellspacing="0" | ||
|- | |- | ||
− | ! width="60"| !! width=" | + | ! width="60"| !! width="150"|MapReduce !! width="150"|HDFS |
|- | |- | ||
− | | bgcolor="lightgrey" align="center"|CNR || | + | | bgcolor="lightgrey" align="center"|CNR || <center>[http://jobtracker.t.hadoop.research-infrastructures.eu:50030/jobtracker.jsp Job Tracker]</center> || <center>[http://quorum1.t.hadoop.research-infrastructures.eu:50070/dfshealth.jsp DFS Health]</center> |
|} | |} | ||
− | == | + | == Runtime Resources == |
− | The | + | The monitoring of the Runtime Resources of the infrastructure is based on 2 systems: |
− | + | * IS Monitoring: Based on information published in the gCube Information System. This information is accessible from: | |
− | + | ** [https://services.d4science.org/group/d4science.research-infrastructures.eu/accounting Advanced Monitoring] | |
− | + | * Nagios: Based on the information gathered by [http://www.nagios.org/ Nagios] about the availability of each Runtime Resource. For some of the Runtime Resources, additional checks (e.g. MySQL or PSQL database sizes, or index usage) will be added to Nagios. In case of problems, Nagios notifies the [[Role Infrastructure Manager|Infrastructure Managers]] by mail. The iMarine Data e-Infrastructure Nagios server is available at [http://nagios.d4science.org/nagios3/ Nagios Server] |
Latest revision as of 18:24, 27 March 2018
D4Science Infrastructure Monitoring
The monitoring of the D4Science infrastructure is carried out by Infrastructure Managers, Site Managers, VRE Managers, VO Admins, and Data Managers. This activity is performed on a regular basis using the tools provided to monitor the status of the Resources (see below).
If a new problem is identified, an incident should be reported immediately, following the Incident Management procedure.
D4Science Resources
The monitoring of the D4Science Resources of the infrastructure is based on several systems:
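- IS Monitoring: Based on information published in the gCube Information System. This information is accessible from:
  - Advanced Monitoring: https://services.d4science.org/group/d4science.research-infrastructures.eu/accounting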
- Messaging System: Based on the information published by probes local to each node. This information is used to send emails to Site Managers when problems are found.
- Nagios: Based on the information gathered by Nagios (http://www.nagios.org/) about the availability of each gHN. In case of problems, Nagios notifies the Infrastructure Managers by mail. The D4Science Nagios server is available at http://nagios.d4science.org/nagios3/
Hadoop Resources
The Hadoop clusters are monitored through the Hadoop internal monitoring and tracking systems. These tools provide monitoring for MapReduce jobs and for HDFS filesystems.
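| | MapReduce | HDFS |
|---|---|---|
| CNR | Job Tracker: http://jobtracker.t.hadoop.research-infrastructures.eu:50030/jobtracker.jsp | DFS Health: http://quorum1.t.hadoop.research-infrastructures.eu:50070/dfshealth.jsp |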
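The Job Tracker and DFS Health pages can also be polled from a script instead of a browser. The following is only a minimal sketch: it assumes that a plain HTTP GET returning status 200 means the page is reachable, and `check_endpoint`, `summarize`, and `format_report` are hypothetical helper names, not part of Hadoop or gCube. Only the two URLs are taken from this page.

```python
import urllib.error
import urllib.request

# URLs as listed for the CNR Hadoop cluster on this page.
HADOOP_PAGES = {
    "MapReduce Job Tracker": "http://jobtracker.t.hadoop.research-infrastructures.eu:50030/jobtracker.jsp",
    "HDFS DFS Health": "http://quorum1.t.hadoop.research-infrastructures.eu:50070/dfshealth.jsp",
}


def check_endpoint(url, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, detail) for one status page; ok is True only on HTTP 200.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            return status == 200, f"HTTP {status}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False, f"failed: {exc}"


def summarize(pages, opener=urllib.request.urlopen):
    """Check every page and return {name: (ok, detail)}."""
    return {name: check_endpoint(url, opener=opener) for name, url in pages.items()}


def format_report(results):
    """Render one line per page, prefixed with OK or FAIL."""
    lines = []
    for name, (ok, detail) in results.items():
        lines.append(f"{'OK' if ok else 'FAIL'} {name}: {detail}")
    return "\n".join(lines)
```

Calling `print(format_report(summarize(HADOOP_PAGES)))` from a host that can reach the cluster would print one line per page. A reachable page only shows that the web interface is up; the health details still have to be read from the pages themselves.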
Runtime Resources
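The monitoring of the Runtime Resources of the infrastructure is based on two systems:
- IS Monitoring: Based on information published in the gCube Information System. This information is accessible from:
  - Advanced Monitoring: https://services.d4science.org/group/d4science.research-infrastructures.eu/accounting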
- Nagios: Based on the information gathered by Nagios (http://www.nagios.org/) about the availability of each Runtime Resource. For some of the Runtime Resources, additional checks (e.g. MySQL or PSQL database sizes, or index usage) will be added to Nagios. In case of problems, Nagios notifies the Infrastructure Managers by mail. The iMarine Data e-Infrastructure Nagios server is available at http://nagios.d4science.org/nagios3/
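The additional database-size checks mentioned above could be written as Nagios plugins. A Nagios plugin reports its state through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one line of output, optionally followed by performance data after a `|`. The following is a minimal sketch of that convention only: the thresholds, the `DBSIZE` label, the script name `check_db_size.py`, and the functions `evaluate_size` and `main` are all hypothetical, and the query that obtains the size from the database is deliberately left out.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_size(size_bytes, warn_bytes, crit_bytes):
    """Map a database size to an (exit_code, message) pair."""
    if size_bytes is None or size_bytes < 0:
        return UNKNOWN, "DBSIZE UNKNOWN - size could not be determined"
    if size_bytes >= crit_bytes:
        state = CRITICAL
    elif size_bytes >= warn_bytes:
        state = WARNING
    else:
        state = OK
    size_mb = size_bytes / (1024 * 1024)
    # Text after "|" is performance data: label=value[UOM];warn;crit
    message = (
        f"DBSIZE {STATE_NAMES[state]} - {size_mb:.1f} MB"
        f" | size={size_bytes}B;{warn_bytes};{crit_bytes}"
    )
    return state, message


def main(argv):
    """Hypothetical usage: check_db_size.py SIZE_BYTES WARN_BYTES CRIT_BYTES"""
    try:
        size, warn, crit = (int(arg) for arg in argv[1:4])
    except ValueError:
        # Raised for missing or non-numeric arguments.
        print("DBSIZE UNKNOWN - usage: check_db_size.py SIZE WARN CRIT")
        return UNKNOWN
    code, message = evaluate_size(size, warn, crit)
    print(message)
    return code

# A real plugin script would end with: sys.exit(main(sys.argv))
```

In an actual check the size would come from the database itself rather than from the command line; passing it as an argument here keeps the sketch independent of any particular database driver.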