Difference between revisions of "Taxa Merging Discussion"

From D4Science Wiki

Revision as of 15:00, 4 April 2012

Introduction

The main goal of this topic is to identify, in a formal way, the critical characteristics for describing species that come from different data sources, in order to decide when two entries refer to the same species. This identification task is not trivial, because the nomenclature protocols followed in different areas of biology differ deeply: the rules change as one moves from Zoology to Botany to Bacteriology.

This page has the ambitious aim of investigating how far a merging algorithm can go in solving the above issue; no automatic solution has been found so far.

Biological Nomenclatures

Nomenclature is the branch of biological sciences that deals with naming of species and higher-order groupings; creating these higher-order groupings is the realm of Classification. What follows is a synopsis of some of the more important points of Nomenclature and Classification, especially those aspects that might be of interest to a data manager/IT person.

Apart from Nomenclature and Classification, two more branches of biological sciences are often referred to in the same context: Systematics and Taxonomy. The boundaries between these are not always clear, and many authors give their own definitions, which often contradict what has been written earlier. The Wikipedia page on systematics (http://en.wikipedia.org/wiki/Systematics), for example, presents two different views in its first three paragraphs. In general, Systematics is often seen as the more encompassing field (including phylogeny and biogeography), with Taxonomy as one of its branches, and Nomenclature and Classification as parts of Taxonomy (the Wikipedia page on Systematics has it differently in one of the two views expressed there; the Wikipedia entry on taxonomy, http://en.wikipedia.org/wiki/Taxonomy, has classification as part of Taxonomy). Luckily these discussions are not really relevant to what we’re trying to accomplish here.

Nomenclature is bound to the rules that are defined in one of the three ‘Codes’ – the ‘rule books’ of nomenclature, which are different for Zoology, Botany and Bacteriology; the latter used to be included in the Botanical code, but has been separate since 1975.

All groupings of organisms are referred to as ‘Taxa’, singular ‘Taxon’. A taxon has a name (as defined through Nomenclature), and a ‘Rank’ and a position in the classification (as defined through the science of Classification). Standard ranks are Regnum (or kingdom), Phylum, Classis (class), Ordo (order), Familia (family), genus and species. In Botany, the phylum is more often than not referred to as Divisio (division). The scientific names of the ranks are not often used; personally I sometimes use them for field names, as some of the vernacular rank names are reserved words in SQL.

Very often, extra ranks are defined by prepending a qualifier to the rank name: super-, sub- and infra-. A superfamily is larger than a family, which is larger than a subfamily, which is larger than an infrafamily. The rank ‘Tribus’ (tribe) is sometimes used; it comes between family and genus, and can also be qualified with super-, sub- and infra-. There are some exceptions among the lower ranks: supergenus, infragenus, superspecies and infraspecies are not used. Two further ranks sit below subspecies: variety and form, both of which can be combined with the sub- prefix. Together with subspecies itself, these are collectively referred to as ‘infraspecific ranks’.
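For a data manager, the rank rules above can be turned into an ordered list of rank names. The following is only a minimal sketch: the function name `ranked_list` and the Latin spellings `varietas` and `forma` are assumptions made for illustration, and real data sources may use only a subset of these ranks.

```python
# Base ranks from largest to smallest, using the scientific (Latin) rank names.
BASE_RANKS = ["regnum", "phylum", "classis", "ordo",
              "familia", "tribus", "genus", "species"]

# Qualified ranks that the text lists as not being used.
NOT_USED = {"supergenus", "infragenus", "superspecies", "infraspecies"}


def ranked_list():
    """Return rank names ordered from the largest grouping to the smallest."""
    ranks = []
    for base in BASE_RANKS:
        # super- sits above the base rank, sub- and infra- below it.
        for candidate in ("super" + base, base, "sub" + base, "infra" + base):
            if candidate not in NOT_USED:
                ranks.append(candidate)
    # Variety and form come below subspecies; each also takes the sub- prefix.
    # The Latin names used here are an assumption of this sketch.
    ranks += ["varietas", "subvarietas", "forma", "subforma"]
    return ranks


if __name__ == "__main__":
    for position, rank in enumerate(ranked_list()):
        print(position, rank)
```

The position in the list can then serve as a sort key when comparing the ranks of two entries.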

All names of rank genus and above are ‘uninomens’ – they consist of one word. Anything below genus has at least two words. There are standard suffixes to the names for ranks above genus; these suffixes are different for the different kingdoms: a family name in zoology always ends in ‘-idae’, in botany in ‘-aceae’. A complete list is in table 1. Note also that the column for animals is largely blank: the Zoological code only prescribes rules for naming taxa of rank superfamily down to (and including) subspecies; anything above the family-group is not regulated by the code.

Genus names and everything above have an initial capital. A species name consists of two parts, the genus name and the ‘specific epitheton’ or epithet; the latter does not start with a capital. Usually everything from genus downwards is written in italics. So part of a classification could be

Family Semelidae
Genus Abra
Species Abra alba
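These spelling conventions lend themselves to simple sanity checks on incoming names. The sketch below is an assumption-laden illustration, not a validator for any of the Codes: the function names are hypothetical, and only the two family suffixes mentioned in the text are included.

```python
# Family suffixes mentioned in the text; other ranks would need table 1.
FAMILY_SUFFIX = {"zoology": "idae", "botany": "aceae"}


def looks_like_uninomen(name: str) -> bool:
    """One word with an initial capital, as expected for genus and above."""
    parts = name.split()
    return (len(parts) == 1
            and parts[0][:1].isupper()
            and parts[0][1:].islower())


def looks_like_binomen(name: str) -> bool:
    """Capitalised genus name followed by a lower-case specific epithet."""
    parts = name.split()
    return (len(parts) == 2
            and looks_like_uninomen(parts[0])
            and parts[1].islower())


def has_family_suffix(name: str, code: str) -> bool:
    """Check the family suffix for the given code ('zoology' or 'botany')."""
    return name.lower().endswith(FAMILY_SUFFIX[code])


if __name__ == "__main__":
    print(looks_like_uninomen("Semelidae"))             # family name, one word
    print(looks_like_binomen("Abra alba"))              # genus + epithet
    print(has_family_suffix("Semelidae", "zoology"))    # ends in -idae
```

Checks like these only flag suspicious strings; a name that passes is not necessarily valid.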

The names of rank species and below are often followed by the name of the person who originally described the species, and the year the description first became publicly available; sometimes this is extended to genus (which is good practice), rarely for family and above. So the classification above would be

Family Semelidae
Genus Abra Leach in Lamarck, 1818
Species Abra alba (W. Wood, 1802)

The brackets around the author of the species are not for decoration, but carry meaning; we’ll come back to that. As shown in the author string for the genus, there can be complications; in this case, the author of the publication was Lamarck, but the person who actually wrote the description was Leach. Technically, only Lamarck’s name should be there; but in order to recognize the intellectual effort, Leach is also listed.

Infraspecific names have more than two parts. So a subspecies in Zoology would be written as

Uca lactea annulipes (H. Milne-Edwards, 1837)
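Before two entries can be compared, the name proper has to be separated from the author string. The following is a rough sketch of one way to do this, assuming that epithets are lower case and that the author string starts with a capital or an opening bracket; authors whose names begin in lower case would be misparsed. The names `ParsedName` and `parse_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    name: str               # the name without authorship
    author: Optional[str]   # author string, if present
    year: Optional[int]     # year of publication, if present
    in_brackets: bool       # whether the authorship was written in brackets


def parse_name(text: str) -> ParsedName:
    tokens = text.split()
    i = 1  # the first token is the genus (or higher) name
    # An optional subgenus such as "(Paraleptuca)" directly after the genus.
    if i < len(tokens) and re.fullmatch(r"\([A-Z][a-z]+\)", tokens[i]):
        i += 1
    # Lower-case tokens are treated as epithets or rank markers.
    while (i < len(tokens) and tokens[i].islower()
           and not tokens[i].startswith("(")):
        i += 1
    name = " ".join(tokens[:i])
    authorship = " ".join(tokens[i:])
    in_brackets = authorship.startswith("(") and authorship.endswith(")")
    if in_brackets:
        authorship = authorship[1:-1]
    # Split a trailing four-digit year from the author, if there is one.
    match = re.fullmatch(r"(.*?),?\s*(\d{4})", authorship)
    if match:
        return ParsedName(name, match.group(1), int(match.group(2)), in_brackets)
    return ParsedName(name, authorship or None, None, in_brackets)


if __name__ == "__main__":
    print(parse_name("Abra alba (W. Wood, 1802)"))
    print(parse_name("Abra Leach in Lamarck, 1818"))
    print(parse_name("Uca lactea annulipes (H. Milne-Edwards, 1837)"))
```

The `in_brackets` flag is kept separate because the brackets carry meaning and should not be lost when the string is split.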

In Zoology, names of ranks below subspecies are not regulated, though they are often used. Because of this, many would assume that a trinomen, as illustrated above, is always a subspecies, never a variety or form. For varieties or forms, the rank is indicated by writing ‘var.’ or ‘f.’ in front of the relevant name part:

Balanus amaryllis f. nivea Gruvel

In Botany, infraspecific ranks are covered by the Code; a subspecies is normally indicated by writing ‘ssp’ in front of the relevant name part. OBIS and WoRMS do not follow this convention (I should mend my ways).
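The rank markers described above can be used to guess the infraspecific rank of a name once the author string has been removed. This is a sketch under stated assumptions: the function name is hypothetical, the spellings ‘ssp.’ and ‘subsp.’ are added here as likely variants of ‘ssp’, and an unmarked three-part name is treated as a subspecies, following the common assumption mentioned in the text.

```python
from typing import Optional

# Rank markers from the text; 'ssp.' and 'subsp.' are assumed variants.
RANK_MARKERS = {
    "var.": "variety",
    "f.": "form",
    "ssp": "subspecies",
    "ssp.": "subspecies",
    "subsp.": "subspecies",
}


def infraspecific_rank(name: str) -> Optional[str]:
    """Guess the infraspecific rank of a name given WITHOUT its authorship.

    Returns None when the name does not look infraspecific.
    """
    tokens = name.split()
    for token in tokens:
        if token in RANK_MARKERS:
            return RANK_MARKERS[token]
    # An unmarked trinomen is assumed to be a subspecies.
    if len(tokens) == 3:
        return "subspecies"
    return None


if __name__ == "__main__":
    print(infraspecific_rank("Balanus amaryllis f. nivea"))
    print(infraspecific_rank("Uca lactea annulipes"))
    print(infraspecific_rank("Abra alba"))
```

Because sources such as OBIS and WoRMS do not write the subspecies marker, the unmarked-trinomen fallback matters in practice, but it remains a guess.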

Subgenera are written with a capital, and between brackets:

Uca (Paraleptuca) Bott, 1973
Uca (Paraleptuca) lactea (De Haan, 1835)
Uca (Paraleptuca) lactea annulipes (H. Milne-Edwards, 1837)

Note that the last name refers to the same taxon as Uca lactea annulipes (H. Milne-Edwards, 1837), just as Uca (Paraleptuca) lactea (De Haan, 1835) and Uca lactea (De Haan, 1835) are two alternative strings referring to the same species.
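For merging, this means that the subgenus should be ignored when comparing species and subspecies names. A minimal sketch of such a normalisation follows; the function names are hypothetical, and the subgenus is only dropped when a lower-case epithet follows it, so that the name of the subgenus itself is left alone.

```python
import re

# Genus, then a bracketed capitalised subgenus, then a lower-case epithet.
SUBGENUS = re.compile(r"^(\S+)\s+\([A-Z][a-z]+\)(?=\s+[a-z])")


def strip_subgenus(name: str) -> str:
    """Remove a bracketed subgenus from a species or infraspecific name."""
    return SUBGENUS.sub(r"\1", name, count=1)


def same_name_string(a: str, b: str) -> bool:
    """Compare two name strings while ignoring subgenus and extra spaces."""
    def normalise(s: str) -> str:
        return " ".join(strip_subgenus(s).split())
    return normalise(a) == normalise(b)


if __name__ == "__main__":
    print(strip_subgenus("Uca (Paraleptuca) lactea (De Haan, 1835)"))
    print(same_name_string(
        "Uca (Paraleptuca) lactea annulipes (H. Milne-Edwards, 1837)",
        "Uca lactea annulipes (H. Milne-Edwards, 1837)",
    ))
```

A string comparison of this kind only establishes that two strings are alternative spellings of the same name; it says nothing about synonymy between different names.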