[Zoobank-list] ZooBank Data Objects
Richard Pyle
deepreef at bishopmuseum.org
Fri Mar 2 01:15:23 GMT 2007
Wolfgang Lorenz wrote in a recent post to this list:
> An author describes a new species in his preferred "splitter"
> genus-concept. The binomen is entered in ZooBank by that generic
> combination and gets an LSID, but subsequent authors may use the name
> in combination with a wider genus concept, - no registration needed
> for such a change in generic combination. Any ideas how such
> situations, pretty common in zoological names, can be dealt with?
This touches on a very fundamental issue regarding ZooBank, particularly
with regard to its scope and how it should be implemented. For the moment,
I think we should put aside the whole "botany/zoology" discrepancy issue and
concentrate on how we would define a system optimized for Zoology.
In order for GUIDs/LSIDs to be meaningful, we have to have a clear
understanding of what they refer to. For example, a Social Security Number
in the US refers to a single person. It does not refer to a whole family,
or an organization, or to just one part of a person's life. There is an
unambiguous understanding of what the number refers to.
Perhaps the most fundamental aspect of ZooBank that must be resolved is
"what does a registration entry represent"? That is, what "thing" does the
GUID/LSID get assigned to?
Nominally, the "things" include "names and nomenclatural acts".
Let's start with just the "names" part. I'm going to assume that most
zoologists will concur with the assessment in my earlier email of what the
zoological "name objects" are, e.g.:
LSID1: Aus L.
LSID2: Xus Jones
LSID3: bus Smith
LSID4: Aiidae L.
For many reasons (not just relating to my zoological bias), I think it would
be a mistake for ZooBank to issue "name" GUIDs to each combination
separately, the way botanists would. This is not to say that we don't want
to track alternate combinations -- I just think we should do so via "Usage
Instance" objects, rather than "Name" objects. I will be happy to elaborate
on this if someone would like me to.
So, for the sake of argument, let's suppose we have this zoology-style
"name" object. Next we have to decide the scope of names to include.
Available family-group names, genus-group names, and species-group epithets
-- certainly yes. Vernacular names -- certainly not. Less certain is a
whole spectrum of other types of names, ranging from the "probably"
(objective synonyms and other unavailable names) to the "maybe" (all higher
rank-group names up to Kingdom) to the "probably not" (morphospecies and
other semi-scientific names combining, e.g., a genus name with a
non-scientific epithet such as "sp. 32").
Once we agree on that, we need to think about the attributes of a name
object. Some obvious ones include "Rank", "original spelling", "Original
Genus", "Author", "Date of publication", "Type genus/species/specimen(s)",
publication details, and so on. However, on closer inspection, some of
these attributes do not really describe the "name" object directly. For
example publication details and date of publication are more correctly
attributes of a publication object, that might be linked to a name object.
A good case could be made for making "Author" an attribute of the
publication as well. That way, the name object would have attributes like
"Rank", "Original Spelling", "Type Specimen", and so on, and also an
attribute of "OriginalPublicationID" that would point to a Publication
object (from which the name object would derive the author, date, citation
details, etc.)
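A minimal sketch of that split, in Python. The class names, field names, identifiers such as "PUB1", and the year 1900 are illustrative assumptions only, not part of any agreed ZooBank schema; "bus Smith" is the placeholder name from the list above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Publication:
    """Holds the details that describe the publication, not the name."""
    publication_id: str
    authors: str
    year: int        # hypothetical value, for illustration only
    citation: str


@dataclass(frozen=True)
class Name:
    """Holds only attributes that describe the name itself."""
    name_id: str
    rank: str
    original_spelling: str
    original_publication_id: str      # pointer to a Publication object
    type_specimen: Optional[str] = None


def authorship(name: Name, publications: Dict[str, Publication]) -> str:
    """Derive author and date from the linked publication."""
    pub = publications[name.original_publication_id]
    return f"{name.original_spelling} {pub.authors}, {pub.year}"


publications = {
    "PUB1": Publication("PUB1", "Smith", 1900, "Placeholder citation"),
}
bus = Name("LSID3", "species", "bus", "PUB1")
label = authorship(bus, publications)
```

The name record stores no author or date of its own, so a correction to the publication record is picked up by every name that points to it.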
If you follow this line of thinking, you may also find value for ZooBank in
recording certain "Usage Instances". These are cases where a Name is cited
in a publication. At its most fundamental core, a Usage Instance is a
combination of a NameID and a PublicationID, which effectively means "this
name appeared in this publication". Such usage instances have attributes of
their own -- like page number (i.e., the page on which this name appeared in
this publication).
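As a rough sketch in Python, a Usage Instance could be a small record like the one below. The field names and identifiers are assumptions for illustration. In particular, "cited_as" (the combination as printed) is a hypothetical extra attribute, added here to show how a later generic combination might be recorded without issuing a new name GUID.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UsageInstance:
    usage_id: str        # GUID of the usage instance itself
    name_id: str         # which name object was cited
    publication_id: str  # the publication in which it was cited
    page: str            # page on which the name appears
    cited_as: str        # hypothetical: the combination as printed


usages = [
    UsageInstance("U1", "LSID3", "PUB1", "12", "Xus bus"),  # original description
    UsageInstance("U2", "LSID3", "PUB2", "40", "Aus bus"),  # later, wider genus concept
    UsageInstance("U3", "LSID1", "PUB2", "38", "Aus"),
]


def combinations_for(name_id: str, usages: List[UsageInstance]) -> List[str]:
    """Distinct combinations under which a name was cited, in first-seen order."""
    seen: List[str] = []
    for usage in usages:
        if usage.name_id == name_id and usage.cited_as not in seen:
            seen.append(usage.cited_as)
    return seen


bus_combinations = combinations_for("LSID3", usages)
```

Here the epithet "bus" keeps a single identifier (LSID3), and both "Xus bus" and "Aus bus" are retrievable from the usage records that point to it.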
Now is not the time, but at some point soon I hope to present a draft model
for what the various objects might be, which ones would have GUIDs managed
by ZooBank (vs. GUIDs managed by GBIF or EoL or some other initiative), and
so on. Only with this level of detail articulated is it possible to
directly address Wolfgang's question: "Any ideas how such situations, pretty
common in zoological names, can be dealt with?"
My own answer is "yes" -- I most certainly do have ideas on how to deal with
this situation. In fact, they are ideas that have evolved out of nearly 16
years of managing taxonomic names data in computer databases, and extensive
participation in various biodiversity informatics initiatives. There's no
way I can articulate these ideas in any meaningful detail here. But I will
summarize as follows:
The core unit of information that unites all of the things we argue about --
taxonomic names, taxonomic concepts, subjective and objective synonyms, all
nomenclatural acts (including typifications, emendations, ICZN Case rulings,
etc.), specimen identifications; and even bridges the gap completely between
zoology, botany, bacteriology, indeed all realms of biology from the
scientific (ecology) to the bird-watcher -- is the Usage Instance object.
This is not complicated. A "Usage Instance" is simply the appearance of a
name within some sort of documentable form (not even restricted to
publications). Imagine an index of Name-Usage instances, each one with a
unique GUID. From among this potentially limitless pool of GUIDs, a certain
subset represents instances of ICZN-Code-relevant acts, such as the
descriptions of new names and other Code-governed events (all of which occur
within the context of some documentable usage instance, such as publications
-- and hence would have a Usage Instance GUID). Another subset would
include analogous names and acts governed under the ICBN Code. Yet another
subset (likely overlapping substantially with, but extending well beyond,
the previous two) would represent cases where full-blown taxon concepts are
defined. Different communities with different data needs could define the
scope of their subsets of interest any way they wish, without any impact on
how other communities defined subsets. Usage Instances are the least common
denominator -- the global currency -- the grand unified unit of biodiversity
informatics (at least as far as taxonomy and nomenclature are concerned).
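The idea of community-defined subsets over one shared pool can be sketched in Python as follows. This is self-contained and purely illustrative: the "governing_code" and "defines_concept" flags are hypothetical attributes, and the identifiers are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class UsageInstance:
    usage_id: str
    name_id: str
    publication_id: str
    governing_code: Optional[str] = None  # hypothetical: "ICZN", "ICBN", or None
    defines_concept: bool = False         # hypothetical: defines a taxon concept


pool = [
    UsageInstance("U1", "N1", "P1", governing_code="ICZN", defines_concept=True),
    UsageInstance("U2", "N2", "P2", governing_code="ICBN"),
    UsageInstance("U3", "N1", "P3", defines_concept=True),
    UsageInstance("U4", "N1", "P4"),
]


def subset(pool: List[UsageInstance],
           predicate: Callable[[UsageInstance], bool]) -> Set[str]:
    """Select the GUIDs of the usage instances a community cares about."""
    return {usage.usage_id for usage in pool if predicate(usage)}


zoological_acts = subset(pool, lambda u: u.governing_code == "ICZN")
botanical_acts = subset(pool, lambda u: u.governing_code == "ICBN")
concepts = subset(pool, lambda u: u.defines_concept)
```

Each subset is just a filter over the same records, so one community's definition has no effect on another's, and overlaps (U1 is both an ICZN act and a concept definition) need no special handling.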
I certainly don't pretend to have all the answers. My main point here is
that if we, as a community developing ZooBank, build it around a fundamental
core of usage instances (rather than "name objects" per se), then I think we
will be heading down the right (=long-term success) path.
So....at long last, my answer to Wolfgang is that we need to clarify what
the LSID is assigned to, in the premise to his question above. As
articulated, it could be to a monomial name, a particular name combination
(i.e., the original name combination) and/or subsequent combinations, or
even a taxonomic concept.
Aloha,
Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html