[Zoobank-list] Re: Zoobank-list Digest, Vol 12, Issue 4

Richard Pyle deepreef at bishopmuseum.org
Thu Mar 1 21:18:19 GMT 2007


Dear Wolfgang,

> I didn't surf through all of TDWG's pages, but maybe my impression isn't
> totally wrong: seems a majority of people involved in discussion &
> development of standards are biodiversity informatics specialists and
> quite a few botanists, mycologists, etc. but ... relatively few
> zoologists (esp. when compared to the quantity of animal taxa)?

I believe this impression is accurate.  I've been to many meetings relating
to taxonomy informatics, and when it comes time to discuss nomenclatural
issues, as a Zoologist I often feel in the minority. The botany community
was the first to dabble (unsuccessfully) in the idea of Code-mandated
registration, and they are the ones pushing forward with LSIDs and other
relevant technologies already.  And, they have a number of key individuals
with the inspiration and resources to push the data initiatives forward.
The bacteriological community is farthest ahead (with registration already a
requirement), but they are probably least represented in these sorts of
discussions (a fact that most definitely should be remedied!!) There are
several of us who represent the perspective of zoologists in this arena, but
a larger and more cohesive voice from our community is needed.

There is going to be an issue regarding how GUIDs (e.g., LSIDs) are assigned
to botanical vs. zoological names.  This comes down to the
fundamental difference in how zoologists and botanists think of a "name" (or
as we informatics nerds would say, a "name object" -- the thing to which a
GUID is attached and/or represents).  Consider these hypothetical names:

Aus L.
Xus Jones
Aus bus Smith
Xus bus (Smith)

The first clue to the differences between zoological and botanical practice
is that, in botany, the last of these would be represented as "Xus bus (Smith) Jones",
where Jones is credited as the first to have placed the species epithet
"bus" within the genus "Xus".

In our (zoological) realm, we would certainly think of the "original genus"
as an attribute of a species epithet (at the very least so that we know
whether to put parentheses around the author), but otherwise we don't track
combinations under ICZN (rules governing gender matching notwithstanding).
To a zoologist, the combination is an attribute of the particular usage
instance.  For example, there may be five publications citing the species
epithet "bus" and placing it in the genus "Aus" (one of these being the
original description), and there may be five others placing "bus" within the
genus "Xus". While it may very well be that Jones was the first to create
this "Xus bus" combination, we in Zoology do not ascribe any special status
to that event -- that is, we do not regard it as a Code-governed
nomenclatural act.

Thus, from the Zoological perspective, there are three GUIDs (LSIDs) needed
to accommodate the four items above:

LSID1: Aus L.
LSID2: Xus Jones
LSID3: bus Smith [original genus: LSID1]

We would then keep track of combinations through name-usage instances.  For
example, we might have ten records in our "usage instances" database to
represent the five published citations of "Aus bus" and the five published
citations of "Xus bus".  These would all be thought of as usage instances of
"bus" (LSID3), and an attribute of each usage instance would be the genus
it was combined with (five would point to LSID1 as the parent
genus, and the other five would point to LSID2 as the parent genus). E.g.:

Usage#   NameID   ParentID
--------------------------
  1      LSID3    LSID1
  2      LSID3    LSID1
  3      LSID3    LSID1
  4      LSID3    LSID1
  5      LSID3    LSID1
  6      LSID3    LSID2
  7      LSID3    LSID2
  8      LSID3    LSID2
  9      LSID3    LSID2
 10      LSID3    LSID2
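
As a rough illustration of the zoological layout just described, here is a minimal Python sketch.  The class and field names (Name, Usage, original_genus, render) are my own assumptions for illustration, not part of any ZooBank or TDWG schema, and "LSID1" etc. are the placeholder labels used above rather than real LSID strings.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Name:
    """A monomial name object; one GUID per genus name or species epithet."""
    guid: str
    epithet: str
    author: str
    original_genus: Optional[str] = None  # GUID of the original genus (species epithets only)


@dataclass(frozen=True)
class Usage:
    """One published usage instance of a species epithet."""
    usage_id: int
    name_id: str    # GUID of the species epithet
    parent_id: str  # GUID of the genus it was combined with in this usage


# The three name objects from the zoological list above.
names = {
    "LSID1": Name("LSID1", "Aus", "L."),
    "LSID2": Name("LSID2", "Xus", "Jones"),
    "LSID3": Name("LSID3", "bus", "Smith", original_genus="LSID1"),
}

# The ten usage instances from the table: five under Aus, five under Xus.
usages = [Usage(i, "LSID3", "LSID1" if i <= 5 else "LSID2") for i in range(1, 11)]


def render(usage: Usage) -> str:
    """Build the combination as used; parenthesize the author if the genus changed."""
    species = names[usage.name_id]
    genus = names[usage.parent_id]
    author = species.author
    if usage.parent_id != species.original_genus:
        author = f"({author})"
    return f"{genus.epithet} {species.epithet} {author}"


combination_counts = Counter(render(u) for u in usages)
for combination, count in combination_counts.items():
    print(combination, count)
```

Note that the combination never gets an identifier of its own here; it is derived on demand from the two GUIDs carried by each usage record.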


From the botanical perspective, however, each combination is treated as a
distinct (Code-governed) "name" (Name-object).  Thus, for botanists, there
would be four GUIDs, instead of three:

LSID1: Aus L.
LSID2: Xus Jones
LSID3: Aus bus Smith
LSID4: Xus bus (Smith) Jones [basionym: LSID3]

For usage instance records, instead of having pointers to two GUIDs per
record (one for the species epithet, one for the genus) there would simply
be a pointer to the combination as used. E.g.:

Usage#   NameID
---------------
  1      LSID3
  2      LSID3
  3      LSID3
  4      LSID3
  5      LSID3
  6      LSID4
  7      LSID4
  8      LSID4
  9      LSID4
 10      LSID4
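
The botanical layout can be sketched the same way.  Again, this is only an illustration under assumed names (BotanicalName, basionym, basionym_of are hypothetical, not IPNI or Index Fungorum fields), using the placeholder labels from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BotanicalName:
    """A name object in the botanical sense; each combination has its own GUID."""
    guid: str
    full_name: str
    basionym: Optional[str] = None  # GUID of the basionym, for new combinations


# The four name objects from the botanical list above.
names = {
    "LSID1": BotanicalName("LSID1", "Aus L."),
    "LSID2": BotanicalName("LSID2", "Xus Jones"),
    "LSID3": BotanicalName("LSID3", "Aus bus Smith"),
    "LSID4": BotanicalName("LSID4", "Xus bus (Smith) Jones", basionym="LSID3"),
}

# Usage number -> GUID of the combination as used (one pointer per record).
usages = {i: ("LSID3" if i <= 5 else "LSID4") for i in range(1, 11)}


def basionym_of(guid: str) -> str:
    """Follow the basionym link; names with no recorded basionym are returned unchanged."""
    return names[guid].basionym or guid


# All ten usages trace back to the same basionym, even though they cite two "names".
basionyms_cited = {basionym_of(guid) for guid in usages.values()}
print(basionyms_cited)
```

Each usage record needs only one pointer, but grouping all usages of the epithet "bus" requires following the basionym link.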

I'll make two observations about this fundamental difference between the
botanical and zoological approaches to "names" (name-objects):

1) It may be that this difference ends up being transparent, once we get
this stuff implemented.  In other words, there may be no problem with
ZooBank assigning GUIDs by the zoological tradition, and IPNI/IF assigning
GUIDs by the botanical tradition -- as long as the informatics architecture,
standards, and protocols are done right, there should be little difficulty
aggregating botanical and zoological names data together.  On the other
hand, I can't help but think it will ultimately be to everyone's advantage
to all be on the "same page" in terms of GUID issuance, so that there is no
question how a GUID under one code corresponds to a GUID under another (in
terms of what you do with the metadata attached to that GUID).

2) At first glance, the botanical approach might seem preferable because it
leads to a (seemingly) simpler way of tracking the relationship between
"names" and other data (like usages, specimens, etc.). However, I think there
is a compelling reason from an information management perspective (outside of
personal biases of different taxonomic traditions) to treat "name objects"
as monomial units (a la zoological tradition), and then layer everything else
on top of usage instances (without the need to GUID-ify name-combination
units in-between these two layers).  But I'll save the details of this
perspective for another discussion.
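
To make the aggregation point in (1) a little more concrete, here is a hedged Python sketch of how zoological-style usage records could be translated into botanical-style ones through a crosswalk table.  The "zoo:" and "bot:" prefixes and the crosswalk itself are assumptions for illustration only; nothing like this is specified by ZooBank, IPNI, or TDWG.

```python
# Zoological-style usage records from the first table: (usage number, NameID, ParentID).
zoo_usages = [
    (n, "zoo:LSID3", "zoo:LSID1" if n <= 5 else "zoo:LSID2") for n in range(1, 11)
]

# Hypothetical crosswalk: (epithet GUID, genus GUID) -> GUID of the botanical-style combination.
# The prefixes matter because the same label (e.g. "LSID3") denotes a bare epithet
# in one scheme and a full combination in the other.
crosswalk = {
    ("zoo:LSID3", "zoo:LSID1"): "bot:LSID3",  # Aus bus Smith
    ("zoo:LSID3", "zoo:LSID2"): "bot:LSID4",  # Xus bus (Smith) Jones
}

# Botanical-style usage records: one combination GUID per usage.
bot_usages = [
    (n, crosswalk[(name_id, parent_id)]) for n, name_id, parent_id in zoo_usages
]

for n, combination_guid in bot_usages:
    print(n, combination_guid)
```

The translation is mechanical once the crosswalk exists; the open question is who maintains that crosswalk, which is part of why agreeing on GUID issuance up front seems attractive.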

	
> Maybe it would be helpful to have another little "wiki" 
> explaining the role and benefits of LSIDs within the "realm" 
> of the ZooCode (ZooBank).

I don't think we need a full Wiki for that.  Aside from everything I wrote
above about differences in implementation, I think the rationale for LSIDs
in the realm of nomenclatural data is common to organisms that fall under
all of the different codes.  Perhaps a new page or set of pages on the
existing TDWG Wiki devoted to explaining the role and benefits of LSIDs for
nomenclature would be helpful (Lee??). I agree there needs to be more
explanatory documentation -- and no doubt there will be.  But right now, the
prototypers are feverishly trying to get these ideas implemented.  There is
a target deadline of the next TDWG meeting to have some examples
operational, and as I have said in the past, I think the most productive
conversation will happen after we have an operating web site (or sites) to
illustrate the potential of what LSIDs can do, and why they can make it much
easier for taxonomists (who may have no idea what a GUID is) to get their
jobs done more efficiently.
 
> As you know, the preamble of the Code itself says: "The objects
> of the Code are to promote stability and universality in the
> scientific names of animals and to ensure that the name of each
> taxon is unique and distinct."
> What exactly are the limitations of our "unique" Code-compliant 
> names in eScience that make LSIDs neccessary? 

Homonyms (both within and between different Code domains) and misspellings
are the most obvious examples.  More fundamentally, scientific names are
wonderful labels for humans to interpret.  But they are terrible identifiers
for computers (for various reasons).  The role of computers and the internet
in the future of taxonomy ultimately boils down to a streamlining of
information access.  I no longer need to go to a library's rare book
collection to track down a copy of an original description, I just click a
mouse and it's in front of me.  The same piece of taxonomic information no
longer needs to be discovered multiple times by multiple researchers -- once
someone discovers it, everyone has access to it.  I no longer have to canvass
myriad books and journals for new names and nomenclatural acts of relevance
to my group of interest, they are delivered to me every day/week/month via
an email digest.  I no longer have to wait a year for someone to describe and
publish a new species so I can see a high-res color photograph of it -- I
now get it delivered to my email inbox within minutes of the photo being
taken (which may be within minutes of the organism being collected from the
field).
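
A small Python sketch of why name strings make poor keys for a computer, using the homonym and misspelling cases mentioned above.  The two records are invented (a hypothetical "Aus bus" under each of two Codes, with made-up authors and placeholder GUIDs); they are not real taxa.

```python
# Two hypothetical homonyms: same spelling, different Codes, different taxa.
records = [
    {"guid": "GUID-1", "name": "Aus bus", "author": "Smith", "code": "zoological"},
    {"guid": "GUID-2", "name": "Aus bus", "author": "Brown", "code": "botanical"},
]

# Keyed by name string: the second record silently replaces the first.
by_name = {}
for rec in records:
    by_name[rec["name"]] = rec

# Keyed by GUID: both records survive.
by_guid = {rec["guid"]: rec for rec in records}

# A misspelled label fails as a lookup key, but a usage record that also
# carries the GUID still resolves to the intended name object.
misspelled_usage = {"label": "Aus buss", "guid": "GUID-1"}
found_by_label = by_name.get(misspelled_usage["label"])
found_by_guid = by_guid[misspelled_usage["guid"]]

print(len(by_name), len(by_guid))
print(found_by_label, found_by_guid["author"])
```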

I think the value of streamlined information access is evident.  It already
exists, to some extent, for a few groups and for a few areas of information
content.  But it's not cohesive.  In order to make the global biodiversity
information cohesive, we need an architecture and set of standards and
protocols (i.e., an infrastructure) to organize it.  And at the very core of
this infrastructure is GUIDs -- which in our case most likely means LSIDs.

> And what, on the other hand, can be done better with our traditional
> Code-compliant naming scheme?
> Maybe the usefulness of Code-compliant names could be enhanced even for
> certain (not all) e-functions, by just a few extensions, e.g.
> "ZF:Cicindelina" = a family-group taxon in Zoology; "ZG:Cicindelina" = a
> genus in Zoology.

Solutions like the "ZG" prefix you suggest are potentially useful tools to a
pair of human eyes connected to a human brain.  But a key aspect of
GUIDs/LSIDs is that they are *NOT* intended for direct human consumption.
If the infrastructure is built well, the average user/taxonomist will never
even see them.  Some have advocated printing GUIDs (e.g., ZooBank Registry
identifiers) in publications in a form that humans could read and type into
their web browsers.  While this may inevitably be the case during the
transition/interim phase, I don't see that as part of a long-term solution.
In a way, GUIDs are like the ASCII character set on a computer.  Almost
nobody knows as they pound away on their computer keyboards that every time
they strike the letter "Q", the electronic equivalent of "01010001" travels
through the wires of their PC.  All the user cares about is that the letter
"Q" shows up on the computer screen.  In the same way, when a user clicks on
a link from a species name displayed on a web page, they don't need to know
that an underlying GUID is sent out through the internet.  All the user
cares about is that the taxonomic details for that species pop up on the
computer screen after the link is clicked.
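
For anyone curious, the ASCII analogy is easy to check in Python.  This is just a sketch of the letter-to-bits mapping mentioned above, using only built-in functions.

```python
letter = "Q"

# What the user sees is the letter; what the machine handles is its code.
code_point = ord(letter)          # numeric ASCII code of "Q"
bits = format(code_point, "08b")  # the same code as an 8-bit binary string

print(letter, code_point, bits)   # prints: Q 81 01010001

# Going the other way recovers the human-readable label.
recovered = chr(int(bits, 2))
print(recovered)                  # prints: Q
```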
	
> I'm also trying to figure out how some ICZN specialties would match 
> with a naming scheme supported by LSIDs. For example, how to deal with
> the following case:
> An author describes a new species in his preferred "splitter" 
> genus-concept. The binomen is entered in ZooBank by that generic 
> combination and gets an LSID, but subsequent authors may use the name 
> in combination with a wider genus concept, - no registration needed 
> for such a change in generic combination. Any ideas how such 
> situations, pretty common in zoological names, can be dealt with? 

I think I covered a lot of this (before I even got this far into reading
your email) above, in comparing botany to zoology traditions.  You've hit on
one of the key aspects of ZooBank that need to be discussed soon, so rather
than belabor this already-too-long email further, I'll start a new thread to
address this exact issue. Stand by....it will take me some time to draft.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html



