[Zoobank-list] Protonym, name usage, datanym

Richard Pyle deepreef at bishopmuseum.org
Mon Mar 3 08:10:09 GMT 2008


Thanks, David.  I'm glad to see the conversation started again!

A couple of comments on your post:

> Richard Pyle's interpretation of protonyms and name usages is quite
> appealing, but I run into trouble with misapplied names.

Can you give an example of a misapplied name?  Just to make sure we are
defining it in the same way.

> My tentative solution is to create a new concept, the "datanym."  For any
> who are interested, the idea is set out in greater detail at
> http://software.speciesfile.org/Design/NamesAndConcepts.aspx.

Thanks for the link!  I will need to study this a bit more before I can
comment in detail.

> My proposed interpretation of "protonym" seems to fall within the
> ambiguity of previous usage.  "Protonym" should be a nomenclatural term.
> A misapplication remains based on the same protonym.  A name usage such
> as "n. sp. A" is outside the Code of Nomenclature and cannot be a
> protonym.

I have not been so strict in my definition of "Protonym".  We certainly
could agree to define it the way you have above (i.e., only in the context
of nomenclatural codes); but I would not be opposed to defining it more
generally as "the first usage of a name"; regardless of whether that name
falls within the realm of the Code.  For example, in my own implementations,
"n. sp. A" could, in fact, be anchored to its own Protonym.  One could even
extend it to include vernacular names as well, but I have not done this
myself (yet?).

> "Datanym" comes from one useful way to organize a database.  A datanym is
> the intersection between a name and a currently accepted concept.  In this
> sense the "name" may be a protonym, or it may be a usage that does not
> qualify as a protonym.  Within a database it provides a useful way to
> aggregate name usages.  A misapplication creates a new datanym.  This
> facilitates grouping citations based on current concepts in the way most
> useful to the typical user of taxonomic data.  However, this means that
> datanyms can come and go with changes in taxon concepts.  It also means
> that if a type series includes multiple species, then that author has
> unintentionally created multiple datanyms.

I *think* I follow your concept of "datanym", but to be sure I'd need to see
a set of examples.

Here's the way I think of it:

NameString:  A string of characters (or symbols) intended to represent the
name of an organism.  This could represent *any* kind of name for an
organism (including vernaculars).

Reference: Some sort of documentation authored by humans.  Mostly these are
publications, but there is no reason they cannot extend well past grey
literature into things like correspondence, field notes, specimen labels, or
computer databases (among other examples).

Both of these terms are defined as broadly and generally as possible.
However, that doesn't mean they have to be *implemented* that way.  One can
always define one's own scope of interest to any subset of NameStrings and
References.  For example, in the context of this conversation, it might make
sense to refine the scope of NameStrings to be limited to scientific names
and partially scientific names.  By "partially scientific names", I mean
things like "n. sp. A", because they are essentially always used as a
surrogate for a scientific name, and almost always in conjunction with a
proper scientific name (genus name or family name, for example).

As for References, we might restrict our scope to only those forms of
documentation represented by multiple copies (e.g., publications,
unpublished reports, etc.; excluding single-copy documents like specimen
labels and field notebooks).

The point is, we can restrict the scope of these terms any way we want for a
given context (conversation, database, etc.); while not preventing others
from implementing them more broadly.
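
As a rough illustration of narrowing scope without narrowing the definition,
here is a minimal sketch.  The "kind" labels and the MULTI_COPY_KINDS set are
assumptions made up for this example, not part of any existing data model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    citation: str
    kind: str  # hypothetical label describing the form of documentation


# Assumed scope for one particular context: multiple-copy documents only.
MULTI_COPY_KINDS = {"publication", "unpublished report"}


def in_scope(ref: Reference) -> bool:
    """True if this Reference falls inside the locally chosen scope."""
    return ref.kind in MULTI_COPY_KINDS


refs = [
    Reference("Jones & Smith article", "publication"),
    Reference("Survey report", "unpublished report"),
    Reference("Label on a specimen jar", "specimen label"),
    Reference("Collector's notes", "field notebook"),
]

# The broad definition still admits every record; the filter is local.
scoped = [r for r in refs if in_scope(r)]
```

Another implementer could use a wider set of kinds over the same records
without any change to the Reference definition itself.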

That notwithstanding, I believe it is safe to say that all NameStrings of
interest to biologists only exist in the context of a Reference.  That is,
in order for a NameString to be worth contemplating, it must first exist in
some tangible form (e.g., paper or electronic).  Given this, in order to
infer any biological meaning from these NameStrings, it is very useful to
know the Reference in which they appeared.  In other words, if we really want to
understand what is meant by a NameString, it is important to understand the
context in which it was used.

Thus, the fundamental unit of interest to databases, indexes and the like is
what I now call the "TaxonNameUsage" (a slightly expanded definition of what
I referred to as an "Assertion" in the Taxonomer data model).

Very simply, a TaxonNameUsage is the application of a NameString within the
context of a Reference.

When implementing this as a database, there is a strong intuitive tendency
to create ID values for References, and ID values for NameStrings, and then
create an index table that links one ReferenceID to one NameStringID.
However, I have resisted this temptation (for many reasons, which I won't go
into now).  Suffice it to say that I have found that assigning ID values to
NameStrings is both redundant, and unnecessarily complicated (in the sense
that it creates many unnecessary complications).  So, I assign ID values to
References, and ID values to TaxonNameUsage instances, and then NameString
is simply a property of the TaxonNameUsage instance.
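
A minimal sketch of that arrangement, for illustration only.  The class and
field names (reference_id, usage_id, name_string) and the sample records are
assumptions for this example, not the actual Taxonomer schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    reference_id: int
    citation: str


@dataclass(frozen=True)
class TaxonNameUsage:
    usage_id: int
    reference_id: int  # the Reference in which the name appeared
    name_string: str   # plain property: no NameString table, no NameString ID


# Hypothetical sample data.
ref_a = Reference(1, "Original description (hypothetical)")
ref_b = Reference(2, "Jones & Smith (hypothetical article)")

usages = [
    TaxonNameUsage(1, ref_a.reference_id, "Aus bus"),
    TaxonNameUsage(2, ref_b.reference_id, "Aus bus"),
    TaxonNameUsage(3, ref_b.reference_id, "Aus xus"),
]

# Usages can still be grouped by name without a separate NameString ID.
aus_bus_usage_ids = [u.usage_id for u in usages if u.name_string == "Aus bus"]
```

Only References and TaxonNameUsage instances carry identifiers here; the
same character string simply recurs as a value wherever it was used.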

In summarizing the above, I will (attempt to) bring it back to your original
message: With a sufficiently broad scope of "NameString", and sufficiently
broad scope of "Reference", you can accommodate all meaningful taxonomic
history through TaxonNameUsage instances.  One can then define specific
subsets of TaxonNameUsage instances as falling into certain categories:

- Some represent original Code-compliant descriptions (creation events) for
scientific names
- Some represent assertions about the taxonomic validity/synonymy of those
names
- Some represent handles to implied or explicitly defined taxonomic concepts
- etc., etc.

I *think* that all of the "events" you list on the web page could each be
anchored to a specific TaxonNameUsage instance.  All relationships among
names (parent-child hierarchies, synonymies, homonymies, etc.) can be
represented by qualified cross-references among TaxonNameUsage instances.
For example, we could establish a rule that all TaxonNameUsage instances
representing scientific names must be cross-referenced to one and only one
TaxonNameUsage instance of type "Protonym". This is independent of how
narrowly or broadly we decide to define the term "Protonym".
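
The "one and only one Protonym" rule could be checked along these lines.
This is a sketch under assumptions: the CrossReference record, its "kind"
labels, and the convention that a Protonym usage links to itself are all
hypothetical choices made for the example:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossReference:
    from_usage_id: int
    to_usage_id: int
    kind: str  # the qualifier, e.g. "protonym", "junior_synonym_of", "parent"


def protonym_violations(usage_ids, links):
    """Return the usage IDs that do not have exactly one 'protonym' link."""
    counts = Counter(
        link.from_usage_id for link in links if link.kind == "protonym"
    )
    return sorted(u for u in usage_ids if counts[u] != 1)


# Hypothetical data: usage 1 is an original description (linked to itself),
# usage 2 is a later usage of the same name, usage 3 has no link yet.
usage_ids = [1, 2, 3]
links = [
    CrossReference(1, 1, "protonym"),
    CrossReference(2, 1, "protonym"),
    CrossReference(2, 1, "junior_synonym_of"),
]

violations = protonym_violations(usage_ids, links)
```

The check counts links only; it does not depend on how narrowly or broadly
the target usages qualify as Protonyms.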

I *think* that this "TaxonNameUsage" instance falls reasonably close to what
you are calling a "Datanym", except I'm not sure what you mean when you say
"However, this means that datanyms can come and go with changes in taxon
concepts."

I try to avoid "maintenance-intensive" aspects to any database design; but
when I do include them, I always try to layer them on top of an
infrastructure of "fixed" instances.  Because TaxonNameUsage instances are
singular "events" in history, they have one and only one correct
representation for all future time; and once they are entered correctly,
they never need to be modified again.

For example, the scientific community may never agree that Aus bus is a
junior synonym of Aus xus; but once Jones & Smith publish an article
asserting that Aus bus is a junior synonym of Aus xus, then the fact that
Jones & Smith treated these names in this way will always be true.
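
One way to picture this layering is sketched below, with the historical
usage stored as an immutable record and current opinion kept in a separate,
editable structure.  The field names and the current_opinion mapping are
assumptions for illustration, not an existing implementation:

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class TaxonNameUsage:
    usage_id: int
    reference: str
    name_string: str
    treated_as_junior_synonym_of: Optional[str] = None


# The historical event: what Jones & Smith asserted in their article.
usage = TaxonNameUsage(1, "Jones & Smith", "Aus bus", "Aus xus")

# Attempting to rewrite history fails because the record is frozen.
try:
    usage.treated_as_junior_synonym_of = None
    was_modified = True
except FrozenInstanceError:
    was_modified = False

# The maintenance-intensive layer sits on top and may change at any time.
current_opinion = {"Aus bus": "valid"}           # community disagrees today
current_opinion["Aus bus"] = "junior synonym"    # and may change its mind
```

Whatever happens to current_opinion, the frozen record of what Jones & Smith
published stays exactly as entered.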

> "Datanym" might be understood as a nebulous term that takes on 
> precise meaning only as defined in a particular data structure.  
> That is not my intent.  It does match the data structure in Species 
> File Software, and I hope that has not misled me into making a concept 
> useful only within a particular data structure.  There seems no need 
> to bother non-technical users of taxonomic information with the 
> concept of "datanym," but is it useful for communication among 
> people involved in bioinformatics and for data exchange?

Before I could offer my own opinion on an answer to that question, I would
need to see some examples and/or diagrams of your data structure to make
sure I understand exactly how they are used, and how they can come and go
with changes in taxon concepts.
	 
Apologies if the above was too long and/or unintelligible.

Also, David -- I know we have discussed a lot of this before, but I thought
it might be useful to others on this list.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html



