[Zoobank-list] zoobank prototype

David Patterson dpatterson at mbl.edu
Tue Sep 12 16:07:57 BST 2006


Wolfgang’s recent posting is constructive, helpful, and 
interesting.  We now see the emergence of two different 
agendas.  One deals with what is now on display – let’s 
call it ZooBank 1 – and this is what Wolfgang addresses.
The other agenda is what we want ZooBank to become – a
code-compliant names registry.  This is ZooBank 2 – and
this is what Frank and Andrew have focused on.

I absolutely applaud the emergence of ZooBank – both 1 and 
2. We all owe a considerable debt to Andrew who has worked 
extremely hard to bring the operations of the Commission 
forward.

That said, the criticisms of ZooBank 1, such as those on
the Taxacom listserv, are fully justified.  The
community out there has certain expectations of the 
Commission.  ZooBank 1 has not met those expectations.  We 
can point to our expectations of ZooBank 2, but a poor 
quality product does not serve us well and could push 
ZooBank 2 backwards.  We need to bring the community back 
on side, and I think Wolfgang’s posting helps us to chart 
out a path.

ZooBank 1 delivers a repackaging of Thomson’s products, so
the problems derive from Thomson.  In April, I wrote to
the then commissioners with my concerns about the 
association of ZooBank with a single commercial player, 
and all commissioners who responded to me agreed.  I 
attach my letter.

As one of the founders of the uBio project, I find many
of the criticisms of ZooBank 1 familiar.  uBio has
received similar comments.  uBio has a different agenda to 
ZooBank 2.  I am not entirely sure what the agenda of 
ZooBank 1 is.  I am hoping that inter alia it will become 
the nomenclator of code-compliant names created 
previously, whereas ZooBank 2 becomes the nomenclator of 
code-compliant names of the future.  uBio’s objective has 
been to use names and hierarchies as informatics tools.

In building up NameBank, uBio’s activities overlap 
considerably with those of ZooBank 1 and 2 (and with 
Species 2000, ITIS, and many other players).  Those of us 
who seek to deliver known names, for whatever reason, face 
the same problems (such as importing mis-spellings, errors 
in authorship, chresonyms treated as name and authority 
combinations, undisambiguated homonyms, shifts in generic 
vehicles not identified as such, and so on).

Here I limit myself to the problems with ZooBank 1,
although it should be clear that much of this is also
relevant to NameBank 2.  If more than one initiative
shares the same problems, then it is prudent to work
together to fix them.  Names are an objective layer of 
information (and, in the uBio view, metadata) upon which 
we all depend.  They are a communal resource for which we 
can have communal responsibility.

uBio has a very robust and flexible data model which has 
been developed on the basis that names need to be shared 
and that there will be problems to be addressed.  Many of 
the names problems identified by the Taxacom readership 
can be addressed within what we refer to as 
‘reconciliation’ groups.  These groups link together 
alternative names for the same taxon.  So, for example, 
Paramecium acudatum (a mis-spelling served by Thomson as a 
species) is assigned to the reconciliation group for 
Paramecium caudatum.  Similarly, the ‘species’ ‘Paramecium
cilia’ can be assigned to the reconciliation group 
‘Paramecium’.  Reconciliation groups can convert a query 
initiated with one name into an action that involves all 
names – such that a query on Paramecium caudatum will find 
the content labeled with the name Paramecium acudatum.

Within reconciliation groups, the status of each name can 
be identified.  Reconciliation groups not only fix 
problems but serve other needs of ZooBank by providing a 
means to indicate the nomenclatural status of names.
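To make the idea of reconciliation groups concrete, here is a minimal in-memory sketch in Python. It is not uBio's data model: the names ReconciliationIndex and search, and the status labels, are hypothetical, and it assumes each name string belongs to at most one group. It shows the two behaviours described above: a query on one name is expanded to every name in its group, and each member carries an annotated status.

```python
from collections import defaultdict


class ReconciliationIndex:
    """Hypothetical sketch: reconciliation groups link alternative name
    strings for one taxon, and each member name carries a status note."""

    def __init__(self):
        self._group_of = {}                # name string -> group key
        self._members = defaultdict(dict)  # group key -> {name: status}

    def add(self, group, name, status):
        """Assign `name` to `group` with an annotated nomenclatural status."""
        self._group_of[name] = group
        self._members[group][name] = status

    def expand(self, name):
        """All names reconciled with `name`; an unknown name maps to itself."""
        group = self._group_of.get(name)
        if group is None:
            return {name}
        return set(self._members[group])

    def status(self, name):
        """Status annotation for `name`, or None if it is not in any group."""
        group = self._group_of.get(name)
        return None if group is None else self._members[group][name]


def search(index, records, query):
    """Return records labelled with any name in the query name's group."""
    names = index.expand(query)
    return [r for r in records if r["name"] in names]


# Usage, mirroring the Paramecium examples (status labels are illustrative).
idx = ReconciliationIndex()
idx.add("Paramecium caudatum", "Paramecium caudatum", "accepted name")
idx.add("Paramecium caudatum", "Paramecium acudatum", "misspelling")
idx.add("Paramecium", "Paramecium", "genus")
idx.add("Paramecium", "Paramecium cilia", "not a species name")

records = [
    {"id": 1, "name": "Paramecium acudatum"},
    {"id": 2, "name": "Paramecium cilia"},
]
print([r["id"] for r in search(idx, records, "Paramecium caudatum")])  # [1]
```

A query on Paramecium caudatum finds the record labelled with the misspelling, while the status annotation keeps the two names distinguishable.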

The creation of reconciliation groups requires experts to 
be involved.  An appropriate workbench, going beyond those 
offered through Platypus or ITIS, can allow experts to map 
names against one another (i.e. to create reconciliation 
groups), annotate names (to indicate their nomenclatural 
status etc.), and reshape hierarchical arrangements.  If 
we seek to be inclusive, we are obligated to adopt an 
approach that is able to deal with all names of all 
organisms, within a unified, authoritative, current 
hierarchy that can depict all subjective views.  This is 
referred to as a Union approach.

So extending Wolfgang’s suggestions, I believe ZooBank 1 
can be used to lead the development of a names-management 
co-operative that charts a path to serve the needs of many 
players.  This group should not only be concerned with 
animal names.  uBio obviously has tools and services to 
contribute.  There are numerous aggregators and hundreds
of expert players out there who bring expertise and
taxonomic concepts.  ICZN has a clientele and a vision.  Thomson
provides links into literature.  As the Biodiversity
Heritage Library comes online, much more literature
will become available.

Co-operation can be efficient.  A federated and communal 
environment that is held together through network services 
can extend benefits to all users.  This does not require a 
loss of ownership.  Editing environments can give the
impression that names are being added or corrected
within one location (e.g. ZooBank) while in reality the
changes happen at a deeper layer, so that they can re-emerge
to benefit any other users of names of animals.  Similarly,
changes made to names elsewhere can flow to ZooBank.

Dealing with Thomson content was never identified as a 
priority for ZooBank – which was sold on the vision of 
ZooBank 2.  But the box has been opened and we need to do 
something about what has been released. Wolfgang’s 
suggestion of dealing separately with species, genera,
and families makes sense, as it will break up the challenge
of dealing with the inherited problems into reasonably
sized tasks.  A web-based workbench will allow many tasks
to be addressed simultaneously - one group can be creating 
a consensus list of carabid genera, while others correct 
the plethora of errors among ciliate species names.

Moving focus a little, some other services that serve the 
needs of the commission can be delivered very quickly. 
uBio can deliver a list of all generic names known to us 
(whether viral, plant, animal, prokaryotic, fungal or 
protist) – and can indicate who holds those names (so we 
can try to trace errors to their source).  A list of all 
genera will assist taxonomists in dealing with homonyms 
now.  A list of all genera will ease the informatics
challenges of the future if it encourages us to avoid
voluntarily creating homonyms of any generic name, whether
of an animal or not.
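As a rough illustration of how a cross-code list of generic names could flag homonyms before a new name is published, the sketch below does an exact-spelling lookup and reports which source holds each match. The record fields, code labels and source labels are hypothetical. Real homonymy rules differ between the codes (the zoological code, for example, does not treat genus-group names differing by one letter as homonyms), so this is only a first-pass screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusRecord:
    name: str    # the generic name as spelled in the source
    code: str    # governing code, e.g. "ICZN" or "ICN" (illustrative labels)
    source: str  # hypothetical label for the database that holds the name


def find_homonyms(proposed, genera):
    """Return existing generic names with exactly the same spelling as
    `proposed`, under any code. Exact (case-insensitive) matching only;
    the codes' own homonymy rules are not modelled."""
    key = proposed.strip().lower()
    return [g for g in genera if g.name.lower() == key]


# Morus is a real cross-code case: mulberries (plants) and gannets (birds).
genera = [
    GenusRecord("Morus", "ICN", "source-A"),
    GenusRecord("Morus", "ICZN", "source-B"),
    GenusRecord("Felis", "ICZN", "source-A"),
]

hits = find_homonyms("Morus", genera)
print(sorted(h.code for h in hits))  # ['ICN', 'ICZN']
for h in hits:
    print(h.name, h.code, h.source)  # shows who holds each copy of the name
```

Keeping the source alongside each record is what would let errors be traced back to whichever database introduced them.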

None of this will be achieved without funding.  The cost 
of modifying the names-based infrastructure that we have 
been developing at the MBL to create the expert interface 
and the union editing environment is not great.  I believe 
that the system could be in place within 18 months 
assuming a single coder.  In respect of funding, any 
developments of ZooBank 1 and of ZooBank 2 will improve 
the services (market position and profitability) of the 
commercial agency (Thomson).  I would expect them to 
provide support to help bring the vision to fruition.
Perhaps GBIF may also be able to assist in making
progress.

Although funding is needed for tools, the critical 
determinant to success with ZooBank 1 is how well we 
engage the taxonomic community and the numerous other 
initiatives.  Our performance there may determine the 
acceptance of the real ZooBank with the interested 
community.  Our workbench must impose a negligible extra 
load on the shoulders of the experts, and should return to 
them considerably more than they invest.  This is where 
the informatics prowess of uBio would serve the needs of 
ZooBank very well.  Contributors should have free access
to tools and services that accelerate and enhance their 
own operations.  A simple example of a benefit would be a 
taxonomically informed alert system that delivers to users 
a customised weekly email to let them know of new names 
and combinations in their area of taxonomic interest, 
recent publications or additions to web sites, additions
to the Biodiversity Heritage Library, GBIF or other
online data providers, and changes and input to the
underlying nomenclatural systems, etc.

We are no longer in the world of wishful thinking.  All of 
these things can happen with a relatively small 
investment.  The next step?  A meeting with 10-20 
interested players?

David Patterson
12th September 2006


My April letter to the commissioners follows:

Fellow commissioners

The ‘30’ email from Andrew contained a paragraph about 
ZooBank. You will probably know that I am a very strong 
advocate of moving nomenclatural activities forward, and 
believe that ZooBank is a great development. However, I am 
increasingly concerned about the potential for problems
should ZooBank develop an exclusive relationship with a
single commercial agency (Thomson). The large scientific
publishers have, over the last decade, shown us that
their economic performance takes priority over the
provision of service, and some have become excessively
exploitative.  Some taxonomists refer to publishers as a
new ‘taxonomic impediment’.  There are now significant 
political counter-moves to promote open access to publicly 
funded and other scientific content.

When we submitted our thoughts to Nature, we conceived of 
ZooBank as embedded within a community of complementary 
initiatives (and we specified a number of these, such as 
uBio, Species2000, ITIS and GBIF). By the time the 
technical paper was written, this array had narrowed rather
than expanded.  This reveals a shift in the wrong 
direction.

Thomson and Zoological Record do add considerable value to 
ZooBank, but Thomson is not the only agency capable of
enriching the initiative.  uBio (the project that I am 
associated with) offers a different but still 
complementary array of valuable assets.  It holds the 
contents of Nomenclator Zoologicus.  Now that uBio has passed
the 7.5 million record mark (making Thomson’s
claim that ION, with 1.7 million names, is the most
comprehensive organism names database rather inaccurate),
we can access most of the homonyms that were created under
other codes.  Significantly, we embed the names of
organisms within a growing array of freely-available 
taxonomically intelligent services that not only help 
taxonomists do their work, but display the significance of 
nomenclature to a wider public.  GBIF too has many 
valuable dimensions, not the least of which is the growing 
coverage of specimens which form key connections between 
names and concepts. ITIS and Species2000 offer 
environments where we obtain taxonomic value that has been 
added around names.  New initiatives such as the 
Biodiversity Heritage Library will release vast amounts of 
taxonomically relevant literature and the names that are 
contained within it.  ZooBank should aim to be connected
to all of these.

Given that the objective of publishers is to make money, 
we need to protect ZooBank from situations where the 
publisher exercises control over our capacity to submit 
names into ZooBank, or tries to control – through financial
charges – our access to names, or to the taxonomic
descriptions or associated bibliographic information.
Publishers have already done this with the general
scientific literature.  The financially weak position of 
the Trust does not make us equal players. It behooves 
ZooBank to protect itself against any complications that 
might emerge from an exclusive relationship with a 
commercial enterprise. Andrew can, of course, ask for 
contracts that guarantee, in perpetuity, free and open
access to names and to the key literature in which the
names are embedded (maybe such contracts already exist; I am not
privy to the dialog with Thomson).  However, I do not 
believe that such documents are sufficient.  Rather, I 
suggest we aim to mirror ZooBank operations at more than 
one location to break any threat of a monopoly.  Any 
resistance to this or to the movement of content would 
help us to identify those players who are in the game for 
the right reason.  I do not believe ZooBank would have any 
difficulty in attracting additional partners, or in 
getting GBIF to agree to occupy an overarching role in
which they establish a taxonomic engine coupled with 
universal and unique persistent identifiers that will 
unify all names-based initiatives.  Each additional 
partner can be selected on the basis of the value that 
they add to ZooBank.
  




