[Zoobank-list] zoobank prototype
David Patterson
dpatterson at mbl.edu
Tue Sep 12 16:07:57 BST 2006
Wolfgang's recent posting is constructive, helpful, and
interesting. We now see the emergence of two different
agendas. One deals with what is now on display (let's
call it ZooBank 1), and this is what Wolfgang addresses.
The other agenda is what we want ZooBank to become: a
code-compliant names registry. This is ZooBank 2, and
this is what Frank and Andrew have focused on.
I absolutely applaud the emergence of ZooBank both 1 and
2. We all owe a considerable debt to Andrew who has worked
extremely hard to bring the operations of the Commission
forward.
That said, the criticisms of ZooBank 1, such as those made
through the Taxacom listserv, are fully justified. The
community out there has certain expectations of the
Commission. ZooBank 1 has not met those expectations. We
can point to our expectations of ZooBank 2, but a poor
quality product does not serve us well and could push
ZooBank 2 backwards. We need to bring the community back
on side, and I think Wolfgang's posting helps us to chart
out a path.
ZooBank 1 delivers a repackaging of Thomson's products, so
the problems derive from Thomson. In April, I wrote to
the then commissioners with my concerns about the
association of ZooBank with a single commercial player,
and all commissioners who responded to me agreed. I
attach my letter.
As one of the founders of the uBio project, I find many of
the criticisms of ZooBank 1 familiar. uBio has
received similar comments. uBio has a different agenda to
ZooBank 2. I am not entirely sure what the agenda of
ZooBank 1 is. I am hoping that inter alia it will become
the nomenclator of code-compliant names created
previously, whereas ZooBank 2 becomes the nomenclator of
code-compliant names of the future. uBio's objective has
been to use names and hierarchies as informatics tools.
In building up NameBank, uBio's activities overlap
considerably with those of ZooBank 1 and 2 (and with
Species 2000, ITIS, and many other players). Those of us
who seek to deliver known names, for whatever reason, face
the same problems (such as importing mis-spellings, errors
in authorship, chresonyms treated as name and authority
combinations, undisambiguated homonyms, shifts in generic
vehicles not identified as such, and so on).
Here I limit myself to the problems with ZooBank 1,
although it should be clear that much of this is also
relevant to ZooBank 2. If more than one initiative
shares the same problems, then it is prudent to work
together to fix them. Names are an objective layer of
information (and, in the uBio view, metadata) upon which
we all depend. They are a communal resource for which we
can have communal responsibility.
uBio has a robust and flexible data model, developed on
the premise that names need to be shared and that
problems will have to be addressed. Many of
the names problems identified by the Taxacom readership
can be addressed within what we refer to as
reconciliation groups. These groups link together
alternative names for the same taxon. So, for example,
Paramecium acudatum (a mis-spelling served by Thomson as a
species) is assigned to the reconciliation group for
Paramecium caudatum. Similarly, the species 'Paramecium
cilia' can be assigned to the reconciliation group for
Paramecium. Reconciliation groups can convert a query
initiated with one name into an action that involves all
names in the group, so that a query on Paramecium caudatum
will also find content labeled with the name Paramecium
acudatum.
Within reconciliation groups, the status of each name can
be identified. Reconciliation groups not only fix
problems but serve other needs of ZooBank by providing a
means to indicate the nomenclatural status of names.
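To make the idea concrete, here is a minimal sketch of reconciliation groups in Python. It is not uBio's actual data model: the names ReconciliationGroup, NameIndex and search, the status labels, and the (label, content) record shape are all hypothetical. Each group holds an accepted name plus the alternative names mapped to it, each tagged with a status, and a query on any member is expanded to every member.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReconciliationGroup:
    """Alternative names for one taxon, each tagged with a status label."""
    accepted: str
    # Member name -> status, e.g. "misspelling" (labels are hypothetical).
    members: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The accepted name is always a member of its own group.
        self.members.setdefault(self.accepted, "accepted")


class NameIndex:
    """Looks up the group for any name and expands queries to the whole group."""

    def __init__(self) -> None:
        self._group_of: dict[str, ReconciliationGroup] = {}

    def add_group(self, group: ReconciliationGroup) -> None:
        for name in group.members:
            self._group_of[name] = group

    def expand(self, name: str) -> set[str]:
        # A name that has not been reconciled expands only to itself.
        group = self._group_of.get(name)
        return set(group.members) if group else {name}

    def status(self, name: str) -> str | None:
        group = self._group_of.get(name)
        return group.members.get(name) if group else None


def search(index: NameIndex, records: list[tuple[str, str]], name: str) -> list[str]:
    """Return the content of every record labeled with any name in the query's group."""
    wanted = index.expand(name)
    return [content for label, content in records if label in wanted]
```

With a group for Paramecium caudatum that lists Paramecium acudatum as a misspelling, a search on either name returns content labeled with both. For simplicity each name belongs to only one group here; a real model would have to let an undisambiguated homonym sit in several groups until an expert resolves it.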
The creation of reconciliation groups requires experts to
be involved. An appropriate workbench, going beyond those
offered through Platypus or ITIS, can allow experts to map
names against one another (i.e. to create reconciliation
groups), annotate names (to indicate their nomenclatural
status etc.), and reshape hierarchical arrangements. If
we seek to be inclusive, we are obligated to adopt an
approach that is able to deal with all names of all
organisms, within a unified, authoritative, current
hierarchy that can depict all subjective views. This is
referred to as a Union approach.
So, extending Wolfgang's suggestions, I believe ZooBank 1
can be used to lead the development of a names-management
co-operative that charts a path to serve the needs of many
players. This group should not only be concerned with
animal names. uBio obviously has tools and services to
contribute. There are numerous aggregators and hundreds
of expert players out there who have expertise and
concepts to offer. ICZN has a clientele and a vision.
Thomson provides links into literature. As the Biodiversity
Heritage Library comes online, much more literature
will become available.
Co-operation can be efficient. A federated and communal
environment that is held together through network services
can extend benefits to all users. This does not require a
loss of ownership. Editing environments can give the
impression that names are being added to or corrected
within one location (e.g. ZooBank) when in reality the
changes happen at a deeper layer, so that they can
re-emerge to benefit any other users of animal names.
Similarly, changes made to names elsewhere can flow to
ZooBank.
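A minimal sketch of this federated editing idea, assuming a simple publish/subscribe design: an edit made in one front end (here labeled "ZooBank") is applied locally, recorded in a shared layer, and pushed to every other subscribing front end. SharedNameLayer, FrontEnd and NameEdit are hypothetical names, not an existing uBio or ZooBank service, and a working system would use network services and persistent storage rather than in-memory callbacks.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class NameEdit:
    """One change submitted to the shared names layer (hypothetical record shape)."""
    source: str                   # front end where the edit was made
    name: str                     # name as originally entered
    action: str                   # "add" or "correct"
    new_value: str | None = None  # corrected form, for "correct" edits


class SharedNameLayer:
    """The deeper layer: keeps every edit and pushes it to all other front ends."""

    def __init__(self) -> None:
        self.log: list[NameEdit] = []
        self._subscribers: dict[str, Callable[[NameEdit], None]] = {}

    def subscribe(self, front_end: str, callback: Callable[[NameEdit], None]) -> None:
        self._subscribers[front_end] = callback

    def submit(self, edit: NameEdit) -> None:
        self.log.append(edit)
        for front_end, callback in self._subscribers.items():
            # The originating front end has already applied the edit locally.
            if front_end != edit.source:
                callback(edit)


class FrontEnd:
    """A portal such as ZooBank that appears to hold and edit names itself."""

    def __init__(self, label: str, layer: SharedNameLayer) -> None:
        self.label = label
        self.layer = layer
        self.names: dict[str, str] = {}  # name as entered -> current form
        layer.subscribe(label, self._apply)

    def _apply(self, edit: NameEdit) -> None:
        if edit.action == "add":
            self.names.setdefault(edit.name, edit.name)
        elif edit.action == "correct" and edit.new_value is not None:
            self.names[edit.name] = edit.new_value

    def edit(self, name: str, action: str, new_value: str | None = None) -> None:
        change = NameEdit(self.label, name, action, new_value)
        self._apply(change)        # the user sees the change immediately...
        self.layer.submit(change)  # ...but it is stored in, and flows from, the shared layer
```

Ownership is preserved in the sense that each edit carries its source, while every other front end receives the same correction without re-keying it.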
Dealing with Thomson content was never identified as a
priority for ZooBank, which was sold on the vision of
ZooBank 2. But the box has been opened, and we need to do
something about what has been released. Wolfgang's
suggestion of dealing separately with species, genera,
and families makes sense, as it will break up the challenge
of dealing with the inherited problems into reasonably
sized tasks. A web-based workbench will allow many tasks
to be addressed simultaneously: one group can be creating
a consensus list of carabid genera, while others correct
the plethora of errors among ciliate species names.
Shifting focus a little, some other services that meet the
needs of the Commission can be delivered very quickly.
uBio can deliver a list of all generic names known to us
(whether viral, plant, animal, prokaryotic, fungal or
protist) and can indicate who holds those names (so we
can try to trace errors to their source). A list of all
genera will help taxonomists deal with homonyms now. It
will also ease the informatics challenges of the future
if it encourages us to avoid, voluntarily, creating
homonyms of any generic name, whether of an animal or not.
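A list of all genera makes a cross-code homonym check straightforward. The sketch below is illustrative only: GenericName, the group labels and the holder identifiers are hypothetical, and matching is a simple trimmed, case-insensitive comparison of spellings. It flags generic names used in more than one group (Morus, for example, is used both for the mulberries and for the gannets) and reports which sources hold a proposed name.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericName:
    name: str    # generic name as supplied
    group: str   # e.g. "animal", "plant", "fungus" (hypothetical labels)
    holder: str  # source database that supplied the record


def build_index(records: list[GenericName]) -> dict[str, list[GenericName]]:
    """Group records by a normalized spelling (trimmed, case-insensitive)."""
    index: dict[str, list[GenericName]] = defaultdict(list)
    for rec in records:
        index[rec.name.strip().lower()].append(rec)
    return dict(index)  # plain dict, so later lookups never create empty entries


def cross_group_homonyms(index: dict[str, list[GenericName]]) -> dict[str, list[GenericName]]:
    """Spellings used by more than one group (animal, plant, fungus, ...)."""
    return {key: recs for key, recs in index.items() if len({r.group for r in recs}) > 1}


def existing_uses(index: dict[str, list[GenericName]], proposed: str) -> list[tuple[str, str]]:
    """Every existing use of a proposed generic name, with the source that holds it."""
    return [(r.group, r.holder) for r in index.get(proposed.strip().lower(), [])]
```

A name held by two sources within the same group is not a homonym, but listing both holders is what lets an error be traced back to the source that introduced it.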
None of this will be achieved without funding. The cost
of modifying the names-based infrastructure that we have
been developing at the MBL to create the expert interface
and the union editing environment is not great. I believe
that the system could be in place within 18 months,
assuming a single coder. In respect of funding, any
developments of ZooBank 1 and of ZooBank 2 will improve
the services (market position and profitability) of the
commercial agency (Thomson). I would expect them to
provide support to help bring the vision to fruition.
Perhaps GBIF may also be able to assist in making
progress.
Although funding is needed for tools, the critical
determinant of success with ZooBank 1 is how well we
engage the taxonomic community and the numerous other
initiatives. Our performance there may determine the
acceptance of the real ZooBank by the interested
community. Our workbench must impose a negligible extra
load on the shoulders of the experts, and should return to
them considerably more than they invest. This is where
the informatics prowess of uBio would serve the needs of
ZooBank very well. Contributors should have free access
to tools and services that accelerate and enhance their
own operations. A simple example of a benefit would be a
taxonomically informed alert system that delivers to users
a customised weekly email to let them know of new names
and combinations in their area of taxonomic interest,
recent publications or additions to web sites, additions
to the Biodiversity Heritage Library, GBIF or other
online data providers, changes and input to the
underlying nomenclatural systems, and so on.
We are no longer in the world of wishful thinking. All of
these things can happen with a relatively small
investment. The next step? A meeting with 10-20
interested players?
David Patterson
12th September 2006
My April letter to the commissioners follows:
Fellow commissioners
The 30 email from Andrew contained a paragraph about
ZooBank. You will probably know that I am a very strong
advocate of moving nomenclatural activities forward, and
believe that ZooBank is a great development. However, I am
increasingly concerned about the potential for problems
should ZooBank develop an exclusive relationship with a
single commercial agency (Thomson). The large scientific
publishers have, over the last decade, shown us that
their economic performance takes priority over the
provision of service, and some have become excessively
exploitative. Some taxonomists refer to publishers as a
new taxonomic impediment. There are now significant
political counter-moves to promote open access to publicly
funded and other scientific content.
When we submitted our thoughts to Nature, we conceived of
ZooBank as embedded within a community of complementary
initiatives (and we specified a number of these, such as
uBio, Species2000, ITIS and GBIF). By the time the
technical paper was written, this array had narrowed rather
than expanded. This reveals a shift in the wrong
direction.
Thomson and Zoological Record do add considerable value to
ZooBank, but they are not the only agencies capable of
enriching the initiative. uBio (the project that I am
associated with) offers a different but still
complementary array of valuable assets. It holds the
contents of Nomenclator Zoologicus. Having now passed
through the 7.5 million record mark (making Thomson's
claim that ION with 1.7 million names is the most
comprehensive organism names database rather inaccurate),
we can access most of the homonyms that were created under
other codes. Significantly, we embed the names of
organisms within a growing array of freely-available
taxonomically intelligent services that not only help
taxonomists do their work, but display the significance of
nomenclature to a wider public. GBIF too has many
valuable dimensions, not least of which is the growing
coverage of specimens, which form key connections between
names and concepts. ITIS and Species2000 offer
environments in which taxonomic value has been added
around names. New initiatives such as the
Biodiversity Heritage Library will release vast amounts of
taxonomically relevant literature and the names that are
contained within it. ZooBank should aim to be connected
to all of these.
Given that the objective of publishers is to make money,
we need to protect ZooBank from situations where the
publisher exercises control over our capacity to submit
names into ZooBank, or tries to use financial charges to
control our access to names, to the taxonomic
descriptions or to associated bibliographic information.
Publishers have already done this with the general
scientific literature. The financially weak position of
the Trust does not make us equal players. It behooves
ZooBank to protect itself against any complications that
might emerge from an exclusive relationship with a
commercial enterprise. Andrew can, of course, ask for
contracts that guarantee free and open access to names and
the key literature in which the names are embedded in
perpetuity (perhaps such contracts already exist; I am not
privy to the dialogue with Thomson). However, I do not
believe that such documents are sufficient. Rather, I
suggest we aim to mirror ZooBank operations at more than
one location to break any threat of a monopoly. Any
resistance to this or to the movement of content would
help us to identify those players who are in the game for
the right reason. I do not believe ZooBank would have any
difficulty in attracting additional partners, or in
getting GBIF to agree to occupy an overarching role in
which they establish a taxonomic engine coupled with
universal and unique persistent identifiers that will
unify all names-based initiatives. Each additional
partner can be selected on the basis of the value that
they add to ZooBank.