Competitive Evaluation of PURLs
Larry Stone, 22-Mar-00
This document is a statement of my evaluation of the PURL system
developed by OCLC, as a technology to support MIT's Persistent Naming
discovery project. Most of my information about PURLs comes from
close reading of material on the www.purl.org website itself and from
explications of PURLs in various published papers.
I have firsthand experience administering Handle System servers
and namespaces for the NCSTRL and Theses Online projects.
I will compare PURLs to "Handle System" Handles to demonstrate why
Handles are the better choice for truly archival naming and experimentation
with other uses.
On the surface, PURLs and Handles have a lot in common. Both purport to
implement the URN concept. At the design and implementation level,
however, I believe PURL design is more simplistic and tied to current
technology standards that limit its longevity. It presently has some
operational and convenience advantages but sacrifices the archival
quality of its names.
The following text assumes some familiarity with PURLs and Handles. Please
consult the relevant websites for background information if you are
not already familiar with them:
http://www.purl.org
http://www.handle.net
Similarities
- Both implement a Uniform Resource Name -- a string of characters
that identifies a network-accessible resource regardless of its physical and
logical location on the Internet.
- Both rely on indirection. Given a PURL or Handle, the user agent
must perform a lookup in the appropriate network service to get a URL --
the Uniform Resource Locator that it can use to retrieve the object itself.
- Both claim to be compatible with the IETF URN standard, such as it is.
They do, in fact, implement the sort of service it describes.
- Both are in current, production (and commercial) use at sites other
than the originating organization.
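The indirection that both systems share can be sketched in a few lines of
Python. This is only an illustration of the idea, not the implementation of
either system: the Resolver class, the name "example/doc1", and both URLs are
hypothetical. The point it shows is that when an object moves, only the
resolver's table changes; the name handed out to readers stays the same.

```python
# Hypothetical in-memory resolver illustrating indirection.
# Neither PURL nor the Handle System is implemented this way; this only
# shows the name -> URL lookup step that both rely on.
class Resolver:
    def __init__(self):
        self._table = {}  # persistent name -> current URL

    def register(self, name, url):
        """Create or update the mapping for a persistent name."""
        self._table[name] = url

    def resolve(self, name):
        """Return the URL currently bound to the name."""
        return self._table[name]


resolver = Resolver()
resolver.register("example/doc1", "http://old-host.example.org/doc1.html")
before_move = resolver.resolve("example/doc1")

# The object moves; only the mapping is updated, the name is unchanged.
resolver.register("example/doc1", "http://new-host.example.org/archive/doc1.html")
after_move = resolver.resolve("example/doc1")
```

A citation that records "example/doc1" keeps working after the move, whereas
a citation that recorded the old URL directly would break.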
Differences
- PURL is implemented as an HTTP redirection service. It derives its
"persistence" from the promise of the service provider that there will
always be an HTTP server at the given domain name capable of resolving
all of the PURLs for which it is responsible.
- PURLs thus depend on the survival of their host domain name. If that domain
were to change or vanish (e.g. if its ownership were transferred to a
different organization or if the organization owning it ceased to
exist), the PURLs hosted there would be in jeopardy. Handles, in contrast, can be homed
on any handle server and transferred at will -- so another organization
can take over handles for a merging or dying organization. Handle namespace names
are not related in any way to the hostname of their home server except
perhaps by coincidence.
- PURL relies on DNS and the Internet as it exists today; the Handle
System does not. The PURL system's persistence is thus limited by the
persistence of the domain name system itself.
Considering that the predecessor to DNS lasted less than 20 years, and
the Internet has undergone unprecedented explosive growth since the
deployment of DNS, it is not unreasonable to expect the nature of DNS, or
at least its already-controversial naming conventions, to change within
the next few decades, especially when we also consider the growing
commercial value of domain names.
- PURLs are resolved using the existing HTTP protocol so every HTTP client and Web
user agent is already capable of resolving PURLs. In fact, PURLs are
HTTP URLs. Handles, in contrast, are resolved through a different
protocol that is not so widespread. It requires special resolver software
in the client. Some Web browsers are capable of
resolving Handles directly (with a Handle-specific plugin).
For the rest, the Handle System HTTP proxy server turns Handles into HTTP URLs which
effectively gives them the same functionality as PURLs.
- A given PURL server can only resolve its "own" PURLs, since the
domain name is an integral part of the PURL. Any Handle System resolver
client (and any Handle HTTP proxy server) is capable of resolving
all Handles because the system has an internal hierarchy of
servers, like DNS.
- Redundancy is part of the design of the Handle System but it is a
difficult kludge in the PURL system.
PURL servers cannot be implicitly mirrored, except by using
"stupid DNS tricks" such as Akamai uses for load balancing. Handles
can be (and usually are) implicitly mirrored simply by deploying
multiple servers.
-
The source "URN" is inevitably lost when browsing PURLs, but not necessarily for Handles.
The HTTP "Redirect" method used to implement PURLs has the unfortunate
side-effect of hiding the PURL from the user; since a Web browser cannot
tell the difference between a PURL's redirect and a server using the
redirect for more prosaic reasons, it displays the destination URL as the
address of the web page instead of the PURL.
Handles viewed through the HTTP proxy have the same problem, although
browsers that grok Handle URIs directly will show the actual Handle.
-
PURLs allow only one resolution per name, while Handles allow several.
It is not clear that this is an advantage :-) but Handles
can list multiple resolutions of one name. They might be alternate locations
of the same object, or different forms (e.g. PDF and PostScript format).
Because of inherent limitations in the redirection protocol, a PURL can only map to one URL.
Browsers do not handle multiple destinations gracefully. The Handle
proxy server generally presents an HTML page showing all choices, or
picks the "first" URL.
-
The PURL implementation retains a transaction history of changes made
to its database of mappings. Handle servers do not currently keep a
transaction log. This is an advantage of the PURL implementation that
ought to be emulated by Handles.
-
The Handle System is inherently more scalable than PURLs. The service for
a Handle namespace can be distributed, and broken up by sub-namespace,
among multiple server machines. The "home" of a namespace is invisible
to the user of a Handle resolver. A PURL namespace is bound to one host:
Since all of the PURLs in a single namespace must be resolved by the same
server (or at least a server at the same domain name), there is no clean way
to share the load. It is always possible to kludge something with
DNS-server magic to share the load but since it operates at the DNS level
it does not segment the namespace load; the PURL servers must be mirrors.
The PURL system is not inherently scalable -- any
adaptations for scalability are done outside of its design.
-
Handles can do more than resolve to URLs. The Handle System design
allows for various other types of resolution objects, metadata, and
extensible additions to each Handle object record. PURLs are focussed on
resolving to URLs.
-
The PURL system has a more simplistic model of security than the
Handle System. Though neither supports authenticated and secure
access up to MIT standards,
Handles have more levels of authorization and more flexible
access control lists, set on each individual Handle. A PURL server grants "write"
access to each principal for an entire domain of PURLs at a time.
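Several of the differences above follow from the shape of the names
themselves, which a short Python sketch can make concrete. The PURL path
"/net/example" and the Handle "1234.5/thesis-42" are made up for
illustration, and the helper functions are hypothetical. The proxy address
hdl.handle.net is not named in the text above; it is used here as an
assumption about which Handle HTTP proxy a reader would use.

```python
from urllib.parse import urlsplit


def purl_resolver_host(purl):
    """A PURL is an HTTP URL, so its resolver's DNS name is part of the name."""
    return urlsplit(purl).netloc


def split_handle(handle):
    """Split a Handle into (namespace, local name); no hostname is involved."""
    prefix, _, suffix = handle.partition("/")
    return prefix, suffix


def handle_via_proxy(handle, proxy="http://hdl.handle.net/"):
    """Build an HTTP URL for a Handle through a proxy (assumed default)."""
    return proxy + handle


purl = "http://purl.org/net/example"   # hypothetical PURL
handle = "1234.5/thesis-42"            # hypothetical Handle

host = purl_resolver_host(purl)        # tied to one DNS name
namespace, local_name = split_handle(handle)
proxied = handle_via_proxy(handle)     # only the proxy URL depends on DNS
```

If the proxy's hostname ever changes, the Handle string is unaffected and a
different proxy argument yields a working URL again. A PURL has no such
separation, because the hostname is inside the name.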
Conclusions
Since the goal of this project is to attach truly archival, long-lived names
to network-accessible resources, I think PURLs should not be considered.
My primary objection is that PURLs rely on DNS for labeling namespaces, which
has at least two problems in the long run. First, DNS names are controlled by
outside agencies at many levels (i.e. not just by local administrators;
our ".EDU" parent domain is itself subject to the Internet governing bodies).
Second, I believe the entire DNS naming system will be revised within the
next 100 years, which is probably shorter than the time range MIT Archives
routinely anticipates.
Although the Handle System currently needs the crutch of HTTP proxies,
which have the same DNS naming problem, it is inherently free
of the domain name system and even the current Internet implementation.
The Handle namespace is not connected to any other protocol or standard,
because it was properly designed to persist as a meaningful, resolvable
naming system well into the foreseeable future.
The comparison of PURLs to Handles is a worthwhile exercise, since it
illuminates some weaknesses of the present unfortunate need to use HTTP
redirection to resolve any type of "URN". It also inspires
more rational thinking about what an absolutely ideal persistent naming
system might be like, outside of the influence of the Handle
System.
References
- The PURL website,
http://www.purl.org
- W3C and Digital Libraries, by James S. Miller, D-Lib Magazine, November 1996,
http://www.dlib.org/dlib/november96/11miller.html
- Best Practices for Digital Archiving: An Information Life Cycle Approach,
by Gail M. Hodge, D-Lib Magazine, January 2000,
http://www.dlib.org/dlib/january00/01hodge.html
- Naming and Addressing: URIs, URLs, ..., by the W3 Consortium
(also includes pointers to appropriate RFCs for URN, etc.),
http://www.w3.org/Addressing/
- The Myth of Names and Addresses, by Tim Berners-Lee, design note,
http://www.w3.org/DesignIssues/NameMyth.html