Handle Project: Competitive Evaluation of PURLs

Competitive Evaluation of PURLs

Larry Stone, 22-Mar-00

This document is a statement of my evaluation of the PURL system developed by the OCLC, as a technology to support MIT's Persistent Naming discovery project. Most of my information about PURLs comes from close reading of material from the www.purl.org website itself and explication of PURLs in various published papers. I have firsthand experience administering Handle System servers and namespaces for the NCSTRL and Theses Online projects. I will compare PURLs to "Handle System" Handles to demonstrate why Handles are the better choice for truly archival naming and experimentation with other uses.

On the surface, PURLs and Handles have a lot in common. Both purport to implement the URN concept. At the design and implementation level, however, I believe PURL design is more simplistic and tied to current technology standards that limit its longevity. It presently has some operational and convenience advantages but sacrifices the archival quality of its names.

The following text assumes some familiarity with PURLs and Handles. Please consult the relevant websites for background information if you are not already familiar with them:

http://www.purl.org
http://www.handle.net

Similarities

Both implement a Uniform Resource Name -- a string of characters that identifies a network-accessible resource regardless of its physical and logical location on the Internet.
Both rely on indirection. Given a PURL or Handle, the user agent must perform a lookup in the appropriate network service to get a URL -- the Uniform Resource Locator that it can use to retrieve the object itself.
Both claim to be compatible with the IETF URN standard, such as it is. They do, in fact, implement the sort of service it describes.
Both are in current, production (and commercial) use at sites other than the originating organization.
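
The indirection that both systems share can be illustrated with a minimal sketch. This is not the PURL or Handle software; the Resolver class, the name "example/report-1" and both host names are made up for illustration. The point is only that the name stays fixed while the URL behind it changes.

```python
# Hypothetical in-memory resolver illustrating indirection.
# It is NOT the API of either the PURL or the Handle System software.
class Resolver:
    def __init__(self):
        self._table = {}  # persistent name -> current URL

    def register(self, name, url):
        """Create or update the mapping for a name."""
        self._table[name] = url

    def resolve(self, name):
        """Return the URL currently registered for a name."""
        return self._table[name]


resolver = Resolver()
name = "example/report-1"  # made-up persistent name

resolver.register(name, "http://old-host.example.org/reports/1.html")
first = resolver.resolve(name)

# The object moves: only the mapping is updated, the name is unchanged.
resolver.register(name, "http://new-host.example.org/archive/report-1.html")
second = resolver.resolve(name)
```

Anyone who recorded the name before the move still reaches the object afterwards, which is the whole value of the lookup step.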

Differences

PURL is implemented as an HTTP redirection service. It derives its "persistence" from the promise of the service provider that there will always be an HTTP server at the given domain name capable of resolving all of the PURLs for which it is responsible.
PURLs thus depend on the survival of their host domain name. If that domain were to change or vanish (e.g. if its ownership transferred to a different organization or if the organization owning it ceased to exist), the PURLs hosted there would be in jeopardy. Handles, in contrast, can be homed on any handle server and transferred at will -- so another organization can take over handles for a merging or dying organization. Handle namespace names are not related in any way to the hostname of their home server except perhaps by coincidence.
PURL relies on DNS and the Internet as it exists today; the Handle System does not. The PURL system's persistence is thus limited by the persistence of the domain name system itself. Considering that the predecessor to DNS lasted less than 20 years, and that the Internet has undergone unprecedented explosive growth since the deployment of DNS, it is not unreasonable to expect the nature of DNS, or at least its already-controversial naming conventions, to change within the next few decades, especially when we also consider the growing commercial value of domain names.
PURLs are resolved using the existing HTTP protocol so every HTTP client and Web user agent is already capable of resolving PURLs. In fact, PURLs are HTTP URLs. Handles, in contrast, are resolved through a different protocol that is not so widespread. It requires special resolver software in the client. Some Web browsers are capable of resolving Handles directly (with a Handle-specific plugin). For the rest, the Handle System HTTP proxy server turns Handles into HTTP URLs which effectively gives them the same functionality as PURLs.
A given PURL server can only resolve its "own" PURLs, since the domain name is an integral part of the PURL. Any Handle System resolver client (and any Handle HTTP proxy server) is capable of resolving all Handles because the system has an internal hierarchy of servers, like DNS.
Redundancy is part of the design of the Handle System but it is a difficult kludge in the PURL system. PURL servers cannot be implicitly mirrored, except by using "stupid DNS tricks" such as Akamai uses for load balancing. Handles can be (and usually are) implicitly mirrored simply by deploying multiple servers.
The source "URN" is inevitably lost when browsing PURLs, but not necessarily for Handles. The HTTP redirect response used to implement PURLs has the unfortunate side-effect of hiding the PURL from the user: since a Web browser cannot tell the difference between a PURL's redirect and a server using a redirect for more prosaic reasons, it displays the destination URL as the address of the web page instead of the PURL. Handles viewed through the HTTP proxy have the same problem, although browsers that grok Handle URIs directly will show the actual Handle.
PURLs only allow one resolution while Handles allow multiples. It is not clear that this is an advantage :-) but Handles can list multiple resolutions of one name. They might be alternate locations of the same object, or different forms (e.g. PDF and PostScript format). Because of inherent limitations in the redirection protocol, a PURL can only map to one URL. Browsers do not handle multiple destinations gracefully. The Handle proxy server generally presents an HTML page showing all choices, or picks the "first" URL.
The PURL implementation retains a transaction history of changes made to its database of mappings. Handle servers do not currently keep a transaction log. This is an advantage of the PURL implementation that ought to be emulated by Handles.
The Handle System is inherently more scalable than PURLs. The service for a Handle namespace can be distributed, and broken up by sub-namespace, among multiple server machines. The "home" of a namespace is invisible to the user of a Handle resolver. A PURL namespace is bound to one host: since all of the PURLs in a single namespace must be resolved by the same server (or at least a server at the same domain name), there is no clean way to share the load. It is always possible to kludge something with DNS-server magic to share the load, but since that operates at the DNS level it does not segment the namespace load; the PURL servers must be mirrors. The PURL system is not inherently scalable -- any adaptations for scalability are done outside of its design.
Handles can do more than resolve to URLs. The Handle System design allows for various other types of resolution objects, metadata, and extensible additions to each Handle object record. PURLs are focussed on resolving to URLs.
The PURL system has a more simplistic model of security than the Handle System. Though neither supports authenticated and secure access up to MIT standards, Handles have more levels of authorization permissions and more flexible access control lists per-object handle. A PURL server grants "write" access to each principal for an entire domain of PURLs at a time.
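
Two of the structural differences above, the DNS host embedded in every PURL and the single target per PURL versus multiple values per Handle, can be sketched as follows. This is an illustration under stated assumptions, not either system's implementation: the host purl.example.org, the Handle "1234.5/report-1", the tables and the helper functions are all hypothetical. The only real library call is Python's standard urllib.parse.urlsplit.

```python
from urllib.parse import urlsplit


def purl_parts(purl):
    """Split a PURL into the DNS host that must resolve it and its local path."""
    parts = urlsplit(purl)
    return parts.netloc, parts.path


def handle_parts(handle):
    """Split a Handle into namespace (prefix) and local name at the first '/'."""
    prefix, _, suffix = handle.partition("/")
    return prefix, suffix


# Hypothetical PURL table for one host: each name maps to exactly one URL.
purl_table = {
    "/net/example/report-1": "http://host-a.example.org/report-1.pdf",
}

# Hypothetical Handle record: one name, several typed values.
handle_table = {
    "1234.5/report-1": [
        ("URL", "http://host-a.example.org/report-1.pdf"),
        ("URL", "http://host-b.example.org/report-1.ps"),
    ],
}


def proxy_pick_first(handle):
    """Mimic a proxy that redirects to the first URL value of a Handle."""
    urls = [value for kind, value in handle_table[handle] if kind == "URL"]
    return urls[0]


host, local = purl_parts("http://purl.example.org/net/example/report-1")
prefix, suffix = handle_parts("1234.5/report-1")
chosen = proxy_pick_first("1234.5/report-1")
```

In the sketch the PURL cannot be separated from purl.example.org, while nothing in the Handle says which server holds its record; that is the property the argument about DNS dependence rests on.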

Conclusions

Since the goal of this project is to attach truly archival, long-lived names to network-accessible resources, I think PURLs should not be considered. My primary objection is that PURLs rely on DNS for labeling namespaces, which has at least two problems in the long run. First, DNS names are controlled by outside agencies at many levels: not just by local administrators, since even our ".EDU" parent domain is subject to the Internet governing bodies. Second, I believe the entire DNS naming system will be revised within the next 100 years, which is probably shorter than the range MIT Archives routinely anticipates.

Although the Handle System currently needs the crutch of HTTP proxies, which have the same DNS naming problem, it is inherently free of the domain name system and even the current Internet implementation. The Handle namespace is not connected to any other protocol or standard, because it was properly designed to persist as a meaningful, resolvable naming system well into the foreseeable future.

The comparison of PURLs to Handles is a worthwhile exercise, since it illuminates some weaknesses of the present unfortunate need to use HTTP redirection to resolve any type of "URN". It also inspires more rational thinking of what an absolutely ideal persistent naming system might be like, outside of the influence of the Handle System.

References

The PURL website, http://www.purl.org
W3C and Digital Libraries, by James S. Miller, D-Lib Magazine, November 1996, www.dlib.org/dlib/november96/11miller.html
Best Practices for Digital Archiving: An Information Life Cycle Approach, by Gail M. Hodge, D-Lib Magazine, January 2000, http://www.dlib.org/dlib/january00/01hodge.html
Naming and Addressing: URIs, URLs, ... by the W3 Consortium (also includes pointers to appropriate RFCs for URN, etc.) http://www.w3.org/Addressing/
The Myth of Names and Addresses, by Tim Berners-Lee, design note, at http://www.w3.org/DesignIssues/NameMyth.html