CENDI PRINCIPALS AND ALTERNATES MEETING
National Library of Medicine
Bethesda, MD
July 16, 1996
"Beyond URL's: PURL/URI/URA Locator Initiatives"
Dr. Stuart Weibel, OCLC
Dr. Stuart Weibel, Senior Consulting Research Scientist at OCLC, compared using Uniform Resource Locators (URL's) as the sole system for locating information on the WWW to naming a book by its shelf location. A URL is carried in digest pages and on people's hotlists, and when the URL is changed or the site is eliminated, these references are "broken". OCLC's experience shows that about 50% of the URL's created are deleted or renamed within the first six weeks after creation.
The IETF (Internet Engineering Task Force) URN (Uniform Resource Name) Task Group is looking for an ISBN- or Social Security Number-type identifier for electronic resources so that a resource's location or name can be separated from its permanent identifier. The ideal URN is persistent, globally unique, protocol-independent, and location-independent.
The PURL (Persistent Uniform Resource Locator) is a persistent identifier based on a managed URL. An intermediate database performs a simple table lookup to convert the PURL to the resource's current URL, so when a URL changes only the table entry has to be updated. The PURL, not the URL, is used in links and hotlists. A PURL is persistent in that it doesn't change if the URL changes. An individual can abandon a PURL, but the PURL is never deleted from the database. Following up on broken links is a manual effort. In OCLC's InterCat project, a project for cataloging WWW sites, a message is sent to the library responsible for cataloging the WWW site when a link is broken. (OCLC provides utilities for identifying broken links.)
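The table lookup described above can be illustrated with a short Python sketch. This is not OCLC's software: the class name PurlTable, its method names, and the example paths and URLs are all hypothetical, chosen only to show that a PURL stays fixed while its target changes and that an abandoned PURL remains in the table.

```python
from typing import Dict, Optional


class PurlTable:
    """Hypothetical in-memory stand-in for a PURL resolver's lookup table."""

    def __init__(self) -> None:
        # Maps a PURL path to its current URL; None marks an abandoned PURL.
        self._targets: Dict[str, Optional[str]] = {}

    def register(self, purl_path: str, url: str) -> None:
        """Create a new PURL; an existing PURL is never silently reassigned."""
        if purl_path in self._targets:
            raise ValueError(f"PURL already exists: {purl_path}")
        self._targets[purl_path] = url

    def update(self, purl_path: str, new_url: str) -> None:
        """Point an existing PURL at the resource's new location."""
        if purl_path not in self._targets:
            raise KeyError(purl_path)
        self._targets[purl_path] = new_url

    def abandon(self, purl_path: str) -> None:
        """Clear the target but keep the entry, so the PURL is never deleted."""
        if purl_path not in self._targets:
            raise KeyError(purl_path)
        self._targets[purl_path] = None

    def is_known(self, purl_path: str) -> bool:
        """True if the PURL was ever registered, even if later abandoned."""
        return purl_path in self._targets

    def resolve(self, purl_path: str) -> Optional[str]:
        """Return the current URL, or None if the PURL has been abandoned."""
        return self._targets[purl_path]


table = PurlTable()
table.register("/net/report", "http://old.example.org/report.html")
table.update("/net/report", "http://new.example.org/docs/report.html")
current_url = table.resolve("/net/report")
```

Links and hotlists would carry only the path "/net/report"; moving the document means one call to update rather than repairing every reference.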
Unlike other proposed locator schemes, the PURL uses the basic URL structure, retains the 'http' and DNS protocols, and it works by using the standard 'http' redirect capabilities. The PURL is also based on the standard computer technique of "indirection"; i.e., giving something a symbolic name.
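As a rough illustration of resolution through a standard 'http' redirect, the following Python sketch uses the standard library's http.server module. It is not the NCSA-based PURL server: the handler name, the table contents, and the choice of status code 302 are assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical lookup table: PURL path -> current URL (illustrative values only).
PURL_TABLE = {
    "/net/example-doc": "http://www.example.org/docs/current-location.html",
}


class PurlRedirectHandler(BaseHTTPRequestHandler):
    """Answers a GET for a known PURL path with an HTTP redirect."""

    def do_GET(self):
        target = PURL_TABLE.get(self.path)
        if target is None:
            self.send_error(404, "No such PURL")
            return
        # Assumed status code: 302 tells the browser to fetch the URL
        # given in the Location header instead.
        self.send_response(302)
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet; a real server would log requests


def make_server(port=0):
    """Build a resolver bound to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), PurlRedirectHandler)
```

Calling make_server(8080).serve_forever() would run the sketch locally. Because the reply is an ordinary redirect, an unmodified browser follows it without knowing that a PURL was involved.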
PURL resolution results in twice the number of network transmissions, but not twice the traffic. After the PURL server replies with the resolved URL, the client makes a second network transmission to that address, but the content itself comes only from the final URL.
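The difference between transmissions and traffic can be made concrete with a small calculation. The byte counts below are invented for illustration (a 300-byte redirect reply and a 20,000-byte document), and request sizes are ignored; only the proportions matter.

```python
from typing import Tuple


def transfer_summary(document_bytes: int, redirect_bytes: int, via_purl: bool) -> Tuple[int, int]:
    """Return (number of responses received, total response bytes).

    Assumes the redirect reply carries no document content, so the
    document itself is transferred exactly once either way.
    """
    if via_purl:
        return 2, redirect_bytes + document_bytes
    return 1, document_bytes


# Hypothetical sizes, chosen only to show the proportions.
direct = transfer_summary(20_000, 300, via_purl=False)
purl = transfer_summary(20_000, 300, via_purl=True)
overhead = purl[1] / direct[1]
```

Under these assumed sizes the PURL route receives two responses instead of one but moves only about 1.5% more bytes.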
The name portion of the PURL is not globally unique. It can become unique if organizations agree to federate their servers and use Handle server generation of unique names. OCLC has an agreement with the Corporation for National Research Initiatives (CNRI), the developers of the Handle system, to develop a PURL-Handle Resolver system.
The PURL requires the installation of a PURL server. This server is based on the standard National Center for Supercomputing Applications (NCSA) 'http' server software so it can be distributed without licensing problems. OCLC is distributing the server software and utilities free of charge to organizations and individuals who want to put up their own PURL servers. A PURL server requires running an 'http' server and then adding the PURL layer. Currently, there are three PURL servers, at Stanford University, OCLC, and CNRI.
There are several alternatives to the PURL scheme. The URN (Uniform Resource Name) is being worked on by a committee of the IETF. Dr. Weibel believes that consensus will be achieved and the URN will be specified within the next three to four years. Another alternative scheme is the URC (Uniform Resource Characteristic), which provides "metadata" about the site. However, these schemes require new standards consensus and changes to current browsers, and they cannot be seamlessly implemented in the current Internet (URL) environment.
The PURL project is still under the auspices of the OCLC Office of Research. From January to June, 1996 there were 6,500 PURL's assigned by the OCLC server. Approximately 435,000 resolutions were performed by 10,600 users.
Dr. Weibel emphasized that running a PURL server requires an obligation to maintain the server. The successful use of PURL is not a technology issue (the tools for monitoring and maintaining are available), but an organizational and information management issue. Dr. Weibel believes libraries and government information organizations that produce and provide access to archival information should be committed to persistent naming of WWW sites.
Additional information is available from http://purl.oclc.org.
NLM Developments - Management Overview: Reinvention Laboratory Update
Kent Smith
Kent Smith, Deputy Director of the National Library of Medicine, provided the management perspective on the NLM Reinvention Laboratory projects. The NPR (National Performance Review) reinvention laboratory opportunity came at the same time as the need to transition from NLM's legacy systems to new systems.
NLM is midway through the project and on schedule for the August 1998 deadline. There are concurrent and overlapping efforts. The NLM mandate will be continued and enhanced to include expansion into electronic formats. A single logical database will be provided with the ability to search multiple physical databases. The Medical Subject Headings (MeSH) thesaurus will continue to be used with the addition of the Unified Medical Language System (UMLS) (http://www.nlm.nih.gov/pubs/factsheets/umls.html). Disparate data will be linked, which requires more links, relationships, and standardization. Automated and semi-automated indexing is being investigated.
PUBMED is another project that was undertaken as part of the Reinvention Laboratory. PUBMED makes a link between MEDLINE citations and the homepages of the journal publishers. The connection may be at the level of the publisher's homepage or the actual article in an electronic journal. There will also be links back from the article and its bibliography to MEDLINE citations. PUBMED is currently being piloted by the National Center for Biotechnology Information (NCBI) located at NLM's Lister Hill Center. See PubMed.
NLM Developments - Internet Grateful Med
Dr. Lawrence Kingsland
Internet Grateful Med (http://igm.nlm.nih.gov/) was announced in mid-February, 1996 as the first major development in NLM's system reconnection project. By mid-July, 1996 there had been 16,000-18,000 searches by 2,600-2,700 users.
The major features of Internet Grateful Med include:
The developers were concerned about the slow response for users with PPP or low-speed modem connections. Therefore, the number and size of graphics and icons were kept to a minimum.
The types of limitations that could be added to qualify a search (year, language, etc.) were based on previous statistics on what searchers use.
Key to the Internet Grateful Med is the application of the UMLS metathesaurus. Searching can be based on co-occurrence of terms; qualifiers can be applied based on the UMLS definitions; and UMLS definitions can be presented. Aid is provided when a search results in zero hits.
Users must register for Internet Grateful Med and can do so online. Dr. Kingsland demonstrated the Internet Grateful Med software and presented several types of aids available to the users. Enhancements to Internet Grateful Med based on user comments are already being planned.
NLM Developments - Visible Human Project
Dr. Michael Ackerman
The work on the Visible Human Project began in 1985-1986 with a long-range plan. The plan called for three types of coincidental images (CAT scan, MRI, and a cryosection) of a normal male and a normal female. A committee of experts reviewed the images to find the normal body.
Visual images of the project were presented. They showed how the data were developed and the kinds of applications that will be made of the data. Projected uses for the image dataset included virtual reality of surgical practice for both students and patients and forensics training. The information is being made available to private "publishers". There are currently over 500 licensees in 26 countries. See www.nlm.nih.gov/research/visible/visible_human.html.
G-7 Initiatives in the Health Arena
Dr. Donald Lindberg
Dr. Donald Lindberg, Director of the National Library of Medicine, gave an update on G-7 initiatives in the health arena. The G-7 is a group of seven major industrialized countries. Some of the projects under the health care area include a Global Public Health Network; a 24-hour, worldwide telemedicine system; development of SmartCards for health benefits; and a Universal Disease Reporting System.
A U.S. G-7 homepage is available at http://nii.nist.gov/g7/g7-G1P.html.
Optical Scanning Technologies and Projects
Gary Craig
Gary Craig, Director of the Federal Intelligent Document Understanding Laboratory (FIDUL), described the origins of the organization and the research it conducts. FIDUL grew out of the Office of R&D at the CIA. ORD had the same problems as other R&D units -- problems transferring the results of R&D into tangible benefits. The emphasis at FIDUL is on technical assessment programs, including usability testing, forming partnerships, and end user involvement.
FIDUL has done research on the OCR process, including end-to-end systems. Foreign language capabilities are emphasized because of their importance to the intelligence community.
Current projects include "Cuneiform," a commercial Cyrillic OCR engine; a Chinese OCR engine for the National Air Intelligence Center (NAIC); and a portable document understanding system.