CENDI PRINCIPALS AND ALTERNATES MEETING
National Library of Medicine
Bethesda, MD
April 21, 2000
WELCOME
Kurt Molholm, CENDI Chair, opened the meeting at 9:00 am. He thanked NLM for hosting the meeting.
ENVISIONING THE FUTURE: PART 2, NETWORKING
As part of the CENDI Strategic Planning initiative, an emphasis has been placed on future information architectures. This began at the March CENDI meeting with an emphasis on infrastructure and architectures. The session continues with an emphasis on how the academic environment is addressing the future of scientific and technical research and communications.
"Internet-2: Changing the Way Science is Done"
Heather Boyles, Director, Government and International
Relations, Internet-2
Internet-2 is a consortium of universities dedicated to the advancement of telecommunications technologies in support of academic research. Its mission is to accelerate the availability of Internet technologies. The University Corporation for Advanced Internet Development was initiated in 1996 with about 36 organizations. There are currently over 170 universities, 64 corporate members, and some 37 other research organizations, including government and national laboratories. HPCC members are also members of Internet-2. Many of the 25 staff are on loan from the member universities. Ms. Boyles’ particular area of responsibility is government and international relationships. Affiliations have been developed with organizations in other countries that have similar missions, since the development and deployment of network applications and technologies that accelerate the creation of tomorrow’s Internet should be useful to the global Internet as well.
Affiliate membership costs approximately $25,000 per year. Full university membership costs approximately $500,000. The degree to which universities are willing to give their own money toward this venture shows the importance of Internet technologies to the future learning and research environments.
There are several major efforts including creation of leading edge network capabilities; technology transfer; the development of new classes of applications, particularly middleware; and issues of quality of service. The consortium believes that applications should drive the development of the network and its services.
Often, bandwidth is considered the most difficult obstacle for academic institutions and the issue of primary importance to Internet-2 research. However, Ms. Boyles noted that many applications require low delays between send and response. This causes difficulty in resource allocation, and high delays can result in "time outs".
There are over 2000 universities and colleges in the U.S., and technology transfer is extremely important to ensure the broadest base of network service possible across the academic community. In response to a question about the "digital divide", Ms. Boyles indicated that technology transfer is supported by the corporate partners. To some extent, it is not Internet-2’s responsibility to bring along those who are behind in technology, but a number of these institutions are able to take advantage of Internet-2’s innovative programs. For example, Internet-2 members have put in place a nationwide backbone network, which can be accessed by any academic institution regardless of membership, without having to pay the non-member fee. However, much of the work in developing Internet-2 is done on the local campuses. So, there are still issues if the campus-wide network isn’t able to handle the higher speed and if the equipment cannot implement the applications or the middleware.
Ms. Boyles believes that there have been quantitative and qualitative jumps in how we conduct research and how we engage in training and teaching. She described applications from both the hard sciences and humanities.
NASA’s Distributed Active Archive Centers and several NASA centers are using the new network capabilities. NEXRAD radar sites collect data, which is streamed directly to a simulation and made available directly on the network. In the life sciences area, 3-D Brain Mapping allows for mapping of brain scans to be done in real time and distributed globally for telemedicine.
While the hard sciences have driven many applications, some of the most taxing applications are from the humanities. Oklahoma University is doing a violin master class over video and audio. Steven Spielberg’s SHOAH Holocaust Library has provided a variety of multimedia challenges. They have terabytes of data in storage and they’d like to make it widely available.
Other applications allow for the sharing of resources, such as supercomputers from distributed centers, as if they were in parallel. The Distributed Nanomanipulator allows for remote manipulation and telepresence. This is a way for organizations, particularly laboratories and medical facilities, to share scarce and expensive equipment. Eventually, collaboratories can be developed so that there is not just one researcher and one instrument, but many researchers using several instruments in remote locations in real time. With record and playback and chat sessions, the professors are able to involve graduate students in the actual collection of data and discussion about the meaning of findings.
Middleware applications are often used between the client and the server, but are still remote from the user. They have broad applicability across the network, addressing functions such as security and authentication. Discipline specific middleware is also being addressed. Medical middleware is very common and they are beginning to incorporate other discipline areas as well. Unfortunately, the commercial software industry is building pieces, but is not focused on interoperability. Interoperability is a key to development, even in middleware. Internet-2 hopes to provide interoperability guidelines and standards that industry can use.
In the area of quality of service and metrics, Q-Bone is the inter-network project for testing quality of service. It is based not on more bandwidth, but on more efficient use of the existing bandwidth. In addition to this specific project, one of the benefits for corporate members is that the Internet-2 consortium can provide a large-scale testbed for their ideas and products.
Discussion
Ms. Carroll asked what impact this type of research and the applications described might have on the traditional scientific publishing and communications processes. Ms. Boyles said that this isn’t clear yet, but certainly the results of work are more obvious if the tools are on the network. However, there is still a fundamental question as all of this moves to the network: how will you know what exists? Archiving of digital information is another critical issue. Announcing the existence of this work and its importance, and archiving this information, are still critical functions for science, whether they are done through traditional or non-traditional communications processes. Ms. Boyles indicated these were important issues but no focused discussion is taking place within the community on them. However, SURA (Southeastern Universities Research Association) is looking at archival objects. Cliff Lynch is on their applications strategy panel and a discussion might be held with him.
The question was raised as to whether any middleware tools are being developed to better support the multi-lingual environment. Ms. Boyles noted that approximately 40 percent of the European Union’s (EU) communications research is in linguistics, but there is not much middleware development included in this research. The University of California at San Diego is working on a multilingual gateway.
Dr. Wood asked where research into end-to-end performance fits into the Internet-2 agenda. Ms. Boyles stated that there are a number of measurement and metrics activities underway, including those that are directly connected to each project. NSF has funded the NLANR project related to performance and middleware.
"Consumer Health Web Site: NLM’s MEDLINEplus"
Eve Marie Lacroix, Chief, Public Services Division, National
Library of Medicine
Once MEDLINE was made available to the public, NLM began to realize that 30 percent of the users of the system were from the general public. They studied the searches and found that 90 percent of the time, the users were searching for simple medical terms, primarily diseases, syndromes or surgical techniques. They also found a high incidence of the same type of information need expressed in questions received by the customer service staff. They also noticed that consumers use different language than medical professionals. These variations have been taken into account so that jargon is avoided on the search screens as much as possible. However, NLM’s strengths in selecting and organizing the material are built in.
In October 1998, NLM began the site by developing the first 22 consumer health topics. The number has since been increased to 365. The MEDLINEplus service has been integrated with other products like the Clinical Trials database, which provides descriptions and contact information for trials that are recruiting participants. The service has been very successful with over 1M pages requested per month. Today, they are getting a lot of publicity. They recently got a call from NBC to see how they could connect.
MEDLINEplus provides links to selected resources under each topic. Selection guidelines were written and the product is expected to be selective, not comprehensive. NIH is filling in information where gaps exist. The selection of linked sites is based on the degree to which the site covers technical issues, the prestige of the directors of the organization (i.e., do they have a review board of technical peers), and how up-to-date the site is in relation to the topic(s) covered.
Ms. Lacroix noted that MEDLINEplus does not advertise products, nor is it sponsored by any commercial organization. It has often been complimented by users for this approach.
The basis for the system is a database of all the sites reviewed. Over 11,000 web pages have been reviewed, and the status of the review and the final decision about the web page are noted in the database. This provides tracking and also allows the staff to return to web pages that have already had some analysis, should it be necessary to add or replace sites. The combination of ColdFusion and the Oracle Database allows for acquisitions and cataloging to be database-driven and performed remotely. Contractors perform the initial selection and cataloging at medical libraries and universities, but a second-level review is always conducted by NLM staff. New items are made available by creating HTML from the database records nightly. Having a database is the best alternative for manipulation of the data; however, it is not as flexible as HTML files, since the database approach requires programming rather than just "tweaking" the HTML mark-up for a particular record that is displayed.
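The nightly step of creating HTML from database records can be illustrated with a minimal sketch. It is not NLM's implementation: the minutes describe ColdFusion and Oracle, while this example uses Python with an in-memory SQLite database as a stand-in. The table name `reviewed_pages`, its columns, the `accepted` decision value, and the sample URLs are all hypothetical.

```python
import html
import sqlite3


def build_demo_db():
    """Create a small in-memory review database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviewed_pages (topic TEXT, title TEXT, url TEXT, decision TEXT)"
    )
    conn.executemany(
        "INSERT INTO reviewed_pages VALUES (?, ?, ?, ?)",
        [
            ("Asthma", "Asthma Overview", "http://example.org/asthma", "accepted"),
            ("Asthma", "Miracle Cure", "http://example.com/cure", "rejected"),
            ("Heart Diseases", "Heart Basics", "http://example.org/heart", "accepted"),
        ],
    )
    return conn


def render_topic_pages(conn):
    """Return {filename: html_text}, one page per topic, listing accepted links only."""
    rows = conn.execute(
        "SELECT topic, title, url FROM reviewed_pages "
        "WHERE decision = 'accepted' ORDER BY topic, title"
    ).fetchall()

    # Group the accepted links under their topic.
    by_topic = {}
    for topic, title, url in rows:
        by_topic.setdefault(topic, []).append((title, url))

    pages = {}
    for topic, links in by_topic.items():
        items = "\n".join(
            f'<li><a href="{html.escape(url, quote=True)}">{html.escape(title)}</a></li>'
            for title, url in links
        )
        filename = topic.lower().replace(" ", "_") + ".html"
        pages[filename] = (
            f"<html><head><title>{html.escape(topic)}</title></head>\n"
            f"<body><h1>{html.escape(topic)}</h1>\n<ul>\n{items}\n</ul></body></html>"
        )
    return pages
```

A scheduled job could call `render_topic_pages` once per night and write each entry to disk. The sketch also shows the trade-off noted above: changing how a single record is displayed means changing the rendering code, not editing one HTML file.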
Key to the development and success of the service is the degree of attention to user requirements. Ms. Lacroix described the various ways in which the user needs were identified. A pilot project with 200 public libraries in five states was conducted. This helped to identify what public library users would want. Usability tests were also conducted. NLM staff analyzed the e-mail and phone logs from the customer support functions in order to identify frequently asked questions and searching problems. The results from a feedback loop were also evaluated, along with input from the institutes and staff. Private organizations and universities were asked to review the interface design and usability.
Several key problems were identified. First, users often use the syntax of other search engines with which they are familiar. NLM has attempted to accommodate the most common and to make the syntax work as the users expect without giving error messages. Misspellings are another problem. They use pick lists wherever possible to avoid mistakes based on direct keyboard entry.
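A rough sketch of these two accommodations follows: accepting operators borrowed from other search engines without raising errors, and catching misspellings against a known vocabulary. The operator set (`+`, `-`, quoted phrases, AND/OR/NOT), the `KNOWN_TERMS` list, and the similarity cutoff are assumptions chosen for illustration, not a description of the MEDLINEplus search code.

```python
import difflib
import re

# Hypothetical controlled vocabulary; a real system would draw on a much larger list.
KNOWN_TERMS = ["asthma", "diabetes", "arthritis", "pneumonia"]


def parse_query(raw):
    """Split a free-form query into (include, exclude) terms without raising errors."""
    include, exclude = [], []
    negate = False
    # Quoted phrases stay together; everything else splits on whitespace.
    for token in re.findall(r'"[^"]+"|\S+', raw):
        token = token.strip('"')
        upper = token.upper()
        if upper in ("AND", "OR"):
            continue  # treated as plain separators in this sketch
        if upper == "NOT":
            negate = True
            continue
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        elif negate:
            exclude.append(token.lower())
        else:
            term = token.lstrip("+").lower()
            if term:
                include.append(term)
        negate = False
    return include, exclude


def suggest(term, vocabulary=KNOWN_TERMS):
    """Return the term itself if known, else the closest vocabulary entry, else None."""
    if term in vocabulary:
        return term
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None
```

For example, `parse_query('+asthma -smoking "heart attack" AND diet NOT surgery')` separates the wanted terms from the excluded ones, and `suggest("diabetis")` proposes `diabetes`. Pick lists avoid the second problem entirely by removing free keyboard entry.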
The site is on a 6-month review schedule. NLM is currently reviewing the site for other possible resources of interest to their audiences. The US Pharmacopoeia is being licensed and will be made available in the next few weeks. They are also looking at newsfeeds and a medical dictionary. The library section will begin to gather links to other libraries that have consumer health information, beginning with the current Doc-User database of libraries in the national medical library network. NLM is also looking at terminology and metadata that will help a user select the organization and then select from among its publications or products. Different presentations are being discussed, including alphabetic and by category.
Another service may involve pre-formulated searches of MEDLINE. A terminology server will be developed at the Lister Hill Center to support the selection of proper terms. Also, users want to see the site in both Spanish and English.
As NLM has branched out to include other resources, problems have been encountered involving the licensing of these products. Many providers do not want NLM to frame their web pages, but NLM believes that this is important so that users do not lose the sense of where they are. NLM also acknowledges that it is hard to get precision with web search engines such as HTDIG. PubMed custom engines are much more precise. Also, because the databases behind these products run on Oracle, it is necessary to license Oracle for remote use. Oracle is just beginning to understand how to price for open Web access.
Discussion
Ms. Lacroix was asked how many people support the system. There are approximately 15 FTEs involved, plus system support.
"From Here to There [And Back Again!]: Linking at NLM"
Kent Smith, Deputy Director, National Library of Medicine
Mr. Smith gave a variation of a presentation that he gave recently in Europe to the Association of Learned and Professional Society Publishers regarding reference linking. The main point that he made to the publishers was that the recent products from NLM, including PubMedCentral, are outgrowths of what the library has been doing for many years.
He described the history of MEDLINE. It began as MEDLARS in 1964 with batch searching. In 1971, the first online service was developed. There were about 250 journals indexed at that time and the maximum number of users was restricted to 25 simultaneous users. In the early 1990s MEDLINE was made available on the Web. Use of the system has increased 35-fold over the last three years with greater than 250M hits per year. PubRef is a citation matching system that was based on work done at the National Center for Biotechnology Information (NCBI) called PubMed. The URL or DOI for the full text is included in the record. PubRef has now been supplanted by the CrossRef system developed by a coalition of scholarly publishers.
PubMedCentral was set up to be a barrier-free, biomedical repository that accepts full text and supporting data. NLM provides the infrastructure for publishers’ contributions. The PubMed ID guarantees a unique and persistent identifier for each item. As appropriate, links are provided to sequence information (genomes, sequences, and structures), in addition to full text documents. Entrez is their software that integrates across their various systems.
Third-party providers and publishers now provide about 700 journal titles. It is possible to go from the publisher reference to MEDLINE or vice versa. A related books section has also been developed. A library can also define its holdings in PubMedCentral so that the user can obtain the journal locally, without going to the publisher site or using document ordering. Links will also be made to MEDLINEplus, the consumer health site.
PubMedCentral allows authors and publishers to submit their material using the NLM infrastructure. Two journals are currently available. About 10 journals will be added soon. Copyright is retained by the original copyright holder. Submission can be made at any time during the publication process. An Advisory Committee reviews groups who want to submit material. Criteria for journal participation include: 1) at least 3 members of the editorial board for a journal must have been principal investigators on research projects in the field; and 2) the journal must be covered in one of several major abstracting and indexing services that select for coverage, including BIOSIS, Chemical Abstracts, and MEDLINE. The material does not have to be formally peer-reviewed upon receipt. A distinction is made between items that have been traditionally peer-reviewed and those that have not. Archiving and coordination with international repositories will be handled by NLM.
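The two participation criteria can be written as a simple check. This is only an illustrative reading of the minutes: it assumes both criteria must hold, and it treats the three named services as the whole list, although the minutes say the list is longer. The `Journal` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field

# Only the services named in the minutes; the actual list is stated to be longer.
SELECTIVE_SERVICES = {"BIOSIS", "Chemical Abstracts", "MEDLINE"}


@dataclass
class Journal:
    title: str
    board_members_who_were_pis: int  # editorial board members who have been PIs
    indexed_in: set = field(default_factory=set)


def meets_criteria(journal):
    """Apply both criteria as summarized above (assumed to be jointly required)."""
    has_pi_board = journal.board_members_who_were_pis >= 3
    is_indexed = bool(journal.indexed_in & SELECTIVE_SERVICES)
    return has_pi_board and is_indexed
```

Under these assumptions, a journal with four qualifying board members that is indexed in MEDLINE would pass, while one indexed nowhere on the list would not, however strong its board.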
While PubMedCentral has gained much publicity, there are other organizations providing similar services to publishers and authors. BioMedCentral is a product from the Current Science Group that is due to be released in May. E-BioSci from the European Molecular Biology Organization is similar. It is just beginning; it may have a more distributed system, and some aspects of the service may cost money. Truly free access to bioscience information is becoming a reality.