CENDI PRINCIPALS AND ALTERNATES MEETING
National Library of Education
Washington, D.C.
March 14, 2000
WELCOME
Kurt Molholm, CENDI Chair, opened the meeting at 9:00 am. He thanked NLE for hosting the meeting. Ms. Carroll reported that Christine Dolan was unable to attend but that she hopes to join CENDI at a future meeting.
ENVISIONING THE FUTURE: EVOLUTION OR REVOLUTION?
As part of the CENDI Strategic Planning initiative, an emphasis has been placed on future information architectures. In order to better understand the environment in which the agencies and their customers will be functioning in the future, an expert panel has been asked to speak about their visions of the future, particularly as it involves the management of information for scientific and technical research and communications.
The Role of Architecture in Information Management: A Framework for Managing Access to Digital Information
Dr. Robert Kahn, President, Corporation for National Research Initiatives (CNRI)
The information architecture of the future will be marked by a confluence of technologies, including a framework for digital objects, key aspects of distributed database access, semantics and knowledge-based systems, rights management, and multiple user interfaces. These technologies will operate with common repository designs, federated testbeds, and metadata standards.
Given these components, how will they work together to provide a seamless, heterogeneous environment? There is minimal information needed for a Digital Object to be managed and integrated with other information technologies and with legal and rights management systems. Agreeing on the minimal information is the key to bridging these various technologies. The user interface will be increasingly important. It may include knowledge-based systems and agents that can support resource discovery, as well as natural language processing and semantics. In terms of policies, privacy will become a pervasive issue across all the components.
If one looks at the history of the development of standards for the technical aspects of the Internet, there are similarities that can inform the architecture needed for information management. The objective for the network framework was "best effort packet delivery" in a federated environment. The objective of information systems is seamless interoperability, which is not as easily addressed as packet delivery. Instead, there are multiple questions and answers; the objective is really unbounded.
Ten years ago, CNRI received a grant from DARPA to create a means of federating digital libraries of documents. Even though a document is only one type of object in the broader class of digital objects, the federating of document repositories provides some examples that have broader applicability. In the digital library example, the first objective was to allow any material stored anywhere to be accessible in the same way as local material. Manifesting the material should be done without any fanfare. If the material and the supporting system are managed, there should be no limitations on the time frame in which the material can be accessed. The framework addressed searching for documents and creating them, but the exact mechanism was not defined. The final objective of the framework was to encourage third-party, value-added services.
Within this framework, CNRI worked with DARPA and the Library of Congress. The focus was on the Copyright Office itself. CNRI built a system for registering for copyright. This has developed into a non-exclusive agreement with UMI for the submission of Ph.D. theses to the registration process. CNRI is also involved with development of a system for music publishers that will be made available soon.
Parallels can be drawn between the components of the Internet infrastructure model and the management of information objects. The packet is equivalent to the digital object. In the broadest sense, a digital object is anything on which you want to transact business. The IP address serves the same unique identification function as the Handle. The gateway router is an interface; the equivalent in the management of information is the repository as an interface. The end-to-end protocol is matched by the structured data and authentication. In the Internet communications architecture, there are senders and receivers. In information management, there are stated operations. In addition, both the network and content management architectures are intended to be independent of specific applications or hardware/software platforms. Neither defines any specific network or information system. Specifications are open with defined interfaces.
The Handle and its Digital Object Identifier (DOI) implementation are like the DNS system in that they function as resolver registries. However, Dr. Kahn stated that if he were to design the DNS system now, he would not build it as a hierarchical system. While many implementations of the Handle technology have created Handle strings that have meaning, the technology does not require it and is based on meaningless strings without hierarchy.
DOIs are also inherently more complex than the DNS technology, because digital objects need more information connected to them to make them meaningful. Stated operations must go along with each object. There is a need for authority systems, indirection (the use of an intermediate level digital object to provide indirect referencing as in a catalog), and other means of rapid lookup. Therefore, metadata is needed in DOI systems. CNRI is currently building the metadata system for CrossRef.
There is ambiguity about Internet information management. Some people believe that information should be free and that no standards for organization should be imposed because they are futile to implement. Others believe that rights management is necessary and that all the energy being expended on the Internet can only be leveraged through structure and organization. Dr. Kahn, who has been involved with Internet standards development since the beginning, pointed out that underneath the Internet there are a small number of standards that provide the cohesion that makes these networks work together. While some interesting social mechanisms have gotten us where we are today, there are also a few linchpin standards (IP addresses, TCP/IP protocols, etc.) that are so fundamental that we don't even think about them anymore. We need something like this on the information management side.
We can expect that the environment of the future will be built and usable to some extent without extensive information management. The infrastructure must be flexible enough to accommodate changing forms of creativity. The architecture must be open enough to allow for third-party, value-added services to be encouraged. The initial queries will not be complex, because the time to resolve queries increases as the complexity of the query increases.
The new architecture doesn't really require standards bodies in the sense that they exist today through international organizations. It is possible in this infrastructure to register a structure, describe it, and program the system to manifest the object correctly. "Extensible data types" will allow organizations and subgroups to add to standardized structures. However, it is likely that some structures will become more prevalent than others. The type.value structure is logically a typed pair. This serves as the common currency across different repositories.
The access to a repository is equivalent to giving the kitchen an order to prepare something for you. The service request may be standard or customized. The copy and distribution of physical objects requires interactions with the contract system. In the digital environment, physical objects will be manifestations of the individual digital component parts. This concept has an impact on archiving and librarianship. For example, CNRI has argued with the Library of Congress that it does not need to keep the physical object any longer, as long as it is available in electronic form from a single party.
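The typed-pair idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the handle string, and the type names "URL" and "EMAIL" are hypothetical and are not taken from the presentation or from any CNRI specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypedValue:
    """One (type, value) pair; the type tells a client how to read the value."""
    type: str
    value: str


@dataclass
class DigitalObjectRecord:
    """Hypothetical record: an identifier plus an open-ended list of typed pairs."""
    handle: str
    values: list = field(default_factory=list)

    def add(self, type_: str, value: str) -> None:
        # Any group can introduce a new type name without changing the record format.
        self.values.append(TypedValue(type_, value))

    def get(self, type_: str) -> list:
        # Return every value registered under the requested type.
        return [tv.value for tv in self.values if tv.type == type_]


# Illustrative data only; the identifier and addresses are made up.
record = DigitalObjectRecord("example.prefix/object-1")
record.add("URL", "http://repository.example.org/object-1")
record.add("EMAIL", "curator@example.org")
record.add("X-LOCAL-SHELF", "stack 4")  # a subgroup's own extension type
```

Because a client only has to understand the pair format, a repository that does not recognize "X-LOCAL-SHELF" can simply ignore it and still use the types it does know.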
The architecture of the future will continue to be a distributed one. Even the Handle system does not have to reside in a central system. The resolution system is distributed and redundancy is built in. Local servers can be bootstrapped together, and both local and global systems are possible. Caches can be maintained locally, so that there is no need to access a remote resolver every time. Local handle servers can intercept and filter those items for which the users have alternate access. For example, if a library has a special licensing agreement, it can intercept and route the request to the local holdings rather than to a pay-per-view version from the publisher's site. The use of agents will support intellectual property. As in the traffic system model, signs can nominally be advisory in nature. Fair use can be handled in this way.
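The local caching and interception behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the Handle system's actual protocol or API: the class LocalResolver, the global_lookup callable, and all identifiers and URLs are hypothetical.

```python
class LocalResolver:
    """Hypothetical local resolver: local holdings first, then cache, then remote."""

    def __init__(self, global_lookup, local_holdings=None):
        self.global_lookup = global_lookup          # stand-in for a remote resolver
        self.local_holdings = dict(local_holdings or {})
        self.cache = {}
        self.remote_calls = 0                       # counts trips to the remote resolver

    def resolve(self, handle):
        # 1. Intercept: a locally licensed copy takes priority over the publisher's.
        if handle in self.local_holdings:
            return self.local_holdings[handle]
        # 2. A cached answer avoids contacting the remote resolver again.
        if handle in self.cache:
            return self.cache[handle]
        # 3. Otherwise ask the remote resolver and remember the answer.
        self.remote_calls += 1
        location = self.global_lookup(handle)
        self.cache[handle] = location
        return location


# Made-up registry and holdings for demonstration.
global_registry = {
    "example/article-1": "http://publisher.example.com/pay-per-view/article-1",
    "example/article-2": "http://publisher.example.com/pay-per-view/article-2",
}
library = LocalResolver(
    global_registry.__getitem__,
    local_holdings={"example/article-1": "http://library.example.edu/licensed/article-1"},
)

licensed = library.resolve("example/article-1")   # served from local holdings
first = library.resolve("example/article-2")      # remote lookup, then cached
second = library.resolve("example/article-2")     # answered from the cache
```

In this sketch the library's licensed copy is returned without any remote call, and the second request for the same unlicensed item is answered from the cache.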
Open issues still exist within this future information management architecture. For example, how do you access older manifestations as software and hardware change? What are the necessary standards for preservation? What metadata is needed for content versus the network functionality? But despite these open issues, Dr. Kahn believes that this architecture can ensure usability and interoperability into the future.
Discussion
Mr. Molholm described the potential benefits they have seen at DTIC from the use of the Handle system. These include a single point to hold multiple passwords, archiving, and the lack of inherent structure at the object level. However, several members noted that many industry groups are working on similar projects, often starting the process as one implementation and then making additional changes. Dr. Kahn indicated that there is much similarity and that minimal standards should support a federated repository. The question of security of the centralized Handle system was raised. Dr. Kahn indicated that Handles are also captured in the local system, as well as in global systems. The official system is actually the local system.
Knowledge Environments and Scientific Progress
Monica Bradford, Managing Editor, Science Magazine
Science Magazine's project in Knowledge Environments is an attempt to serve as an intermediary between various research communities. The development of Knowledge Environments is a significant departure from previous Science Online developments, which were still closely linked to the publication of Science Magazine. The concept is to develop new tools for information management. The goal is to leverage online technologies to systematically link related material, while requiring as little human intervention as possible.
When reviewing the needs of scientists, particularly those working at the boundaries of their fields, the American Association for the Advancement of Science (AAAS) discovered that scientists need to know the "who knows what", the "how to's", protocols, where to go for something, who someone is, and what is X? There are vocabulary problems, particularly at these boundary areas. The goal is to enhance community building across the more traditional disciplines. This requires acknowledging that the various communities have their own allegiances, home societies, and home journals. The key is to focus on solving particular problems.
The tools being developed to support Knowledge Environments include authority tools, many of which are database-driven rather than just textual. The goal is a technical platform with publishing standards and expert input. The "work environment" is Internet-based. While peer review is still important, there is a need to generalize information to a broader environment.
The key problem when trying to broaden the information environment is that too much information is retrieved. If a search is done against Science or other databases, the results are often overwhelming. While the papers are getting shorter, there are more of them. Secondary journals are proliferating particularly in boundary fields. The print is quickly out of date. However, turning solely to the Web, there is uneven quality, and some data types are not included on the Web at this time. The interface between science and policy requires gray literature, which is hard to locate and to assess when on the borders. There is a significant need for scientists working at the boundaries to regain confidence in the material they are using at these boundaries. It is too early to tell if the scientists' user behavior will actually change based on what is being proposed.
The specific project under development is STKE (Signal Transduction Knowledge Environment). It is a cooperative project with Highwire Press and Island Press. The prototype is funded by a grant from the Pew Foundation, which is interested in methods for allowing non-profits to stay competitive in a digital world. Pew is also interested in the intersection between science and policy. The current content is from Science and Highwire. Island Press will be doing a similar prototype in aquaculture.
During the pilot project, the partners encountered a variety of challenges. These included quality control issues. When is the technology good enough so that there is no need for human intervention? The results of the automatic filtering algorithms are currently being monitored by human beings. The second challenge involves human behavior. How quickly do they adapt to new features, and which features do they find to be most useful? Some early indications are that the users are using the new environment as if it were a print product. The users are looking at live links, but, instead of following the links, they print out the contents of the Web pages.
When does the content need to be comprehensive and when do you need editorial selection? It is important to make people's lives easier and help them to get information faster. What tools do you need to give to the expert contributors particularly since they don't get credit now from their community? How do you attract experts into a community? They don't have answers to these questions yet.
Ms. Bradford demonstrated some of the features of the STKE system. It includes "This Week in Signal Transduction", written by editors. An algorithm goes across the journals and creates a virtual journal. About 7000 journal articles have been selected to date. It is important to maintain the branding of the journal from the journals' homepage. This is very important to the publishers. An integrated search also goes across PubMed, several virtual journals, and other sources in order to cast the net more broadly. The results can be displayed using time stamp filtering or subject filtering. An e-mail alert can be created based on author, topic or journal title update. The title in the alert is a hot link right to the full text itself. The alerts can include jobs, events, white pages, etc. (any of the content that has been included). Reviews provided by expert contributors have versioning and updates. The authors agree to provide reviews for a minimum of one year.
Connection map technology is used. This departs from the traditional publication model, because the map is created "on the fly" based on a constantly updated database. The database is based on a search of the citation databases from which the best references are selected for inclusion, providing a basic reading list in the boundary topic.
Among the next steps for this project is settling on a business model. It is important to determine how such a service can be sustained. Is advertising a feasible option? Should a subscription model be used? There is interest on the part of pharmaceutical companies, but they are concerned about privacy and confidentiality. There is a conflict here because the companies would like to license the information, but AAAS' goal is to improve communication across audiences. Some technical changes are also being envisioned. Users should be able to name two or more components for a connection map, and see how the connections might be made through the literature. There will also be a way to add to the testbed database and see if the new information changes the connections. Folders will be available where researchers can record their own thought processes. Science is working with SEMIO on a structure of relationships. New knowledge environments in aquaculture (spanning policy, business and science), nanotechnology, geriatrics and AIDS are also being considered.
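The envisioned feature of naming two components and seeing how they might be connected can be illustrated with a shortest-path search over a graph of relationships. This is only a sketch of one possible approach (breadth-first search); the minutes do not say how STKE builds its maps. The function name find_connection and the component labels "A" through "E" are placeholders, not real signaling components.

```python
from collections import deque


def find_connection(edges, start, goal):
    """Return the shortest chain of components linking start to goal, or None."""
    # Build an undirected adjacency map from the (component, component) pairs.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted iteration keeps the result deterministic when paths tie.
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Placeholder relationships, standing in for links drawn from the literature.
known = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
before = find_connection(known, "A", "D")

# Adding one new relationship to the database can change the connection.
after = find_connection(known + [("E", "D")], "A", "D")
```

Rerunning the search after adding a relationship mirrors the idea of adding to the testbed database and checking whether the new information changes the connections.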
Virtual Help Desks: Customer Service in a Networked Environment
Joanne Silverstein, Head of R&D, Information Institute of Syracuse University
The Information Institute of Syracuse is an umbrella organization of Internet-related projects, including AskERIC and the Virtual Reference Desk. Much of its research is funded by the National Library of Education, including its research into Networked Customer Service. Ms. Silverstein's background is as a social scientist. Her interest is in how technologies impact human behavior.
In a networked environment, e-mail has become critical to the way that organizations work, particularly how they conduct customer service. The question arises, "How are organizations managing e-mail?" Ms. Silverstein interviewed executives at Internet start-up companies. She discovered that they are constraining the process because they are overwhelmed. There are major differences between the way for-profit and not-for-profit organizations approach e-mail. For-profits are airtight and seek to control the communications through registration and the use of forms. Not-for-profits are more porous, providing contact names, free-form comment fields, and anonymous input; it is possible to enter from many Web locations and at various levels within the Web site. Human intermediation in Web-based customer service is more common with not-for-profit organizations.
Customer service staff in both types of organizations are dealing with audiences that they never had to deal with before. While their main audiences may have stayed the same, they are crossing populations. This requires additional support in an exponential, not a geometric fashion. The human intermediation curve shows that it takes longer to answer questions that are peripheral than those that are core. People are scrambling to try to answer questions. This has a significant impact on organizations, including resource allocations, archiving, currency, privacy and language proficiency.
The National Library of Education funded a study to analyze electronic customer service. The goals of the study were to analyze current problems, processes and procedures in online customer service, to suggest solutions, to provide recommendations for policy, to outline software requirements for possible automation of processes, and to suggest training goals for the manager. The research team evaluated the use of e-mail within the Department of Education. Among the ed.gov Web pages, there were 17,000 mailto links. Of those with ed.gov e-mail addresses, only about 800 were found to be Department of Education staff. These staff were asked to identify their issues related to e-mail. About 200-300 unique issues were identified. Suggestions for improvement were also requested. With few exceptions, these issues could be the same for any organization. In most cases, the issues centered around the increasing number of customer requests from an ever more diverse audience, but with fewer resources dedicated to the customer support function.
It was obvious from the staff responses that they took the customer support function seriously and considered it important, because the web site of a government agency is often the first and only contact that the customer has with the agency. However, there appeared to be no comprehensive plan for handling e-mails within the Department of Education. Therefore, it was determined that a plan and a champion for the plan must be identified. It is also important to identify who the real customer is, and the level of centralization that will be most effective. Software requirements are less important than the policy. Technology is at the "end of the line " after you decide what you want to do.
Discussion
Ms. Silverstein asked for an update of what the agencies are doing in this area. Mr. Molholm indicated that DISA has had difficulties with online customer support. They are trying to find different ways to do help desks. The DISA senior leadership just had an off-site where Mr. Molholm was asked to speak about DTIC's Customer Advocacy Program. DISA views DTIC as a benchmark organization for customer service.
It was suggested that the root causes of off-track questions have been ignored. Ms. Silverstein noted that Internet start-up companies are often brokers and don't care about assessing quality.
NLM's analysis of e-mail messages found an appreciable percentage of duplicate questions -- questions being sent out to many people from the same person for the same issues. This is particularly true when back-door approaches are used by the requester. The lack of coherent policies and the numerous opinions that can be expressed by staff who respond to back-door requests can lead to bad information being provided to the requester.
Mr. Lawson asked if the customer support models change when agencies move from appropriated funds to cost recovery. Ms. Silverstein thought this was a good research question that had not been addressed.
Dr. Kahn expressed an interest in developing a reference model for federated repositories that includes customer support aspects as well.