CENDI PRINCIPALS AND ALTERNATES MEETING
National Library of Education
Washington, D.C.
December 4, 2001
WELCOME
Kent Smith opened the meeting at 9:15 am. He thanked the National Library of Education for hosting the meeting. Introductions were made.
"The Semantic Web"
Dr. James Hendler, University of Maryland
Dr. Hendler's work on the concept of the semantic web began when he was Chief Scientist of Information Systems and a Project Manager with the Defense Advanced Research Projects Agency (DARPA). He worked with Tim Berners-Lee and others to develop advanced ideas. Dr. Hendler is now working with a new laboratory at the University of Maryland, the Maryland Information and Network Dynamics Laboratory, which was established to bridge the gap between R&D and practice: to take those advanced ideas into an environment where they can be shown to vendors, who then take the ideas and make them operational.
The term "semantic web" was coined by Berners-Lee in the mid 1990s and popularized by a 1999 Scientific American article. This was really a vision paper describing a seamless operation. The status of the semantic web is appropriate to the R&D transfer laboratory. How do we get from the concept to an operational system?
The goal behind the semantic web is interoperability -- making heterogeneous resources, such as databases and financial systems, "talk to one another". Dr. Hendler gave examples from various agencies where multiple systems, primarily legacy databases, needed to share data. In one agency, many of the systems required special code to get information from one system to another. About 30 to 40 percent of all code was written to perform data transfer. One program had over 2000 interfaces to make this bridge. This approach to integration translates into lost time, high cost, and impacted missions.
The lack of interoperability across systems also results in "personal information disaster." We have an increasing amount of data in our lives, much of which is available on the Internet. However, the data is not "interneted" together. Even though more information is available on personal devices, the answers a person seeks still require accessing multiple systems, without the ability to take content from one and use it with another. The content on the web is really pre-web, and it has not been redesigned to take advantage of the web's ability to link and integrate content.
In order to achieve the integration that is needed both organizationally and personally, we need to start telling the web what the pieces of information mean. In some cases, the meaning is not explicit, which is why keyword searching alone will not work. Intellectual effort is needed.
XML and RDF are starting to make it possible to embed meaning. XML allows for meaningful tags; RDF provides structure. Tools are beginning to be deployed and both standards should be fully useful in four to six years.
However, the problem remains that, to the computer, these tags are the same as any other ASCII computer code. Unless there is a common understanding of the meaning of the XML tags, there is no way to integrate information from various documents. Schemas help because they can bridge across the mark-up of individual documents.
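As a rough illustration of why tags alone do not carry shared meaning, the sketch below parses two small XML records with Python's standard xml.etree.ElementTree module. The records, the tag names (title, name, author, creator), and the SHARED_TERMS mapping are all invented for this example and are not taken from Dr. Hendler's presentation. Compared as plain strings, the two records have no tags in common; once each tag is mapped to an agreed term, the same fields line up.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records describing similar items with different tag names.
DOC_A = "<report><title>Annual Survey</title><author>J. Doe</author></report>"
DOC_B = "<document><name>Field Notes</name><creator>J. Doe</creator></document>"

# Hypothetical shared schema: maps each local tag to an agreed-upon term.
SHARED_TERMS = {
    "title": "title",
    "name": "title",
    "author": "creator",
    "creator": "creator",
}


def fields(xml_text):
    """Return {tag: text} for the child elements of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def normalized(xml_text):
    """Return {shared_term: text}, dropping tags the shared schema does not cover."""
    return {
        SHARED_TERMS[tag]: text
        for tag, text in fields(xml_text).items()
        if tag in SHARED_TERMS
    }


# Tags compared as raw strings: the two records share nothing.
raw_overlap = set(fields(DOC_A)) & set(fields(DOC_B))
# Tags compared after mapping to shared terms: both fields line up.
shared_overlap = set(normalized(DOC_A)) & set(normalized(DOC_B))

print(sorted(raw_overlap))
print(sorted(shared_overlap))
```

The mapping table plays the role of the common understanding described above; without it, the program has no basis for treating "author" and "creator" as the same thing.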
Unfortunately, there are multiple schemas for the same type of information and users do not agree. Semantic web languages go beyond a single schema by adding mappings and structure to multiple schemas. The concept of merging schemas was built into the semantic web language because it is not tree-based but graph-based. Through the use of semantic web languages, it would be possible to develop standard schemas for different resource types, such as calendars. By mapping proprietary formats to a standard community-defined schema, individual calendar program developers could disagree on the individual schemas implemented in their products, but the contents could be migrated from system to system and could talk to one another through the common standard. Dr. Hendler cautioned that the XML mark-up done to date may need to be retrofitted to work in the new environment.
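The calendar example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the vendor names, field names, and sample event are hypothetical, and plain dictionaries stand in for what a semantic web language would express as schema mappings. Each product keeps its own field names and supplies one mapping to the common schema; migration between any two products then goes through that common form.

```python
# Hypothetical vendor formats: each maps its own field names to the
# field names of an assumed community-defined calendar schema.
VENDOR_MAPPINGS = {
    "vendor_a": {"subject": "summary", "begins": "start", "finishes": "end"},
    "vendor_b": {"title": "summary", "from": "start", "to": "end"},
}


def to_common(vendor, event):
    """Translate a vendor-specific event into the common schema."""
    mapping = VENDOR_MAPPINGS[vendor]
    return {mapping[key]: value for key, value in event.items() if key in mapping}


def from_common(vendor, event):
    """Translate a common-schema event into a vendor-specific one."""
    reverse = {common: local for local, common in VENDOR_MAPPINGS[vendor].items()}
    return {reverse[key]: value for key, value in event.items() if key in reverse}


def migrate(event, source, target):
    """Move an event between two products by way of the common schema."""
    return from_common(target, to_common(source, event))


# Made-up sample event in vendor_a's format.
event_a = {
    "subject": "Staff meeting",
    "begins": "2002-01-15T10:00",
    "finishes": "2002-01-15T11:00",
}
event_b = migrate(event_a, "vendor_a", "vendor_b")
print(event_b)
```

With this arrangement each product needs a single mapping to the shared schema, rather than a separate converter for every other product it must exchange data with.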
Dr. Hendler presented a map of the development of the web. Major changes occur at approximately 10-year intervals. Information retrieval has to progress, because it will take major breakthroughs just to keep up with the volume of growth of the web. Without major changes in the way that information is organized and retrieved, it becomes harder to maintain the same level of performance over time. Eventually, Dr. Hendler envisions information retrieval systems that will recommend and help the user to perform activities once the information is found. Projected on this path of 10-year intervals, a knowledge web would be achievable in approximately 2010.
However, the method for providing these enhancements to the way information is retrieved from the web is better mark-up. It becomes a question of the perceived value. The task of mark-up can be supported by technology, but the technology must be developed and purchased. There is also the question of who puts forth the effort, and where. Is the mark-up the responsibility of the author/originator of the material or of the webmaster? Dr. Hendler's activities have found that it is often better to provide the documents to someone who knows more about the collection as a whole and who is trained to deal with mark-up than to expect authors to understand the broader picture.
In the area of vocabularies, Dr. Hendler believes that no one will ever agree on a single technical vocabulary. However, it is more important to have imperfect communication than no communication at all. A distributed ontological representation is more appropriate. Small communities would define common semantics for the particular community, and larger communities would form around a smaller number of terms that would be shared between/among them. The larger communities will build synonym lists that bridge across the vocabularies.
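A minimal sketch of how such synonym lists might bridge vocabularies is shown below. The community names, terms, and groupings are invented for illustration only. Each synonym set lists (community, term) pairs that a larger community has agreed to treat as equivalent; translation succeeds only where a bridge exists, which reflects the "imperfect communication" described above.

```python
# Hypothetical synonym sets maintained by a larger community. Each set groups
# (community, term) pairs that are treated as equivalent.
SYNONYM_SETS = [
    {("aerospace", "aircraft"), ("defense", "air vehicle"), ("transport", "airplane")},
    {("aerospace", "propulsion"), ("defense", "engines")},
]


def translate(term, source, target):
    """Return the target community's terms listed as synonyms of a source term.

    An empty list means no bridge has been defined between the two vocabularies
    for this term.
    """
    matches = []
    for group in SYNONYM_SETS:
        if (source, term) in group:
            matches.extend(t for community, t in group if community == target)
    return sorted(matches)


# A term with a bridge into the target vocabulary.
bridged = translate("aircraft", "aerospace", "defense")
# A term for which the target community has no listed synonym.
unbridged = translate("propulsion", "aerospace", "transport")

print(bridged)
print(unbridged)
```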
The current semantic web effort is being funded by both the US and the European Union ($20,000,000 and 20,000,000 euros, respectively). Both are working closely with the W3C's Semantic Web Activity. There is also significant interest from other nations. Many small companies are involved, because they see this as a growing niche that they can fill. There is growing user pull as users seek to deal with larger amounts of information. However, there are significant political issues for large organizations such as Microsoft.
CENDI agencies should have their staff who develop information management tools keep abreast of semantic web activities. There is a newsletter available at the DARPA Agent Markup Language (DAML) web site at www.daml.org. Approximately 170 ontologies have been submitted for free use, along with free mark-up tools. The W3C working group is a combination of those who have worked on the US and European ontology language, DAML+OIL, which has been released. This work is serving as a starting place for the group that Dr. Hendler is chairing. While the various initiatives have focused on different aspects, including the military or e-business, there are commonalities. This W3C working group has significant membership from among the W3C organizations (47 members from the 300 W3C members), which is an indication of the importance with which it is viewed.
DARPA has invested in the semantic web and wants to make sure that the technology gets into end user tools. For example, clip art could come with mark-up tags that could be preserved when the art is pasted into a document or PowerPoint presentation. This would allow the semantics (what the picture is about) to be available in the document in which the clip art is used.
Ultimately, there is a need to go beyond text in dealing with the semantic web concepts. The semantic web language could be extended to transaction-based applications.
"KM in a Systematic Way: Automatic Categorization/Knowledge Management/Portal Technologies"
Denise Duncan, Logistics Management Institute [LMI]
Ms. Duncan described the tools and rules of thumb that LMI has developed to implement knowledge management technologies. These technologies are high-ticket items, and it is important for the organization to get value for the investment. Therefore, LMI developed a structured approach for knowledge management (KM) projects. There has been a revolution in vendor capabilities almost every six months. Thus, it is necessary to revisit the state of the art periodically. The structured approach is helpful in performing technology analyses as well.
The KM implementation strategy begins with the CIO view (which is the view from 30,000 feet). The first goal is to ensure that the organization has common definitions, visions, and expectations of KM. While the definitions of KM may differ among organizations, they should be the same within an organization.
The components of the framework include a Maturity Matrix, a Selection Decision Table, Software Implementation Methodology, and the use of a Prototype Lab and Vendor evaluations.
During the planning phase, the Maturity Matrix is used. This is useful both as a diagnostic tool and as an educational tool. The legacy technologies are reviewed along with content. A major part of this review is an analysis of the existing culture. In many organizations, people are rewarded for their unique knowledge. To have a true KM culture, people must be rewarded for sharing knowledge. The closer the organization is to a KM culture, the more successful the implementation will be.
The Selection Decision Table maps the Maturity Matrix to the initiatives that could be supported by a KM project. It is important when selecting the first KM project to select one that is likely to succeed and that has an identifiable champion and the visibility to promote the next initiative. The best initiative will impact the quality of work life for the workforce as well as the bottom line for management; it must please a high-level person and help the workforce. If a champion cannot be found, Ms. Duncan suggests that the next best initiative be selected where a champion is available. The champion must also be at the right level within the organization where he/she can influence peers and the workforce.
Once the initiative is selected, the Software Implementation Methodology is used to select the best software to be used for the project. Not all software provides the same functionality, and it is important to understand the "drivers" for the software and to align them with the needs of the project. In addition, it is important to take a broader look, to ensure that the software is applicable for more than just the first initiative.
LMI has developed a prototyping lab. This allows LMI and its customers to try various software packages without the expense of purchase or installation. LMI has made arrangements to partner with several vendors. It is a limited partnership that allows LMI to maintain its independence in recommending solutions to its customers.
"Agencies' Impacts and Response to the War on Terrorism"
Roundtable Discussion
CENDI agencies participated in a roundtable discussion, sharing how their organizations are coping with, or have been impacted by, the events of September 11. Many of the agencies were experiencing increased physical security at their facilities. Most, if not all, were reviewing their Web sites to determine whether information they were providing needed to be removed or revised. A number of agency sites had already been taken down in quick reaction to the terrorist threat. Many were now given the responsibility to review criteria for the availability of sites more systematically. In contrast, several agencies were looking at providing more or different types of information specifically targeted at allaying public concerns, such as information on the treatment of anthrax. At least one agency was reviewing whether registration processes for secure sites needed to be refined. Some found a need to increase bandwidth or other resources because of a dramatic increase in the number of inquiries coming from the public.