CENDI PRINCIPALS AND ALTERNATES MEETING

National Agricultural Library
Beltsville, MD
December 6, 2000

Minutes

New Policies, Systems, and Perspectives
PITAC: The Meeting of Content and Technology
FirstGov
New at NAL: The New AgNIC Architecture

WELCOME

Kurt Molholm, CENDI Chair, opened the meeting at 8:45 am. He thanked NAL for hosting the meeting.

NEW POLICIES, SYSTEMS, AND PERSPECTIVES

"PITAC: The Meeting of Content and Technology"
Dr. David Nagel, Chair, President’s Information Technology Advisory Committee [PITAC]
Working Group on Digital Libraries and President, AT&T Labs

Dr. Nagel gave a brief history of PITAC. It was formed in 1997, originally to deal with high performance computing. However, it has evolved to look more generally at the impact of the government’s technology efforts on the economy and society. The National Coordinating Office was established to work across agencies and to help PITAC understand agency programs that could benefit from research initiatives.

In introducing some of the long-term thinking about computing, Dr. Nagel said that, although he is not sure precisely what the new Internet will be like, there will be increasing mobility. It will have more bandwidth, more wireless connections, and new types of applications. He indicated that processor technology is becoming less energy intensive and batteries are becoming better, so that while wireless still has reliability problems, this is likely to improve and wireless will become increasingly pervasive. However, human factors issues are beginning to emerge as devices become lighter and more portable. This is particularly problematic when trying to display web pages. Small devices are also difficult to operate given the size of people’s fingers. Voice technologies may help to address this problem. There has been a 250-fold improvement in voice recognition in the last few years.

The U.S. has an astounding engine for creating economy out of basic research. Improvements will continue because of basic scientific and technical curiosity (which government can support) and the drive of the market for constant innovation. Government basic science research is, therefore, extremely important.

The February 1999 PITAC report recommended significant increases in research funding on the order of $70-80B. Transformation challenges were identified; these challenges are now being addressed by individual committees, and a series of more focused reports is being issued. One of these focused investigations is in the area of digital libraries.

The work of the Digital Libraries Committee is based on the deliberations of the group, which is made up of representatives of academia and industry. The Committee also held a series of briefings where they looked at what the government is doing and the R&D that is needed in relation to digital libraries. Presentations were invited from various experts. Dr. Warnick gave a presentation on the Physical Science Information Infrastructure as part of this review process.

The Committee has developed some preliminary findings and recommendations. Dr. Nagel indicated that he was anxious to get input from the CENDI members.

Despite the fact that the full potential of digital libraries has not yet been realized, the Internet and networking are having significant impacts on the productivity of individuals. He noted that Alan Greenspan, Chairman of the Federal Reserve Board, made this point in his analysis of economic productivity. The energy required to support the Gross National Product (GNP) is falling faster than the productivity is rising. This is the case despite the fact that only a small fraction of information is available online.

The government has exercised leadership in the development of digital library technologies and content. However, it could do a lot more. There is not as much content created and coordinated as would be beneficial. He estimated that there are about one billion pages on the Net and that five to ten times as many could be online. Further, there are enormous information stores that are not likely to be made available. The Digital Libraries Committee suggested that the government should require both online review and publication of documents that result from government-funded research. The government also has an important role to play with international digital libraries.

Dr. Nagel noted that we know a lot more about digital library technology than we do about how to really create and manage digital libraries. Institutional questions, such as resources, are a growing concern.

The Committee noted the importance of dispelling the myth that electronic information is cheaper than paper. In this regard, they tried to look at the environment in a fresh way, both technically and institutionally. They made recommendations about budgets.

In addition to the general issues, the Committee identified several areas for technical and policy research. These are intellectual property, privacy, preservation, retrieval, and security and authentication.

The issue of intellectual property (IP) is beginning to seriously impact the ability to create and allow access to digital libraries. There are special challenges posed by new information and computer architectures. During the briefings, the Library of Congress (LC) specifically identified IP as a major issue. Dr. Nagel noted that they are not suggesting changes in law but in practice to ensure that government research is made more publicly available. The Committee recommends an evolving policy to deal fairly with intellectual property issues, including the development and deployment of an infrastructure to support the use of government material in digital libraries. This might include micropayments for certain types of government information for certain audiences in order to support the provision of government information to the public. The Committee called for a safe harbor for digital libraries that support research and scholarship. Practical fair use policies must be developed for managing ambiguous and unknown property rights.

A micropayment system requires methods to authenticate and verify government information, particularly as the information is re-used. Much of the technology for security and authentication is already being developed in the private sector. However, there are major issues surrounding the use of commercial products, including concern about continued support for a product if the commercial sector decides the technology is not viable.

There are significant issues with regard to retrieval. We don’t know how use of digital libraries differs from that of traditional libraries. However, we know that retrieval to date has emphasized text objects, or textual surrogates for non-textual objects. The retrieval of non-textual objects in their own right is a major area for research. In addition, applications are needed both for the creation of metadata and digital objects and for their discovery.

It is important to provide the necessary resources and policies to make federal information consistently available. The Committee has called for policies that encourage interagency exchange and cooperation in the area of digital libraries.

In addition to research activities related to digital libraries, the PITAC calls for a vision of what the digital library environment could be. The point of a vision is that some things might not be provable, yet they have a real impact. Dr. Nagel suggested that a viable federal model would be a repository of R&D results.

The key to success is to learn from actual digital library work. He would like to see some of the efforts move from computer science R&D organizations such as NSF into libraries and the agencies themselves. Overall, there is a need to establish large-scale test beds. One such effort that has been suggested is a digital library associated with crisis management, because no such library exists and crisis management is a major crosscutting activity.

Now that the main investigation of the Committee has concluded, Dr. Nagel has identified several topics for follow-on investigation. These include the issue of productivity in an electronic environment, energy intensity and networking, and seeking to abolish some of the myths about the cost of electronic publishing. There is also a need to discuss the impacts of government R&D. A major issue is the creation of online content in an affordable way. The issue of distributed versus centralized repositories of government information is critical. Dr. Nagel suggested that this is an area where CENDI could contribute.

Discussion

The CENDI members asked Dr. Nagel if CENDI could support the efforts of the committee by reviewing and commenting on the draft report. Dr. Nagel will discuss a possible CENDI review with the National Coordinating Office. The committee is planning to finish its work by the end of January and then issue the report soon thereafter. He suggested that special attention be paid to the specific language of the recommendations so that they are as pointed as possible.

Dr. Nagel was asked about the degree to which digital libraries in other countries such as Australia, particularly those involving government information, were reviewed as possible models for a government digital library. He indicated that the discussion of digital libraries in other countries was limited to their role as complementary digital libraries with which a U.S. system would need to interact.

The CENDI members raised the issue of public/private sector competition. A few people on the committee panels raised this issue, but the Committee did not deal specifically with the commercial aspects of digital libraries.

Dr. Nagel was asked what impact a new Administration might have on the work of the PITAC and his committee in particular. He indicated that the Committee is working under the assumption that the work will continue after the change in Administration, but there is, of course, no guarantee.

"FirstGov"
Beverly Godwin and Meredith Lovell, FirstGov/NPR

Ms. Godwin, who is working with FirstGov from Vice President Gore’s National Partnership for Reinventing Government, described <firstgov.gov> as more than just a Web site for federal information. It has been viewed as a transforming mechanism for how the government deals with the public and as a catalyst for changing other government sites. The goal is to highlight what is being done in e-government. Through firstgov.gov, a citizen can buy stamps, file taxes, reserve a campsite, or check the quality of nursing homes. Many agencies have extremely worthwhile e-government programs, but they are not visible to the public. FirstGov also shows that the government can launch a system in Internet time -- the site was made available in 90 days. The portal was announced on June 24, 2000, and it went live on September 22, 2000.

The December 17, 1999, e-Government Memorandum from President Clinton included nine changes that needed to occur in order to make government more accessible to the public. One of them was transparency of government information, outside the organizational structure of government. This is what firstgov.gov seeks to do.

The architecture for firstgov.gov is a portal that includes links to portals along with .gov and .mil sites. The management structure of firstgov.gov includes interagency and private partnerships. Many of the portals that firstgov.gov links to have been developed by interagency groups and cut across federal information for certain audiences or topics -- for example, Seniors.gov and Students.gov. A Cross Agency Portals Working Group is being organized to collect knowledge and best practices and pass that information on to new groups beginning to work on portals for firstgov.gov.

The Federal Search Foundation was created to take ownership of the search engine and to operate it. The firstgov.gov site is operated by a consortium of contractors. AT&T is the prime contractor, with seven subcontractors. There are currently six people (besides the contractors) working on the FirstGov team; a few of them are detailees. The staff is expected to grow to 21.

The search engine is a two-year donation from Inktomi. It is worth approximately $10M. The engine is able to search almost .5B pages in less than a quarter of a second by searching the full text of every document rather than the metatags. Re-harvesting occurs every three days. It now provides access to approximately 27M pages, including all publicly available static .mil and .gov sites. Advanced searching is being developed at this point, and Ms. Godwin indicated that FirstGov welcomes comments from the agencies on how to improve the search engine.

FedSearch will allow federal use of the search engine. It could be used by individual agencies and would save them the expense of purchasing their own search engines.

Ms. Godwin demonstrated the various features of FirstGov. There are several ways to access the information, including by keyword, by featured subjects that change monthly, by government organization, and by interesting topics. There are also partner pages, state and local information, and a feedback form.

Sixteen common categories have been developed to highlight interagency portals and gateways. One of these is Science and Technology. The structure is very shallow in that the categories quickly get the user to linked portals or content pages. They continue to look for cross agency groups that can focus the information for specific audiences or topics. One of the "holes" in the system is in the area of science and technology where there is no publicly available portal.

One area of continuing effort is the weighting algorithms that are employed. Because the search engine does not use metatags, there are cases in which the agency with primary federal responsibility for an area (such as veterans’ benefits) does not appear at the top of the results list. White text is used to bring these sites higher in the relevance ranking automatically. FirstGov is willing to work with any agency or organization for which the engine does not produce the desired results.

The future of firstgov.gov was discussed. Funding for the next two years is being provided by the CIO Council. It is hoped FirstGov will become an appropriated program. An estimated $4M annually would be needed for continued development and support. Several people who have been on detail with FirstGov from other agencies will be moving over as employees of FirstGov, including Ms. Godwin and Ms. Lovell. FirstGov is applying for the Ford Foundation Innovations for the Government Award.

FirstGov is undertaking a major marketing campaign. However, the site is already receiving a great deal of traffic.

Discussion

Dr. Warnick pointed out that one of the advantages of FirstGov is that it didn’t burden creators or sites. It just harvested what was on the Net.

Ms. Godwin was asked about the reaction of commercial information providers to this type of service by the government. She indicated that FirstGov is willing to share what it has done with others. In fact, HiCitizen.com is a commercial site that has already taken firstgov.gov and improved on it. FirstGov would like to see commercial development of some portals.

The new A-130 OMB Guidelines reference GILS, as did the earlier guidelines. Ms. Godwin was asked about the connection between FirstGov and GILS. She indicated that while they are similar, the level of detail at which GILS is applied is not the same as that of firstgov.gov.

Ms. Godwin was asked about the assessment of the agencies that was levied in FY00 to support FirstGov. She indicated that this money is paying for hosting the site, for security software, promotions, etc. At the end of the two-year test period, the CIO Council will revisit this. The question was also asked about what will happen with the "donation" of the search engine after two years. Ms. Godwin responded that some type of request for proposals will go out and they will see what they get.

Dr. Wood asked if FirstGov has done anything in the area of metrics and evaluation of the site. Ms. Godwin indicated that only routine log statistics are produced and they are just beginning to discuss more detailed evaluation with the contractor.

Ms. Godwin proposed that CENDI develop the Science and Technology portal and improve on the categorization scheme that has been developed. Several members mentioned that most of the resources provided by the agencies are of a technical nature. A portal for researchers might also be considered.

Mr. Molholm noted that the Federal Library and Information Center Committee (FLICC) will be discussing FirstGov at its meeting on December 7, 2000. Ms. Tarr indicated that the objective of the discussion is to determine what role, if any, the federal libraries play in the further development of FirstGov.

"New at NAL: The New AgNIC Architecture"
Melanie Gardner, AgNIC Coordinator and Dr. John Kane, Computer Scientist
National Agricultural Library (NAL)

AgNIC (Agriculture Network Information Center) is a system in which NAL works collaboratively with others to organize agricultural resources. AgNIC acts as a portal for quality agricultural information on the Internet. The library is bringing the organizational expertise and resources that it has in physical library materials to the electronic environment. The goal is to share the burden of supplying information through partnership arrangements. There is no funding for AgNIC other than NAL’s support for the AgNIC Coordinator’s salary.

AgNIC is made up of distributed partners, but participation is not limited to government organizations. There are currently 38 partners, including non-profits, professional organizations, foreign academic organizations, land grant universities and colleges, and other government organizations. Commercial organizations have expressed an interest, but AgNIC is choosing not to go that route at this time. The plan is to develop an international consortium in which AgNIC is part of a global portal.

The relationships between partners are still evolving. They want each institution to actually create information in specific topic areas and to participate in the technology development. The partners must provide online reference; selected, quality resources with a review committee; and a useful calendar of related events. Questions that are not covered by the partners are dealt with by NAL. Guidelines and a partnership agreement have been developed and are currently undergoing revision. The current version is available from the AgNIC Web site.

AgNIC is based on the "centers of excellence" model, with each center taking on the development of content in a particular topic area. Only public domain information can be provided. In addition to full text documents, there are conference papers, data sets, expertise directories, and calendars.

AgNIC continues to refine the selection process. They are planning to link from the archive of papers to the proceedings. The calendars are being moved to a database format.

Ms. Gardner noted that communication is the most difficult aspect of AgNIC. Keeping up the group spirit with only one face-to-face meeting per year is very difficult. They have a variety of task groups and listservs that support discussions and development.

Dr. Kane described this as the "third phase of AgNIC development." This phase involves the development of additional standards and protocols. The goal is a technology that supports dynamic automation, distributed systems, and well-structured systems. AgNIC uses Dublin Core as the basis for its metadata. AgNIC also uses the ROADS (Resource Organisation and Discovery in Subject-based Services) software and architecture for distributed systems. The metadata format for AgNIC is registered with the ROADS system. The quality continues to come from the intellectual decisions that are made.
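As an illustration only, a Dublin Core style record can be sketched as a simple mapping from element names to values. The fifteen element names below are those of the Dublin Core Metadata Element Set; the record contents, the URL, and the helper name unknown_elements are hypothetical, and the minutes do not describe AgNIC's actual metadata format or input form.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record):
    """Return the sorted keys of `record` that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Hypothetical record; the values are illustrative and not taken from AgNIC.
example_record = {
    "title": "Example guide to rangeland management",
    "creator": "Example University Library",
    "subject": "rangeland management",
    "type": "Text",
    "format": "text/html",
    "identifier": "http://example.org/rangeland-guide",
    "language": "en",
}

# Every key above is a Dublin Core element, so nothing is reported.
print(unknown_elements(example_record))
# "author" is not a Dublin Core element ("creator" is), so it is reported.
print(unknown_elements({"title": "x", "author": "y"}))
```

A check of this kind could sit behind a metadata input form, so that records from different partners use the same element names before they are shared.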

The ROADS software includes the concept of centroids, which are simply inverted indices that can be shared. The Whois++ software handles the management of the system. Additional development is done using the Zope Web development environment, UML, and object-oriented (OO) methods. An input form for metadata has been developed.
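The centroid idea can be sketched roughly as follows. This is a minimal illustration, not ROADS or Whois++ code: each partner summarizes its records as a mapping from attribute to the set of word tokens used, and a coordinating server consults those shared summaries to decide which partners are worth querying. The partner names, the records, and the function names build_centroid and partners_to_query are all hypothetical.

```python
from collections import defaultdict


def build_centroid(records):
    """Summarize records as a mapping: attribute -> set of lowercase word tokens."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            centroid[attribute].update(value.lower().split())
    return dict(centroid)


def partners_to_query(centroids, attribute, term):
    """Return the partners whose centroid lists `term` under `attribute`."""
    term = term.lower()
    return sorted(
        name
        for name, centroid in centroids.items()
        if term in centroid.get(attribute, set())
    )


# Hypothetical partners, each sharing only its centroid with the coordinator.
centroids = {
    "partner_a": build_centroid([{"title": "Rangeland management basics"}]),
    "partner_b": build_centroid([
        {"title": "Maize pest management"},
        {"subject": "entomology"},
    ]),
}

# Both partners use "management" in a title; only partner_b uses "maize".
print(partners_to_query(centroids, "title", "management"))
print(partners_to_query(centroids, "title", "maize"))
```

The point of the design is that partners share only a compact summary, so a query is forwarded to the partners likely to hold matching records rather than to every partner.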

AgNIC is applying for NSF funding for architecture development. They also want to advance content development, in addition to hardware and software. Another current project is the development of a controlled vocabulary in collaboration with Cornell University and other partners. The thesaurus is seen as dynamic and utilitarian. The vocabulary is being used to give a road map of what agriculture is.
