National Agricultural Library
Beltsville, MD
December 6, 2000
WELCOME
Kurt Molholm, CENDI Chair, opened the meeting at 8:45 am. He thanked NAL for hosting the meeting.
"PITAC: The Meeting of Content and Technology"
Dr. David Nagel, Chair, President’s Information Technology Advisory Committee
[PITAC] Working Group on Digital Libraries, and President, AT&T Labs
Dr. Nagel gave a brief history of PITAC. It was formed in 1997, originally
to deal with high performance computing. However, it has evolved to look more
generally at the impact of the government’s technology efforts on the economy
and society. The National Coordinating Office was established to work across
agencies and to help PITAC to understand agency programs that could benefit
by research initiatives.
In introducing some of the long-term thinking about computing, Dr. Nagel said
that, although he is not sure precisely what the new Internet will be like,
there would be increasing mobility. It will have more bandwidth, increased wireless
connections and new types of applications. He indicated that processor technology
is getting less energy intensive and batteries are becoming better, so that
while wireless still has reliability problems, this is likely to improve and
it will become increasingly pervasive. However, human factors issues are emerging
as devices become lighter and more portable. Small screens make it particularly
difficult to display web pages, and small controls are hard to operate given the
size of people’s fingers. Voice technologies may help to address this problem.
There has been a 250-fold improvement in voice recognition in the last few years.
The U.S. has an astounding engine for creating economic value out of basic research.
Improvements will continue because of the basic scientific and technical curiosity
(which government can support) and the drive of the market for constant innovation.
Government basic science research is, therefore, extremely important.
The February 1999 PITAC report recommended significant increases in research
funding on the order of $70-80B. Transformation challenges were identified;
subsequently, these challenges are being addressed by individual committees
and a series of more focused reports are being issued. One of these focused
investigations is in the area of digital libraries.
The work of the Digital Libraries Committee is based on the deliberations of
the group, which is made up of representatives of academia and industry. The
Committee also held a series of briefings where they looked at what the government
is doing and the R&D that is needed in relation to digital libraries. Presentations
were invited from various experts. Dr. Warnick gave a presentation on the Physical
Science Information Infrastructure as part of this review process.
The Committee has developed some preliminary findings and recommendations.
Dr. Nagel indicated that he was anxious to get input from the CENDI members.
Despite the fact that the full potential of digital libraries has not yet been
realized, the Internet and networking are having significant impacts on the
productivity of individuals. He noted that Alan Greenspan, Chairman of the Federal
Reserve Board, made this point in his analysis of economic productivity. The
energy required to support the Gross National Product (GNP) is falling faster
than the productivity is rising. This is the case despite the fact that only
a small fraction of information is available online.
The government has exercised leadership in the development of digital library
technologies and content. However, it could do a lot more. There is not as much
content created and coordinated as would be beneficial. He estimated that there
are about one billion pages on the Net, and that five to ten times as many
could be. Further, there are enormous information stores that are not likely
to be made available. The Digital Libraries Committee suggested that the government
should require both online review and publication of documents that result from
government-funded research. The government also has an important role to play
with international digital libraries.
Dr. Nagel noted that we know a lot more about digital library technology than
we do about how to really create and manage digital libraries. Institutional
questions, such as resources, are a growing issue.
The Committee noted the importance of dispelling the myth that electronic information
is cheaper than paper. In this regard, they tried to look at the environment
in a fresh way, both technically and institutionally. They made recommendations
about budgets.
In addition to the general issues, the Committee identified several areas for
technical and policy research: intellectual property, privacy, preservation,
retrieval, and security and authentication.
The issue of intellectual property (IP) is beginning to seriously impact the
ability to create and allow access to digital libraries. There are special challenges
posed by new information and computer architectures. During the briefings, the
Library of Congress (LC) specifically identified IP as a major issue. Dr. Nagel
noted that they are not suggesting changes in law but in practice to ensure
that government research is made more publicly available. The Committee recommends
an evolving policy to deal fairly with intellectual property issues, including
the development and deployment of an infrastructure to support the use of government
material in digital libraries. This might include micropayments for certain
types of government information for certain audiences in order to support the
provision of government information to the public. The Committee called for
a safe harbor for digital libraries that support research and scholarship. Practical
fair use policies must be developed for managing ambiguous and unknown property
rights.
A micropayment system requires methods to authenticate and verify government
information, particularly as the information is re-used. Much of the technology
for security and authentication is already being developed in the private sector.
However, there are major issues surrounding the use of commercial products and
concern about support for the product if the commercial sector decides the technology
is not viable.
There are significant issues with regard to retrieval. We don’t know how use
of digital libraries differs from that of traditional libraries. However, we
know that retrieval to date has emphasized text objects or textual surrogates
for non-textual objects. The retrieval of non-textual objects in their own right
is a major area for research. In addition, applications are needed for both
creation of metadata and digital objects and for their discovery.
It is important to provide the necessary resources and policies to make federal
information consistently available. The Committee has called for policies that
encourage interagency exchange and cooperation in the area of digital libraries.
In addition to research activities related to digital libraries, the PITAC
calls for a vision of what the digital library environment could be. The point
of a vision is that some things might not be provable yet still have a real
impact. Dr. Nagel suggested that a viable federal model would be a repository
of R&D results.
The key to success is to learn from actual digital library work. He would like
to see some of the efforts move from computer science R&D organizations such as
NSF into libraries and agencies themselves. Overall, there is a
need to establish large-scale test beds. One such effort that has been suggested
is a digital library associated with crisis management, because no such library
exists and crisis management is a major crosscutting activity.
Now that the main investigation of the Committee has concluded, Dr. Nagel has
identified several topics for follow-on investigation. These include the issue
of productivity in an electronic environment, energy intensity and networking,
and seeking to abolish some of the myths about the cost of electronic publishing.
There is also a need to discuss the impacts of government R&D. A major issue
is the creation of online content in an affordable way. The issue of distributed
versus centralized repositories of government information is critical. Dr. Nagel
suggested that this is an area where CENDI could contribute.
Discussion
The CENDI members asked Dr. Nagel if CENDI could support the efforts of the
committee by reviewing and commenting on the draft report. Dr. Nagel will discuss
a possible CENDI review with the National Coordinating Office. The committee
is planning to finish its work by the end of January and then issue the report
soon thereafter. He suggested that special attention be paid to the specific
language of the recommendations so that they are as pointed as possible.
Dr. Nagel was asked about the degree to which digital libraries in other countries
such as Australia, particularly those involving government information, were
reviewed for possible government digital library models. He indicated that the
discussion of digital libraries in other countries was limited to their role
as complementary digital libraries with which a U.S. system would need to interact.
The CENDI members raised the issue of public/private sector competition. A
few people on the committee panels raised this issue, but the Committee did
not deal specifically with the commercial aspects of digital libraries.
Dr. Nagel was asked what impact a new Administration might have on the work of the PITAC and his committee in particular. He indicated that the Committee is working under the assumption that the work will continue after the change in Administration, but there is, of course, no guarantee.
"FirstGov"
Beverly Godwin and Meredith Lovell, FirstGov/NPR
Ms. Godwin, who is working with FirstGov from Vice President Gore’s National
Partnership for Reinventing Government, described <firstgov.gov> as more
than just a Web site for federal information. It has been viewed as a transforming
mechanism for how the government deals with the public and as a catalyst for
changing other government sites. The goal is to highlight what is being done
in e-government. Through firstgov.gov, a citizen can buy stamps, file taxes,
reserve a campsite, or check the quality of nursing homes. Many agencies have
extremely worthwhile e-government programs, but they are not visible to the
public. FirstGov also shows that the government can launch a system in Internet
time -- the site was made available in 90 days. The portal was announced on
June 24, 2000, and it went live on September 22, 2000.
The December 17, 1999, e-Government Memorandum from President Clinton included
nine changes that needed to occur in order to make government more accessible
to the public. One of them was transparency of government information, outside
the organizational structure of government. This is what firstgov.gov seeks
to do.
The architecture for firstgov.gov is a portal that includes links to portals
along with .gov and .mil sites. The management structure of firstgov.gov includes
interagency and private partnerships. Many of the portals that firstgov.gov
links to have been developed by interagency groups and cut across federal information
for certain audiences or topics -- for example, Seniors.gov and Students.gov.
A Cross Agency Portals Working Group is being organized to collect knowledge
and best practices and pass that information on to new groups beginning to work
on portals for firstgov.gov.
The Federal Search Foundation was created to take ownership of the search engine
and to operate it. The operation of the firstgov.gov site is done through a
consortium of contractors. AT&T is the prime contractor, with seven subcontractors.
There are currently six people (besides the contractor) working on the FirstGov
team; a few of these people are detailees. The staff is expected to grow to
21.
The search engine is a two-year donation from Inktomi. It is worth approximately
$10M. The engine is able to search almost 0.5B pages in less than a quarter of
a second by searching the full text of every document rather than the metatags.
Re-harvesting occurs every three days. It now provides access to approximately
27M pages, including all publicly available static .mil and .gov sites. Advanced
searching is being developed at this point, and Ms. Godwin indicated that FirstGov
welcomes comments from the agencies on how to improve the search engine.
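
The difference between full-text searching and metatag searching described above can be illustrated with a minimal sketch. It is not the Inktomi engine or any FirstGov code; the page URLs, page text, and function names are all hypothetical, and the index is a plain in-memory inverted index built from the full text of each page.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs whose full text contains it."""
    index = defaultdict(set)
    for url, body in pages.items():
        for word in tokenize(body):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages; the URLs and text are invented for illustration only.
pages = {
    "example.gov/benefits": "Information on veterans benefits and pensions",
    "example.gov/parks": "Reserve a campsite in a national park",
    "example.mil/news": "News for veterans and service members",
}
index = build_index(pages)
print(sorted(search(index, "veterans benefits")))
```

Because every word of every page is indexed, a page can be found even if its author supplied no metatags, which is consistent with the point that harvesting places no burden on the sites themselves.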
FedSearch will allow federal use of the search engine. It could be used by
individual agencies and would save them the expense of purchasing their own
search engines.
Ms. Godwin demonstrated the various features of FirstGov. There are several
ways to access the information, including by keyword, by featured subjects that
change monthly, by government organization, and by topics of interest. There are
also partner pages, state and local information, and a feedback form.
Sixteen common categories have been developed to highlight interagency portals
and gateways. One of these is Science and Technology. The structure is very
shallow in that the categories quickly get the user to linked portals or content
pages. They continue to look for cross agency groups that can focus the information
for specific audiences or topics. One of the "holes" in the system
is in the area of science and technology where there is no publicly available
portal.
One of the areas of continuing effort is the weighting algorithms that are
employed. Since the search engine does not use metatags, there are cases in
which an agency with primary federal responsibility for an area (such as
veterans’ benefits) does not show up at the top of the results list. White
text is used to automatically bring these sites higher in the relevance ranking.
FirstGov is willing to work with any agency or organization for which the engine
does not produce the desired results.
The future of firstgov.gov was discussed. Funding for the next two years is
being provided by the CIO Council. It is hoped FirstGov will become an appropriated
program. An estimated $4M annually would be needed for continued development
and support. Several people who have been on detail with FirstGov from other
agencies will be moving over as employees of FirstGov, including Ms. Godwin
and Ms. Lovell. FirstGov is applying for the Ford Foundation Innovations for
the Government Award.
FirstGov is undertaking a major marketing campaign, although the site is already
receiving a great deal of traffic.
Discussion
Dr. Warnick pointed out that one of the advantages of FirstGov is that it didn’t
burden creators or sites. It just harvested what was on the Net.
Ms. Godwin was asked about the reaction of commercial information providers
to this type of service by the government. She indicated that FirstGov is willing
to share what it has done with others. In fact, HiCitizen.com is a commercial
site that has already taken firstgov.gov and improved on it. FirstGov would
like to see commercial development of some portals.
The new A-130 OMB Guidelines reference GILS, as did the earlier guidelines.
Ms. Godwin was asked about the connection between FirstGov and GILS. She indicated
that while they are similar, the level of detail at which GILS is applied is
not the same as that of firstgov.gov.
Ms. Godwin was asked about the assessment levied on the agencies in FY00 to
support FirstGov. She indicated that this money is paying for hosting
the site, for security software, promotions, etc. At the end of the two-year
test period, the CIO Council will revisit this. Ms. Godwin was also asked
what will happen with the "donation" of the search engine after
two years. She responded that some type of request for proposals will
go out and they will see what they get.
Dr. Wood asked if FirstGov has done anything in the area of metrics and evaluation
of the site. Ms. Godwin indicated that only routine log statistics are produced
and they are just beginning to discuss more detailed evaluation with the contractor.
Ms. Godwin proposed that CENDI develop the Science and Technology portal and
improve on the categorization scheme that has been developed. Several members
mentioned that most of the resources provided by the agencies are of a technical
nature. A portal for researchers might also be considered.
Mr. Molholm noted that the Federal Library and Information Center Committee (FLICC) will be discussing FirstGov at its meeting on December 7, 2000. Ms. Tarr indicated that the objective of the discussion is to determine what role, if any, the federal libraries play in the further development of FirstGov.
"New at NAL: The New AgNIC Architecture"
Melanie Gardner, AgNIC Coordinator and Dr. John Kane, Computer Scientist
National Agricultural Library (NAL)
AgNIC (Agriculture Network Information Center) is a system in which NAL works
collaboratively with others to organize agricultural resources. AgNIC acts as a
portal for quality agricultural information on the Internet. The library is
bringing the organizational expertise and resources that it has in physical
library materials to the electronic environment. The goal is to share the burden
of supplying information through partnership arrangements. There is no funding
for AgNIC other than NAL’s support for the AgNIC Coordinator’s salary.
AgNIC is made up of distributed partners, but participation is not limited to
government organizations. There are currently 38 partners, including non-profits,
professional organizations, foreign academic organizations, land grant
universities and colleges, and other government organizations. Commercial
organizations have expressed an interest, but AgNIC is choosing not to go that
route at this time. The plan is to develop an international consortium in which
AgNIC is part of a global portal.
The relationships between partners are still evolving. They want each institution
to actually create information in specific topic areas and to participate in the
technology development. The partners must provide online reference; selected,
quality resources with a review committee; and a useful calendar of related
events. Questions that are not covered by the partners are dealt with by NAL.
Guidelines and a partnership agreement have been developed and are currently
undergoing revision. The current version is available from the AgNIC Web site.
AgNIC is based on the "centers of excellence" model, with each center taking on
the development of content in a particular topic area. Only public domain
information can be provided. In addition to full text documents, there are
conference papers, data sets, expertise directories, and calendars.
AgNIC continues to refine the selection process. They are planning to link from
the archive of papers to the proceedings. The calendars are being moved to a
database format.
Ms. Gardner noted that communication is the most difficult aspect of AgNIC.
Keeping up the group spirit with only one face-to-face meeting per year is very
difficult. They have a variety of task groups and listservs that support
discussions and development.
Dr. Kane described this as the "third phase of AgNIC development." This phase
involves the development of additional standards and protocols. The goal is a
technology that supports dynamic automation, distributed systems, and
well-structured systems. AgNIC uses Dublin Core as the basis for its metadata.
AgNIC also uses the ROADS (Resource Organisation and Discovery in Subject-based
Services) software and architecture for distributed systems. The metadata format
for AgNIC is registered with the ROADS system. The quality continues to come
from the intellectual decisions that are made.
The ROADS software includes the concept of centroids, which are simply inverted
indices that can be shared. The Whois++ software handles the management of the
system. Additional development is done using the Zope Web development
environment, UML, and object-oriented (OO) methods. An input form for metadata
has been developed.
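
The idea of a shareable centroid can be sketched as follows. This is a simplified illustration rather than the ROADS or Whois++ implementation: the partner names, record contents, and function names are hypothetical, and each record is assumed to be a small dictionary keyed by Dublin Core element names such as title and subject. Each partner summarizes its records as the set of unique words per element, and a central index uses those summaries to decide which partners are worth querying.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_centroid(records):
    """Summarize one partner's records as element name -> set of unique words."""
    centroid = defaultdict(set)
    for record in records:
        for element, value in record.items():
            centroid[element].update(tokenize(value))
    return dict(centroid)


def partners_to_query(centroids, element, word):
    """Return the partners whose centroid lists the word under the element."""
    word = word.lower()
    return sorted(
        name
        for name, centroid in centroids.items()
        if word in centroid.get(element, set())
    )


# Hypothetical partner collections, invented for illustration only.
partner_records = {
    "partner_a": [
        {"title": "Rangeland Management Basics", "subject": "rangeland grazing"},
    ],
    "partner_b": [
        {"title": "Corn Pest Identification", "subject": "corn entomology"},
        {"title": "Grazing and Soil Health", "subject": "grazing soil"},
    ],
}
centroids = {name: build_centroid(recs) for name, recs in partner_records.items()}
print(partners_to_query(centroids, "subject", "grazing"))
```

Only the word lists are shared, not the records themselves, so a query is forwarded just to the partners that could possibly hold a match while the content stays distributed.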
AgNIC is applying for NSF funding for architecture development. They also want
to make further progress on content development, in addition to hardware and
software. Another current project is the development of a controlled vocabulary
in collaboration with Cornell University and other partners. The thesaurus is
seen as dynamic and utilitarian. The vocabulary is being used to give a road map
of what agriculture is.