CENDI ANNUAL PLANNING MEETING

Bolger Center in Potomac, MD
August 11-12, 2004

Abbreviated Minutes

COLLECTIVE ACTION FOR MUTUAL BENEFIT:
THE POWER OF COOPERATION

 

Chairman’s Welcome – Dr. Walter Warnick

The information age has changed over the years and we are indebted to leaders such as Kent Smith, Kurt Molholm, and our predecessors for moving us out of the “dark ages.” CENDI is in a state of transition and we need to keep the momentum going. The agencies represented by members of CENDI now represent approximately 96 percent of the federal R&D budget. The opportunities for collaboration are unprecedented.

However, the challenges are also unprecedented. Teamwork will be needed; we must search for the best opportunities, and then apply our best thought and effort to capitalize on them. CENDI can help bridge the gap between research and the deployment of research by seizing individual opportunities and looking for opportunities that we can meet collectively. Each member must “have a grip on his agency’s own big ideas” to define our collective vision for the future.

Part of CENDI’s vision is that there is strength in numbers. With Science.gov, CENDI has tangible evidence of true interagency collaborative success. The Alliance for Science.gov, which includes the CENDI members plus a few non-CENDI agencies, has major plans to expand the scope of Science.gov into new content types and services for new audiences, and to create a precision search tool. CENDI and Science.gov are part of the grand vision toward a reverence for knowledge. The key to grand achievement is grand collaboration.

CENDI will miss the principals and alternates who have been at the forefront of the CENDI team. However, by continuing in this spirit, CENDI can continue to take collective action for mutual benefit.

To further set the stage for the meeting, the group viewed a video produced by NLM on the occasion of the anniversary of Dr. Donald Lindberg, Director of NLM. There were several take-home messages from this video:

Big ideas don’t always come from internal sources – outside advice is very important as part of the planning process. This is what led to the National Center for Biotechnology Information (NCBI) and NLM’s outreach programs.

KEYNOTE: “From Information Research to Information Management: Opportunities for Cooperation”, Dr. Michael Pazzani, Director, Division of Information and Intelligent Systems, National Science Foundation

Dr. Pazzani described his organization and its placement within NSF; gave examples of programs within the division that would be of interest to CENDI members, including new initiatives; described NSF’s work with other agencies; and discussed issues surrounding technology transfer and the movement of research results into the operations of government agencies.

NSF’s mission is to promote the progress of science to advance the national health, prosperity and welfare and secure the national defense. The Directorate for Computer and Information Sciences and Engineering (CISE) reports to the Office of the NSF Director. There are four divisions under CISE, which recently reorganized. The Division of Computing and Communication Foundations is involved in foundational research. The Division of Computer and Network Systems is involved in how to build larger systems. The Division of Shared Cyberinfrastructure is concerned with the infrastructure needed for science and by the nation in general; e.g., the Grid. The Division of Information and Intelligent Systems is involved with systems that enable more intelligent applications. Budgets are likely to be flat or to grow slowly for the next few years, so it is unlikely that NSF will start many new programs. However, monies can be redirected.

The Division of Information and Intelligent Systems is particularly interested in increasing the capabilities of human beings and machines to discover and reason about knowledge. This involves the ability to represent, collect, store, organize, locate, visualize, and communicate information. The research leads to discovery in the sciences and engineering. There are different clusters within the division including Computer Vision and Robotics, Information and Data Management, and Digital Libraries. This research and education is then set in the context of particular programs in human-computer interaction, universal access, digital society and technologies, and digital government. The latter explores how agencies can use technology.

Science needs to connect information, so Information Integration, which is a relatively new initiative, focuses on developing information technology to solve particular science or engineering problems, whether a scientific challenge or a particular domain challenge. The goal is to identify breakthroughs in science that can be supported by computer science. This includes research into reconciling heterogeneous formats, web semantics, decentralized data sharing, and on-the-fly integration.

The Digital Government program is a move toward a more applied program. The goal is to support computer and information science research applied to government missions, in partnership with government agencies. This program supports multidisciplinary research on the design, use, and impact of IT on government institutions, in particular the interaction between agencies and citizens. Areas of research include information and knowledge management, including the collection, discovery, access, dissemination, archiving, and preservation of the information; human computer interaction; information systems security, privacy and trust; models and simulations for decision making; and understanding the effects of IT on agencies and governance. This work is done within the government domains of federal statistics, public safety and law enforcement, crisis management, environmental policy and management, e-voting, e-rulemaking, security and long-term archiving and preservation of digital information.

Digital government grants are given to universities. NSF may also support workshops to bring people from agencies and academia together to discuss problems and develop research agendas. An example of multi-agency collaboration is the GeoCollaborative Crisis Management project at Penn State University. Agency collaborators include the U.S. Geological Survey, NASA, the Pennsylvania Department of Environmental Protection, and the Florida Emergency Management Agency. International partnerships are also possible (e.g., the E-Challenges for Government and Society published in October 2003 with the European Union). NSF can also facilitate getting Memoranda of Understanding (MOUs) in place quickly. For example, the New York State Public Health Community was able to respond quickly to the 1999 West Nile outbreak. That work involved negotiating protocols to enable data sharing. October 19, 2004, is Digital Government Day at NSF. Agencies are invited to hear about the work of the program and meet the collaborators.

At times, agencies share costs with NSF. Often the agencies bring real world problems, data, or access to people’s time at the agencies. The budget for Digital Government is approximately $9 million, with another $9 million from other areas within NSF such as Information Technology Research. The intelligence agencies have brought another $20 million to Information and Intelligent Systems in support of near-term mission requirements.

An example of a Digital Library Project is the computational tools for modeling, visualizing, and analyzing historic and archaeological sites funded at Columbia University. The Cathedral of St. John the Divine in New York City and the Cathedral Saint Pierre in Beauvais, France, were scanned and modeled. This work will help to examine weaknesses in the building and propose remedies, establish the baseline condition, and visualize the building in previous contexts. The tools become a way to collaboratively teach about historic sites in the classroom and on the Internet.

The Digital Society and Technologies area researches four interrelated areas – distributed and collective action; the digital economy and the information society; the design, use and implications of IT; and computational organization, society and economic sciences. The last two areas apply to all organizations, including the government. This effort will help to identify the conditions under which collaborations work well and support the building of models of how organizations use computers and technologies.

A two-year study on collaboration concluded that distance matters. It is difficult to establish mutual regard and common ground unless there is direct interaction among members of a project team. Outcomes depend on coordination, and distance is a more important obstacle than the multidisciplinary nature of the investigation. In addition, the number of universities involved in a project was negatively correlated with the outcomes, unless the Principal Investigators (PIs) were attentive to coordination mechanisms such as monthly phone calls, face-to-face meetings, conferences and workshops, and shared faculty or student involvement.

Other Digital Society and Technologies research has found that the cycle of IT design, use, consequences, and feedback must be studied as a whole. The big picture is important along with the context. A number of schools are expanding their programs to include the context of computer science in the curriculum. The integration of computer science and information science programs may be one instance of this. Approaches that broaden the context also seem to attract more diverse student bodies, including more women and minorities.

Individuals or small teams looking at parts of a bigger problem are optimal for information integration, but this requires high-level coordination. In addition, there is a need to communicate findings through publications, publishing of data, hypotheses, etc. A uniform view to a multitude of heterogeneous, independently developed data sources is also important. The goal is to free users from having to locate the sources, interact with each one in isolation, and manually combine and manipulate the data. Topics to be addressed include reconciling heterogeneous formats, schemas and ontologies; web semantics; decentralized data sharing; data sharing on advanced cyberinfrastructures such as the grid; and on-the-fly integration. NSF expects to make the first awards under this program before September 30, 2004.

NSF funding in national and homeland security includes foundational research in areas such as cryptography, network security, machine translation, robotics, data mining, critical infrastructure protection, etc. Supplemental funding is often provided by other agencies to existing NSF investigators to perform unclassified research with short-term application of interest to the agency. For example, supplemental grants can be provided to original grantees in machine translation so that translation for other languages can be developed.

The NSF Agency Collaboration Model includes partnerships such as that between the Library of Congress and others in the investigation of digital archiving and long-term preservation. A number of workshops were held with NSF funding. An MOU between LC and NSF awards the grants using the NSF review process.

The issue of technology transfer, of course, arises when dealing with a grant situation. The Bayh-Dole Act allows for the transfer of exclusive control over many government-funded inventions to universities and businesses operating with federal contracts for the purpose of further development and ultimate commercialization. There are numerous Small Business Innovation Research (SBIR) programs and venture capital approaches such as In-Q-Tel. The Digital Government approach of prototypes provides some level of system development and better specifications for what is ultimately needed. Open source is another transfer mechanism, but this may have better results when there is a large community that can do robust and speedy incremental development. In some cases, the source code is transferred to the government where federal or contractor staff can further the work of the grantees.

To improve dissemination of NSF research results, NSF is considering a number of alternatives including making annual reports from grantees publicly available in addition to abstracts and award announcements. Dr. Pazzani and others at NSF are very supportive of this sort of action, but there are cultural issues and concerns raised about putting the research out too early. Many CENDI agencies have experience that could help NSF if they decide to develop such a repository.

CENDI Roundtable Discussion on Challenges and Ideas for the Future of STI

The agencies were polled in advance of the meeting for two to three topic areas representing challenges or opportunities for that agency. The topics clustered around policy changes, information architectures, approaches to content, output products and services, partnerships and relationships, and resource allocations. In order to provide opportunities for all agencies to participate, a “round robin” approach was taken. The topics are presented by agency below. Themes that emerged are listed under the agenda item, Discussion of Proposed FY05 Activities.

Defense Technical Information Center – Kurt Molholm

Mr. Molholm proposed that CENDI promote the Handle protocol as a federal standard. DTIC would be willing to host a federal Handle resolver, but there may be a more appropriate agency to do so. It isn’t clear exactly what it would cost to do this. The group decided that the Persistent Identifiers Task Group should reconvene to follow up on this proposal and on other issues raised in the white paper that was published earlier this year. The Task Group could better define the administrative issues, cultural changes, and costs involved in operating such a resolver.

Discussion

The Corporation for National Research Initiatives (CNRI) is running a Handle resolver for NAL. GPO and NASA are investigating Handles. It was noted that one of the issues surrounding the use of Handles by CENDI members and other organizations is the use of PURLs or other unique identifiers. Because the Handle is a protocol rather than an identification scheme, the two approaches are complementary. However, there has always been confusion about this and working to make this clear might encourage the use of the Handle protocol.

The Z39.18 Technical Report Standard will be going into the National Information Standards Organization (NISO) balloting process in September. Mr. Molholm suggested that Chapter 1 be required reading. The standard also includes a schema. Ms. Hodge noted that the Content Management and Access Working Group as well as the Digital Preservation Task Group will be reviewing the draft.

Department of Energy – R. L. Scott

OSTI is laying the framework for a more comprehensive approach to data centers within DOE. DOE has 10 data centers but the bulk of DOE R&D lies outside the current data center framework. DOE has 50 research facilities which generate large quantities of data, much of which is inaccessible. The goal is to capture the research so the data set can be tied to the technical report or journal article, and can be accessible via a clickable link. Recently, OSTI brought 10 data centers together with experts from NASA and NSF to discuss how to better organize the DOE data centers. The importance of good context (history) and metadata when moving to Grid computing was noted. While DOE instruments create large amounts of data, funds are not generally available to maintain and preserve this data once the project has been completed. OSTI is working to create a data management policy that will address migration, management, funding, re-use, and integration to break down the stovepipes.

OSTI believes that these challenges, while great, are not unlike those faced for text. There is support for this activity within the Office of Science; Dr. Ray Orbach, head of the Office of Science, sees simulation as the third pillar of science and this requires availability of quality data.

Some technologies that OSTI has been investigating are coming to the fore with data management. These include the Dublin Core as the basic metadata, with definitions or profiles specific to data and the scientific discipline. Handles are being investigated as a means of unique digital identification. Given the connections to standards, it would be beneficial to have a model or process for organizing data that would be usable across agencies.

OSTI also is involved in a project to digitize microfiche. Approximately 1 million pages are being scanned and OCRed. The cost is approximately 4-5 cents per page. OSTI is encountering some problems linking the metadata to the digitized records. They will continue to keep the microfiche for its preservation value.

Environmental Protection Agency – John Sykes

The Public Health Inventory Database is a new initiative that will be a portal to electronically accessible public health-related databases and resources. The purpose is to identify and provide a consolidated point of access to health information needed by EPA staff and customers. It will be in the EPA Information Management System, and, therefore, available through Science.gov. The content currently has over 333 databases, primarily data sets, and the metadata can be searched (particularly keywords and organizations). There was discussion about what was included in the “public health” scope. They are looking to be inclusive and welcome other CENDI agency collaboration.

Metadata for data sets is of interest to the development of the Database. The question is what would be discipline-oriented metadata versus what might be held in common across the agencies. EPA’s metadata schema was driven by user needs, and the attributes are defined based on the identified user population.

EPA’s Office of Research and Development (ORD) is creating a Center for Excellence in Environmental Computational Science, based on a study by IBM. Three focus areas were identified: advanced computational models and tools, network environmental application facilities, and an environmental science portal. EPA is working with Sandia Laboratory to build a modeling center. The Environmental Science Portal would use a Grid network to run appropriate models. The prototype for this work must be in place by September 30, 2004. Terry Grady is the lead on this for EPA.

Finally, EPA is looking at an IT Governance Project. Currently, there are multiple IT-related data calls requesting much of the same information. The resulting information is scattered over many systems and even in hard copy in some cases. Consequently, EPA is looking at how to create a portal for the user to allow the information from all data calls to be integrated and available. This would give an integrated view of security, budget, project, and performance information. It will support better alignment with Office of Management and Budget (OMB) priorities, increase accountability, and better secure IT investment. Tom Tracy is the lead on this.

Action Item: Mr. Sykes will share the IBM study with the group.

Government Printing Office – T.C. Evans

GPO has a mandate to catalog and index the information published by and for the United States Government – a new National Bibliography of U.S. Government Publications. For many years, this has mainly consisted of publications distributed through the Federal Depository Library Program (FDLP). With all of the many changes in how the Government publishes information, GPO saw a need to rethink its approach to fulfilling its mandate to catalog and index that information. The result of that effort is a draft plan from GPO on establishing and maintaining a National Bibliography of U.S. Government Publications, and that plan is currently out for comment. Under Title 44, the Monthly Catalog was legislated originally as a national bibliography. The current effort is likely to result in more of a true national bibliography given new technologies and relationships, the increase in digital publication (resulting in a more electronic FDLP), and the fact that fewer print items go to the Depository Libraries.

In order to accomplish this in a web environment, GPO is interested in leveraging the work of other agencies by utilizing federated search technology to allow for the search of metadata across Government, federating such a bibliography across the agencies by harvesting metadata. The GPO is already involved in the discovery and harvesting of information that is in-scope for the Federal Depository Library Program (FDLP). Mr. Evans suggested collaboration among CENDI members on a harvesting tool and ways to share metadata. It was also suggested that the agencies help define what the national bibliography is and what it should contain. This should also be coordinated with FirstGov's efforts to create a Public Directory, so that the National Bibliography is visible and available through FirstGov. GPO had a meeting with FirstGov about how the projects would dovetail.

NASA STI Program – George Roncaglia

NASA is focused on the response to the Columbia Accident Investigation Report and the return to Space Exploration. The latter is changing the strategic focus and alignment of resources. More realignment of resources is expected since the cost to get the shuttle back into operation is over budget.

The internal STI systems have been redesigned to be compliant with the Open Archives Initiative (OAI) and more Yahoo-Google-like. NASA STI is also prototyping persistent identifier systems including one with Handles. Standards are being endorsed for document management systems; there are 37 unique systems in NASA.

National Agricultural Library – Peter Young

A presentation was given recently to the Agricultural Research Service management. It addressed where NAL has been and where they are going as a customer-driven organization. The vision amalgamates the current services from NAL plus the Digital Desktop with document delivery service to serve both the scientist and the public. Integrated services in the future could include virtual chat with NAL and AgNIC partners and access to data sets (proteomic, genomic, or GIS), images, and electronic lab books.

NAL has used Outsell consultants to learn more about their clients. They found that 56 percent prefer self service (to do their own searching), 20 percent prefer automatic updates or alerts, and only 14 percent wanted to work with a librarian. In addition, Outsell conducted a competitive analysis between NAL and other agricultural libraries such as Cornell University’s in terms of the dollars per scientific year. At the universities included in the study, individual scientists spend an additional $1,000 per year on information resources and their departments spend about $3,000 more in addition to what is spent by the library.

The NAL continues to follow the recommendations of the 2001 Blue Ribbon Panel despite limited resources with which to address them. They recently presented an alternative scenario for the next three to five years to reduce the risk and achieve the objectives.

Nutrition.gov will have a soft launch in October 2004. The Human Nutrition Information Center will change to focus on the general public.

In terms of portal development, NAL must work within the USDA’s standard technical solution. However, NAL’s Agriculture Thesaurus is being used as the taxonomy for the USDA-wide portal.

Over the next few months, NAL will continue its strategic assessment and planning process using Outsell’s services. (Several CENDI members have annual subscriptions to Outsell reports. There are also options to contract for special projects, strategic planning, needs assessment, and analysis of stakeholder customer relations.)

National Archives and Records Administration – Lewis Bellardo

The Electronic Records Archive contracts were awarded to Harris and Lockheed Martin. They will both start the system architecture phase on August 16, 2004. This phase will last for nine months and include the systems analysis, design, and prototype development. Basic ERA capability is required by 2007 with full capability by 2011. It was suggested that the ERA staff speak at a future meeting. When designs are produced, they could be reviewed by CENDI.

The State Department and NARA signed an agreement in April under which the State Department will send old cables from the late 1960s to 1972 to NARA. These will be ingested into ERA early on in its development. More recent cables will be used as test materials for a prototype to ensure that they can be exported to an archival environment. The early electronic records, such as the cables, provide challenges since there was no preservation planning at the beginning of the life cycle of these records.

The ERA is intended to be upwardly scalable and useful for smaller state and local collections, as well as to other government agencies. The contract requires the use of COTS software as much as possible with stitch-ware to bring it together. The stitch-ware is usable for any federal purpose. The ERA will cover both preservation of records and access to records by researchers and by agencies.

The funds to begin to build the ERA are in both the House and Senate budgets and in the President’s Budget. The OMB submission in the next few weeks is critical, since it will determine how fast the ERA can be built.

NARA continues to work with others on issues related to digital archiving and preservation. NARA and the National Digital Information Infrastructure and Preservation Program (NDIIPP) are working collaboratively. Both Dr. Bellardo and John Carlin, the Archivist of the United States, are on the NDIIPP’s Advisory Board. NDIIPP is focusing on non-government material while NARA is focusing on government records. NDIIPP is taking advantage of the results of many years of preservation architecture work at the San Diego Supercomputer Center, which was funded by NARA and others. However, at some time there will likely come a point where there is divergence between the work of NARA and NDIIPP because of the types of materials being preserved. NARA is also involved with the University of Maryland in a conference on November 15-16, 2004, on long term preservation. Nancy Allard will send information to the group.

NARA is working on policies, guidance and regulations. Under its Custody Policy, NARA establishes agreements with agencies or others to support the archiving of government records by identifying a body of records for which the agency can provide preservation and customer service. For example, NARA has an agreement with GPO to preserve GPO Access. NARA has investigated R&D records as part of its Appraisal Guidance. The National Science Team is gathering information on the R&D work processes at the agencies to identify similarities and differences and to refine and amplify the guidelines based on the information gathered. EPA, NASA and DOE will be visited this month.

NARA will be issuing regulations on transitory e-mail, if approved by OMB. The current requirement to put e-mail into a records management system won’t apply to those records that are defined as truly transitory.

Action Item: Secretariat should include a presentation by ERA staff on an upcoming meeting agenda.

Action Item: Nancy Allard will provide information about the upcoming meeting at UMD on long term preservation.

National Library of Education – Christina Dunn

A contract was recently awarded to centralize the creation of ERIC. The areas of immediate interest include metadata, taxonomies, and automatic indexing. Many of the items discussed by others, such as persistent identifiers, are also of interest in this effort.

National Library of Medicine – Dr. Fred Wood

Web metrics continues to be a focus. Most recently an article about NLM’s multi-phased approach was published in IEEE’s IT Professional. NLM received money from NIH to test five web sites using the American Customer Satisfaction Index (ACSI) product. This is a combination of standard and customized questions and then analysis and benchmarking of the results. OMB has pre-approved the ACSI approach. Based on the outcome of that activity, NLM proposed a trans-NIH approach to customer satisfaction surveys. This includes 28 NIH units and 68 web sites. It appears that NIH will fund this as an NIH activity. NLM is planning to negotiate the price with Foresee, based on the volume of web sites.

ACSI’s federal presence is significant with 50 to 60 federal web sites in the ACSI benchmarking and others in the pipeline. There may be a group mechanism that would reduce the cost across agencies. Dr. Wood encouraged members to look at the questions that NLM used.

NLM uses Keynote to determine how many people are looking at its sites. The speed for 36 locations around the globe is monitored every 10 seconds. Web Trends facilitates analysis of the information by country. NLM is experimenting with an NLM-centric network that would assess speed within the network, including the regional medical library network.

National Science Foundation – Patricia Bryant

NSF has been involved with Grants.gov from the administrative point of view. There are some forms that aren’t ready to support the “back office” work, even though there are mandates to implement Grants.gov very shortly and to eliminate the individual agency grant sites. It isn’t clear what will happen to Grants.gov after November.

“New Architectures, Tools and Technologies for Life Cycle of Scientific Communication”

The Content and Production Integration Challenge, Kurt Molholm

DTIC’s goal is to turn data and information into useful knowledge for its two distinct audiences -- researchers and managers/planners. Managers and planners use the Research Summaries and the Independent Research & Development (IR&D) databases, while researchers use technical reports. DTIC’s current architecture has different structures by content type, format and media. Materials are housed in three primary systems which require separate logons.

As an industry, we are now actually developing systems to address content with IT rather than finding content to supply to IT-driven systems. DTIC’s project goal is to develop an enterprise content-centric system that integrates other DTIC initiatives, provides full functionality with scalability and performance to support identification and discovery, and that uses DTIC’s structured, semi-structured, and unstructured data.

DTIC’s approach is to deploy a third generation retrieval system. This requires modernization and reformatting of data. TIFF images, used in the current system for documents, will be converted to PDF and/or XML. Microfiche will be converted to images and then to PDF or XML. The input systems will be integrated to feed a central repository, and a system will be developed to allow contributors to supply data to the system.

DTIC investigated two alternative architectures to address these requirements – the All-in-One Repository and Federated Searching. The first approach extracts the content, transforms its varying formats, and loads it into a single repository where a common search is executed. The second approach keeps the original materials in native formats and uses different but parallel searches to retrieve the information.
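The difference between the two architectures can be illustrated with a minimal Python sketch. It is not DTIC's system and does not use MarkLogic; the sample records, field names, and function names are all hypothetical and chosen only to show the contrast between loading everything into one normalized repository and searching each source in its native layout in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical holdings: each source keeps records in its own native layout.
SOURCES = {
    "technical_reports": [
        {"title": "Radar Signal Processing", "abstract": "Methods for radar clutter removal."},
        {"title": "Composite Armor", "abstract": "Ceramic composite test results."},
    ],
    "research_summaries": [
        {"name": "Radar Program Summary", "summary": "Ongoing radar work units."},
    ],
    "ird_database": [
        {"project": "Sensor Fusion", "description": "Industry radar and infrared fusion."},
    ],
}

# Hypothetical mapping of each source's native fields onto (title, text).
FIELD_MAP = {
    "technical_reports": ("title", "abstract"),
    "research_summaries": ("name", "summary"),
    "ird_database": ("project", "description"),
}


def matches(term, *fields):
    """Case-insensitive substring match against any of the given fields."""
    term = term.lower()
    return any(term in field.lower() for field in fields)


# Approach 1: extract, transform, and load into one repository, then search once.
def build_repository(sources):
    repository = []
    for name, records in sources.items():
        title_field, text_field = FIELD_MAP[name]
        for record in records:
            repository.append(
                {"source": name, "title": record[title_field], "text": record[text_field]}
            )
    return repository


def search_repository(repository, term):
    return [r["title"] for r in repository if matches(term, r["title"], r["text"])]


# Approach 2: leave records in native form and run one search per source in parallel.
def search_source(name, term):
    title_field, text_field = FIELD_MAP[name]
    return [
        record[title_field]
        for record in SOURCES[name]
        if matches(term, record[title_field], record[text_field])
    ]


def federated_search(term):
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda name: search_source(name, term), SOURCES))
    return [title for hits in per_source for title in hits]
```

In this sketch both approaches return the same titles for a given term. The practical difference is where the format knowledge lives: the single repository pays the transformation cost once at load time, while the federated search must understand every native layout at query time.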

DTIC has selected the All-In-One Repository approach using the MarkLogic Content Interaction Server, which has been designed for content management. DTIC believes this approach will provide a more robust, more scalable, and more functional content management system, while dramatically reducing the costs for producing and maintaining the content. However, the architecture will still include federated searching, since that approach is used by the Defense Information Virtual Architecture (DIVA). The DIVA allows access to Open Archives Initiative repositories and content at the granular object level. Handles will allow CrossRef-type services, Amazon-type suggestions, and improved support for Advanced Distributed Learning.

A number of factors were involved in the decision to use the All-In-One approach. They include reduced production costs: since many of the creation and deployment functions are automated, DTIC can focus on the production of abstracts, which are automatically synthesized from reports. The Report Catalog can be expanded to include external sources deemed appropriate. Users can easily export summary reports of their searches to Excel and save and share their searches. Using “contextual search,” users can focus their searches on different areas of the research reports. Ultimately, this approach provides a single system for document processing, secure storage, search and presentation. Metadata and data cannot fall “out of synch”.

MarkLogic is used by Elsevier. (Elsevier began by building a unified Web services architecture built on IBM WebSphere.) SGML content is converted to XML on the fly, but Elsevier is moving to an XML-native content store and the use of XQuery from MarkLogic. This will provide the advantages of granular content search and reuse.

For materials that will be held outside DTIC, the federated searching approach in the architecture supports the advanced distributed learning initiative. The objects will be held in distributed SCORM-compliant repositories and a Handle resolver will bring objects together in support of learning initiatives. There are an estimated 100-200 such repositories and 10 million digital learning objects. DTIC can use this Content Object Repository Discovery and Registration Architecture (CORDRA) Concept without having to manage the objects directly.
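
The idea of a resolver bringing together objects that are held elsewhere can be sketched as a simple registry that maps an identifier to the repository currently holding the object. This is a minimal sketch under stated assumptions, not the Handle System protocol or the CORDRA design; the class name, the identifiers, and the repository URLs are all hypothetical.

```python
class HandleRegistry:
    """Hypothetical registry: persistent identifier -> current object location."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, repository_url, object_path):
        # Record where the object lives; the object itself stays in its repository.
        self._locations[handle] = (repository_url.rstrip("/"), object_path.lstrip("/"))

    def resolve(self, handle):
        # Look up the current location; raises KeyError for an unregistered handle.
        repository_url, object_path = self._locations[handle]
        return f"{repository_url}/{object_path}"

    def gather(self, handles):
        # Resolve several handles, e.g. the objects that make up one course.
        return [self.resolve(handle) for handle in handles]


registry = HandleRegistry()
registry.register("example/lesson-1", "https://repo-a.example/", "/objects/101")
registry.register("example/lesson-2", "https://repo-b.example", "objects/202")
course_urls = registry.gather(["example/lesson-1", "example/lesson-2"])
```

If an object moves to another repository, only its registry entry changes; references that use the identifier continue to work.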

Mr. Molholm recommended that CENDI conduct an architecture review workshop, as it did several years ago. The workshop should be for CENDI staff only. Such a workshop will probably require a multi-day meeting. Ric Thoroughgood is willing to lead this effort.

Action Item: Ric Thoroughgood to lead a CENDI architecture workshop with support from the Secretariat.

“Science.gov” Eleanor Frierson and Tom Lahr

Science.gov had significant technical accomplishments in FY04. Version 2.0 was launched on May 11, 2004, providing a single search box, default searching of all databases and web sites, and the QuickRank feature. In addition, the cap on the number of deep web databases that could be searched simultaneously was raised significantly. There are now 30 deep web databases. Two from the US Forest Service were added in FY04. A new Regulations section was added with the Federal Register and the Code of Federal Regulations provided by GPO. There are now more than 1710 web sites with 156 added since August of 2003. Routine updates, link checks, and quality control are performed.

OSTI’s web analysis tool is used to monitor usage on a monthly basis. There were significant spikes after the launch in May and June. Usage spiked to more than 100,000 page requests and 250,000 searches per month. Usage dropped from these high levels after June but not to the previous levels. The top referrers are Google Search, Yahoo! Search, NIST, USDA Research & Science, GPO, and Google India.

There were significant promotional activities surrounding the launch. Fifteen articles followed the launch including Information Today, the DOE and EPA newsletters, and a two-part article in Government Computer News. Events included the Science.gov Way Celebration, the 2.0 Launch, and exhibits at the National Science Teachers Association, the American Association for the Advancement of Science (AAAS), Special Libraries Association (SLA) and American Library Association (ALA) meetings. These exhibits were often part of the exhibits by Alliance members. Science.gov was included in presentations to the Information Management Conference and the National State Transportation Directors’ Meeting.

NSF was added to the Alliance through CENDI membership. Karen Klima is the new vice USGS representative replacing Ken Lanfear. She will serve as liaison between Science.gov and the Department of the Interior Web Council with the goal of including information from more DOI bureaus. Interest has also been expressed by the Federal Aviation Administration of the Department of Transportation (FAA/DOT) and the Veterans Administration (VA). Web pages for members and prospective members were added to the Science.gov web site.

Other organizations have expressed an interest in establishing partnerships with Science.gov. Science.gov is included in a proposal to NSF for development of a minority science education portal. Recently, the Office of Science and Technology Policy (OSTP) approached Science.gov to host the Excellence in Science, Technology, and Math Education Week web site on a permanent basis. There have also been requests for linkage from international and ‘.org’ science sites.

The Alliance structure is maturing and those involved have worked out many of the responsibilities this past year. A proposal was put before the group at its recent meeting to reduce the number of task groups within the Alliance structure. This is still being discussed.

Mr. Lahr and Ms. Frierson were interviewed about the Alliance governance structure by the Digital Government project at the Harvard School of Government, by staff of the World Bank, and by a German science portal. The German project was also interested in linking between the two portals and other collaborative activities.

Eight agencies are contributing to the development of Science.gov 3.0. Alerts and fielded searching will be available first and will be rolled out when they are ready. MetaRank and enhanced Boolean searching will be added later. Science.gov 4.0 has already been funded by DOE’s SBIR Program. It will develop DeepRank using Grid technology.

In 2004-2005, the minimal budget must cover maintenance, which includes the addition of content, streamlining the governance structure, and outreach to potential new members. In addition, an effort is underway to change the browse categories to reflect new content and new members. This will include changes in the areas of engineering, homeland security, diseases and medical conditions, agriculture and water quality.

Science.gov 4.0 or 5.0 may address the issue of cross-cutting taxonomies and improved semantic searching. An initial approach would be to post links to available thesauri on Science.gov as a resource. This will be discussed with representatives at the upcoming CENDI workshop on taxonomies and thesauri in a web environment.

Action Item: Ms. Hodge will discuss with the lexicographer staff attending the September 16, 2004, CENDI-sponsored workshop the possibility of establishing a registry of CENDI knowledge organization systems, including thesauri, authority files, classification schemes, etc., that can be linked to from Science.gov.

Dr. Warnick proposed that Eleanor Frierson be CENDI Deputy Chair. Because of the work she has been doing with the Alliance, she is his choice to work with this year. (Note: It is the Chair’s prerogative to determine if s/he wants to appoint a deputy. CENDI Handbook)

“E-government and Impact on Agency STI Management” Bonnie Carroll

Under Section 207 of the E-government Act of 2002, there are recommendations for agency action that are being further developed by the implementation groups. The Policy Working Group attempted to identify the status and the potential impact of the various efforts within the Interagency Committee on Government Information (ICGI) which is dealing with categorization, directories, taxonomies, R&D system, web site standards and archiving and preservation.

The Web Content Group is the furthest along and has produced a “cookbook” document. The guidelines may have little impact on more sophisticated web sites where many of these principles are already applied. Several requirements, including plain language and versions for those who are not proficient in English, are already imposed by other legislation, regulations, or directives. The guidelines call for an assessment in this regard, but do not prescribe a solution. It was noted that the definition of plain language may be an issue. The Group is now working on a tool kit to support the implementation of the guidelines.

All four groups in the Categorization Working Group produced their initial requirements and draft recommendations. However, the decision was made to combine these reports. The ICGI Executive Committee is focusing on the categorization working groups and the process by which the four separate reports will be combined. There are several recommendation areas including approaches to categorization and recommended metadata elements and definitions. The joint report is an effort to harmonize across the working group areas. A public meeting is scheduled for August 30, 2004.

The E-records group issued a document outlining the barriers to e-records management. Common characteristics of records are being defined, including common metadata elements. This has gone forward to the ICGI.

Ms. Carroll reported that she contacted Dan Costello at OMB about the R&D summaries provision in the E-Government Act, but she did not get a response. Mr. Molholm reported that he asked about this when the Act was first passed. He was told that there would be no implementation group since each department would be doing its own system. No particular standards were provided. However, RaDiUS elements are mentioned in the final legislation.

With the exception of the guidelines from the Web Content Group, there is insufficient information at this time to determine the impact the recommendations from the other groups might have on the CENDI agencies. However, the Policy Working Group will continue to monitor the situation.

Action Item: The Policy Group will continue to monitor the recommendations coming from the various E-government implementation groups, with support from CENDI staff that participate on E-government committees. They will keep the membership informed via the listserv.

“Value of Cooperation: Scope and Influence of CENDI,” Bonnie Carroll

Ms. Carroll highlighted the accomplishments for CENDI in FY04. Membership was expanded to 12 agencies with the inclusion of NSF. CENDI now represents the STI from 96 percent of the federal R&D budget. CENDI has established a variety of partnerships. It had contact with approximately 55 organizations in FY04. One new member was added to CENDI and Science.gov.

In terms of coordination and leadership, CENDI has contributed extensively to the development of the E-government Act implementation. There has been joint input from the group and many CENDI agency staff are participating in the E-government implementation groups at a variety of levels. CENDI took the lead, under Jim Erwin, for the searchable identifiers requirement.

In addition to E-government, CENDI provided input to the Federal Geographic Data Committee (FGDC) Sensitive Geospatial Data Guidelines.

There were interactions with leaders in government including Karen Evans, the Chief Information Officer (CIO) for OMB; Lee Holcomb, the Chief Technology Officer for the Department of Homeland Security; and Spencer Abraham, the Secretary of the Department of Energy, through the Science.gov 2.0 Launch. CENDI cooperated with the International Council for Scientific and Technical Information (ICSTI), the Committee on Data (CODATA), and the National Federation of Abstracting and Indexing Services (NFAIS).

Ms. Hodge was on the program committee for the NFAIS meeting last year and is participating again this year. CENDI is cooperating with ICSTI and CODATA on the AAAS Symposium on Open Access which has been accepted. In addition, there were discussions with industry leaders such as Google and Yahoo! Search. The Copyright Frequently Asked Questions (FAQ) document remains a “best seller” and provides significant visibility for CENDI outside the STI community.

CENDI continues to address the STI Life Cycle management. Science.gov promotes the distribution of STI. CENDI has been involved in international cooperation on persistent identifiers with both the International DOI (Digital Object Identifier) Foundation and The Stationery Office of the UK. CENDI’s Working Group provided input to the NARA appraisal guidelines for scientific records.

In support of the education goal, there were several publications including the Digital Preservation report published jointly with ICSTI and white papers on persistent identification and copyright. Workshops on metadata for digital rights management and on thesauri and taxonomies in a web environment provided educational opportunities for CENDI staff.

External recognitions, such as the Federal 100 Award received by the CENDI Executive Director and the membership of Dr. Warnick on the Federal Depository Library Council, have brought visibility to CENDI.

The CENDI web site was enhanced during FY04. Key areas that are currently under redesign or development include the STI Manager and a Members Only Page. The latter will be used to post full meeting minutes, contact information, and other members-only documents. The listservs were reworked so there is a list for principals and alternates only (CENDI-PA), and a list that includes the working group and task group chairs and key agency staff in addition to the principals and alternates (CENDI-L). The brochure insert was updated twice to include new members.

There are many external and internal challenges for FY05 that should be considered in our planning. External challenges include the changing models of scientific communication driven by changing technologies; new industry players and relationships; and a changing user base with changing expectations. There will also be a new Administration and the possibility of new strategic responsibilities for the agencies. Internal challenges include the integration of new members, and the resource allocations and interaction between the Science.gov Alliance and CENDI membership. CENDI will need to consider next generation succession planning, given the changes in representation due to retirements and other relocations. This is especially important in CENDI’s relationships with other organizations such as NFAIS and ICSTI, where CENDI has benefited by having overlap in member involvement.

“CENDI Vision, Mission, Goals and Focus Areas”, Bonnie Carroll

The vision and mission were reviewed. The mission derives from the original MOU. The three goals – Coordination and Leadership, STI Life Cycle Management and Education -- have been fairly constant over the years. Science.gov is considered to be part of Goal 2. Goal 3 includes the issue of the value of STI. The Group agreed that the vision and mission are okay. Dr. Bellardo suggested that the term “STI Life Continuum” could be used instead of Life Cycle. He will send information from Canada where this term is used to acknowledge inclusion of long term preservation in management planning.

Action Item: Dr. Bellardo will provide information about the use of the phrase “STI Life Continuum” instead of life cycle so that CENDI can decide if it is appropriate to change the way the STI Life Cycle Management goal is stated.

Last year there were four main focus areas: Public Domain Directory (Science.gov), Digital archiving and permanent public access, Federal policy and legislation, and a review of technologies of interest to CENDI members. The group agreed that these main focus areas would continue for FY05, but that the term “Public Domain Directory” would be replaced with “Science.gov.”

Action Item: The Secretariat will replace the phrase “Public Domain Directory” with Science.gov in the list of focus areas for FY05.

Operations and Communication

A promotion/communications plan for CENDI was presented. It includes several levels of activity with more attention to branding and more active use of press releases. Key issues surround the target audiences that CENDI wants to reach and metrics to determine success for that audience.

Action Item: The Secretariat will proceed with further development of the promotion plan for CENDI, addressing the key issues of target audiences and metrics to determine success.

The group approved the development of a new CENDI brochure for next year. Members were asked to provide ideas.

Action Item: A new brochure will be developed next year. Members are asked to provide ideas for the content and layout to the Secretariat.

Technical Operations Working Group Recommendations for FY05 Activities

All working groups met via teleconference prior to the planning meeting to review their accomplishments and make recommendations for activities for FY05. The members were asked to review the working group and task group rosters in the planning book to make sure that the appropriate staff are involved and to assign new staff. This is particularly important for new member agencies.

IT Security and Privacy – Simon Liu

The group held a CENDI staff workshop on Intrusion Prevention at NLM on May 11, 2004. This was a follow-on activity to a previous workshop on Intrusion Detection. It was attended by approximately 15 agency staff. The vendor presentations are available from the CENDI web site.

For the next year, the group will focus on networking and information sharing. They have found that few IT people have time for workshops or educational opportunities that are not directly related to questions at hand. Listservs will be developed and the group will hold periodic teleconferences around hot topics to promote discussion. However, at the request of the CM&A Working Group, IT Security will support an activity on the topic of XML security.

Discussion

Several agencies expressed an interest in the topic of cross-agency digital signatures. The Health and Human Services agency (HHS) has been working on this and there is interest in how systems connect to the Federal Bridge. This may be a topic for the group to pursue in a conference call.

Content Management & Access – Charles Bradsher

This group conducts activities both under the working group organization itself and through specific task groups. The Working Group has activities related to metadata, thesauri, and terminologies on the Web, XML technologies, and content management. Task Groups for Persistent Identification, Digital Preservation and Distribution Markings have been organized under this working group.

Metadata and XML Schema

The group is collecting metadata element sets and XML schema or DTDs from the agencies. These sets will be compared to determine a core element set, particularly for bibliographic content. This activity will continue in FY05. Also, the working group proposed an educational event on Open Archives Initiative (OAI) harvesting.
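
The comparison step described above can be pictured as counting how many agencies use each element and keeping those used by all, or by most, of them. The following is a minimal sketch of that idea only; the agency labels and element names are made up for illustration, and a real comparison would first need a crosswalk so that differently named but equivalent elements are treated as the same.

```python
from collections import Counter

# Hypothetical element sets; real agency schemas and names will differ.
AGENCY_ELEMENTS = {
    "agency_a": {"title", "author", "date", "report_number"},
    "agency_b": {"title", "author", "date", "subject"},
    "agency_c": {"title", "author", "abstract", "date"},
}


def core_element_set(element_sets, min_agencies=None):
    """Return the elements used by at least min_agencies agencies (default: all)."""
    if min_agencies is None:
        min_agencies = len(element_sets)
    # Count each element once per agency that uses it.
    counts = Counter(name for names in element_sets.values() for name in set(names))
    return sorted(name for name, n in counts.items() if n >= min_agencies)


strict_core = core_element_set(AGENCY_ELEMENTS)
every_element = core_element_set(AGENCY_ELEMENTS, min_agencies=1)
```

Lowering `min_agencies` widens the core set to elements shared by most, rather than all, agencies, which may be more practical when one agency's schema is unusually sparse.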

The group will also comment on the DTD (Document Type Definition) for technical reports which is part of the draft Z39.18 standard. The group proposed the joint project with IT Security on XML security issues. The latter is an important bridge between the IT/security people and those involved with content.

Terminologies and Controlled Vocabularies

In the area of terminologies and taxonomies, the group has organized a CENDI-only educational event which is scheduled for September 16, 2004, at EPA. In addition, the group monitored development of a government-wide taxonomy under the ICGI. Further activities are likely during FY05 in both these areas.

Content Management Systems

A new activity in the area of content management was proposed. The CM&A would investigate methods and tools for PDF enhancement including link creation from citations, metadata extraction, etc. The group would also like to investigate content management systems by holding a CENDI-only “industry day” with the vendors. This might be scheduled after the proposed Architecture Review workshop has identified some common requirements that the vendors could address.

Persistent Identifiers

The Persistent Identifiers Task Group published a white paper on Persistent Identifiers and Federal Information, which was distributed to the E-government interagency committees and others. CENDI also has the lead on the “Searchable Identifiers” subgroup. Jim Erwin submitted draft requirements and has been working with the other groups under the Categorization committee to merge the separate documents. If the proposal to promote the use of Handles as a standard federal protocol is accepted by the group, the Task Group may need to reconvene, perhaps with some additional or different members.

Digital Preservation

The Digital Preservation Task Group published the ICSTI/CENDI Report in FY03. It is available from both the CENDI and ICSTI web sites. The group also commented on the PDF/A draft standard and monitored the work of NDIIPP, NSF and PREMIS (preservation metadata standards). A meeting is scheduled for September 15, 2004, at NLM, where NLM staff will discuss the incorporation of the permanence ratings scheme with its content management software, the DTDs for e-journals and archiving, and the backfile digitization project.

The Digital Preservation Task Group plans to continue monitoring standards, best practices, and the work of other preservation groups. The group will specifically provide comments on the PREMIS draft standard for preservation metadata, which is expected later this year.

Discussion

The Association for Information and Image Management (AIIM) and the National Information Standards Organization (NISO) have a joint group identifying requirements for digital preservation systems. Ken Thibodeau from NARA will be on the group. The work will not be totally focused on records systems. The group should follow this as well.

Distribution Markings

The Distribution Markings task group has completed the final draft of the revised markings document which includes markings from NASA, DOE and DoD. The final will be completed in FY05 and a training event or materials will be considered. The group recently discussed how to maintain the Distribution Markings document since there are changes all the time. The group plans to summarize marking practices for electronic materials and to determine the issues related to marking in digital formats.

Copyright and Intellectual Property – Bonnie Klein

A workshop on Metadata for Digital Rights Management was held on March 22, 2004, with over 22 attendees. Relevant legislation and regulations were monitored and the FAQ was updated twice. The group is also writing a series of white papers for OMB on rights management. This includes a paper on copyright, terms and conditions notices for the E-government group on Web Standards and Guidelines.

The group expects to continue work on the white papers, to update the FAQ, and to monitor relevant legislation both nationally and internationally. Proposed new activities include work with the FLICC Content Management Group on rights expression language as it relates to government information and development of a technical session at a CENDI meeting where the attorneys, open access facilitators and others could talk about rights initiatives. (The principals suggested that the group investigate copyright issues related to open access and institutional repositories.)

Ms. Klein also reminded CENDI members that there are opportunities for both operational staff and counsel to participate in the group. She also requested links from appropriate agency web pages to the FAQ where they do not currently exist. Some agencies have linked the FAQ from their general information or notice pages while others have made it available for agency staff. The FAQ gets about 200-400 hits per week. Three of the E-government groups (similar to the ICGI) in Canada have linked to the FAQ.

Policy Working Group Recommendations for FY05 Activities

During FY04, the Policy Working Group provided input to the E-government Act implementation and to the FGDC Guidelines on restricting Geospatial data. Along with CODATA and ICSTI, they developed and submitted a successful proposal for a AAAS 2005 Symposium on “Changing Scientific Publishing: Open Access and Implications for Working Scientists.” (Ms. Bortnick-Griffith noted that a similar session proposed by the Public Library of Science (PLOS) and the Scholarly Publishing and Academic Resources Coalition (SPARC) was also accepted, so the group will see if these sessions should/could be combined.) CENDI members were kept apprised of a wide range of policy issues from intellectual property to national security, and database protection. The Working Group also maintained contact with the OMB-CIO, the Library and CODATA communities.

In FY05, the group will continue to monitor the E-government implementation and to analyze the potential impact on agencies. The AAAS Symposium will occur in February 2005. Monitoring and maintaining contacts with key communities will continue.

Activities proposed for FY05 include briefing the new Administration and monitoring OMB and CIO developments, given the potential for new leadership in many of these areas whether there is a second Bush administration or a new President. This will include monitoring who is on the transition committees and the likely role of OSTP.

A key policy area for FY05 will be open access scientific publishing. In FY04, language in the House Appropriations Committee report required NIH to study the cost of biomedical literature and its impact on access, and to describe what NIH's response would be. NIH produced a report addressing the cost and access issues, but it did not discuss NIH's response because NIH is still in the process of developing the policy.

Draft FY2005 appropriations report language recommends to NIH that all published manuscripts be deposited in PubMed Central after six months. However, those grantees that used grant money to pay for publication page charges would be requested to deposit the manuscript immediately.

There is a large letter writing campaign on both sides to Congress and NIH. Ms. Griffith distributed an article that appeared in Science magazine as representative of the high visibility of the issue.

NIH is holding a series of meetings on open access with Dr. Elias Zerhouni, Director of the NIH. The session with publishers was attended by about 45 publishers. Dr. Zerhouni reiterated that the government has no intention to tell publishers what to do or what to charge, but that NIH has to focus on what is in the public interest. He stated that the status quo isn't acceptable. The infrastructure has already been developed by PubMed Central, so for NLM to receive open access journal literature in PubMed Central requires relatively little additional cost. Two more stakeholder meetings with the scientific community and public interest groups are being scheduled. NIH will address open access in its grants manual. The text will be provided in draft form with time for public comment.

Other examples of NIH policies that support information sharing include its data sharing policy where grantees must submit a data sharing plan with their applications for funding and the requirement for the large gene sequencing centers to deposit data in GenBank. NLM also currently cites open access literature in Medline and provides links to the associated free full text of the articles.

It was noted that while OMB Circular A-110 allows federal purpose rights for grant information, agency lawyers don't really want to define these rights too closely. The NIH recommendation for deposit of manuscripts resulting from NIH funded research would be part of the grant policy.

These discussions at NIH have raised awareness about the issue of open access, including at other science agencies. However, there are many issues including time frame, the definition of manuscript, and how manuscript submission would be handled that have yet to be finalized.

Discussion

The group discussed the impact of the NIH outcome on other CENDI agencies. There is a great deal of grant funding across the CENDI agencies. There may be pressure on other agencies to create institutional repositories, but sustainable funding is a question. Ms. Griffith noted that NCBI is developing PubMed Central software capabilities that can be used by other repositories internationally. This tool suite may be of interest to other agencies that need to create repositories.

It is interesting that the United Kingdom's House of Commons Science and Technology Committee recently published a report that also proposed public access to the results of government funded research and identified the British Library as a repository.

Ms. Klein proposed an open give and take with the legal counsels on open access publishing to discuss legal, pragmatic and political issues.

Discussion of FY05 Proposed Actions

In addition to the proposed activities presented by the Working Groups and Task Groups, Ms. Carroll summarized the possible FY05 actions based on the discussions during the roundtable.

FY05 Proposed Actions:

Speakers for Future Meetings:

Initiate More Strategic Planning for CENDI:

To summarize the above in textual form, the Group suggested a process by which CENDI could spend more time looking at the bigger picture. Futurists could be invited to CENDI meetings to discuss what might occur in the future in education, science, information management and dissemination. Possibilities include the Institute for the Future or the World Future Society. NLM is including this futurist approach in its strategic planning process; Dr. Siegel will share the strategic planning process as it progresses. Some information from the NDIIPP consultation process might also be useful. The group also suggested an environmental scan for each meeting. A subscription to the OCLC Environmental Scan might be used. Reports on ICSTI, NFAIS and other organizations could be an item on each agenda.

On international programs, it was suggested that a discussion be scheduled with the Library of Congress about its information exchange program (headed by Don Panzarra) and the treaties that are in place to support international information sharing.

Another area of interest is combining multimedia formats. Electronic journals may not have achieved the mixture of media one might have hoped. Science.gov might investigate this and help to move the integration of media types forward.

Action Item: The Secretariat will combine all the possible actions into a form for prioritization and provide it to the members by mid-September.

Action Item: Schedule a discussion with the Library of Congress about an information exchange program on international information sharing.

Operational Issues

The Group approved the CENDI promotion plan in principle. Part of this includes changing the CENDI domain name from www.dtic.mil/cendi to www.cendi.gov. This domain name is available but it must be purchased by a government agency. NTIS will purchase it for CENDI.

Action Item: NTIS will purchase the www.cendi.gov domain name.

The group approved the redesign of the STI Manager. This is a catalog of STI documents, web sites and related organizations which is maintained by the Secretariat. It currently has over 500 items cataloged but there is no search engine.

Action Item: The Secretariat will proceed to make the STI Manager searchable in addition to browsable.

The minutes for June 2004 were approved as presented.

Contract and Financial Discussions

The principals and alternates met in executive session and discussed the contract and funding for FY05.

Handouts

NIH Roadmap Initiatives (distributed by J. Griffith)

International Agreement to Expand PubMed Central, NIH News, June 25, 2004. (distributed by J. Griffith)

“Seeking Advice on ‘Open Access,’ NIH Gets an Earful,” Science, August 6, 2004.

“Unofficial House Appropriations Report Language,” FY2005 HHS Appropriations [regarding PubMed Central and open access].