CENDI PRINCIPALS AND ALTERNATES MEETING

National Aeronautics and Space Administration
Washington, D.C.
February 2, 1999

Minutes

Update on OMB/OIRA: Priorities and Plans
Scientific Data in the 21st Century - NIST Standard Reference Data and CODATA
NIST Information Services in the Information Era
NASA Systems and Technologies Update

WELCOME

Tom Pedtke, CENDI Chair, began the meeting at 9:15 a.m. He thanked NASA for hosting the meeting. Introductions were made.

SCIENTIFIC AND TECHNICAL INFORMATION EXCHANGE

Update on OMB/OIRA: Priorities and Plans
Peter Weiss, OIRA

Mr. Weiss indicated that there are a number of priorities within OMB/OIRA at this time. Several of them may be of interest to CENDI members.

In the Omnibus Act of 1998 there was a Government Paperwork Elimination Act which received little fanfare at the time. It calls for agencies to provide electronic filing alternatives for the citizenry within five years of October 1998. This includes government processes such as student loans, applications for benefits, etc. Originally, the language called for alternatives within 18 months, but the Administration successfully argued that the dates were too aggressive. Instead, guidance will be provided by OMB within 18 months and there is a five-year period for implementation. The guidance is aimed at agency managers and general counsels to dispel the myth that electronic signature alternatives are not allowed. Anyone who is technically savvy will already be conforming to this guidance. It simply reinforces the previous Paperwork Reduction Act, and stresses the risk management principles of the Computer Security Act as well as the need for management controls. There are good examples of projects underway that use electronic signatures including the Interagency Electronic Grants, Access America, and public key infrastructure development. The draft will be in the Federal Register within a month. There will be a long 120-day comment period. OMB's intent is to take the full 18 months for developing the guidance so that lessons learned from actual projects can be incorporated.

Discussion

Mr. Weiss was asked if an electronic process is required. He indicated that the electronic process must be available as an option, but that paper alternatives are acceptable and needed in certain cases. The aim is ease of use on the part of those citizens who can deal electronically with the agencies, and to reduce paper by moving to a more electronic system where the paper will be destroyed and only the electronic records retained. NARA has not caught up with this move to electronic processing, but there is pressure to proceed anyway.

Also passed by last year's Congress, as part of the Treasury Postal Act, is a law regarding the ability to use the Freedom of Information Act (FOIA) to request data in the possession of a grantee. This is based on a situation where a member of the public was denied access to information held by an EPA grantee that was used to inform an EPA rule making process. Industry objected to the lack of data disclosure and EPA did not insist on disclosure. This Act has had a great deal of coverage in the science press and many of the editorials have been emotionally written. Mr. Weiss is sympathetic to both sides. He indicated that there needs to be a presumption that the underlying data must be made available for validation and replication. However, grantees, especially those in the private sector, need to have some time frame within which they have some proprietary rights. He does not accept proprietary interest in the data on the part of federal researchers.

There will be an OMB A-110 change in the Federal Register. In its wording, OMB will attempt to recast the legislative language to reach an effective middle ground that balances the legitimate concerns of both sides. A distinction will be made between the intellectual property rights for the publication of research results and the underlying data. Agencies are encouraged to comment. Mr. Weiss cautioned that if the grant making agencies are perceived as too shrill in their opposition, it will not help their cause. Mr. Weiss will advise the Secretariat when the change appears in the Federal Register.

Database Protection legislation has been reintroduced as H.R. 354. This language is based on misappropriation again, limiting protection to cases where piracy has resulted in actual economic harm. Mr. Weiss doesn't see a bill finalized before the end of the fiscal year and maybe not until next year. There is too much divergence and new, credible alternatives to discuss. Mr. Uhlir believes that a minimalist approach is likely, which would not address the European reciprocity clause.

On the international front, the impetus behind the actual use of Database Protection is government commercialization. There is a serious move on the part of European government agencies in the area of meteorology to assert database protection of government information. At a recent meeting attended by Mr. Weiss in Geneva, there was a demand that the U.S. Weather Service "dumb-down" their web page and explicitly identify the origin of the data sets used. This is extremely hard to do in integrated systems such as this. They also demanded that if academia or business want the data, they must go back to the originating country.

Mr. Weiss then asked Paul Uhlir to discuss his view of international database protection following a two-day forum from which he had just returned. Mr. Uhlir indicated that there may be movement within the European Union to seek a more "U.S.-like" position. DGXIII has released a Green Paper with wording similar to A-130 in regard to public access to government information, particularly in the sciences. A copy of this Green Paper will be provided to the Secretariat for redistribution.

Discussion

Several CENDI members asked about the involvement of the scientific community. The Academies have been following these issues at the national and international level, using internal lobbying money and supporting the legislative drafting of the Senate version of Database Protection legislation in the last Congress. The Senate version would have been better than the House version in the 105th Congress, since it addressed compilations of government-produced data by the private sector.

Mr. Uhlir and Mr. Weiss were also asked to address the dynamics in WIPO. The International Council for Science (ICSU) did a white paper which had a strong influence on the position taken by developing countries. Developing countries rely heavily on free information from the more developed countries. However, there has been little involvement on the part of the European scientific community. It may be that the science ministers have not been briefed on what is being proposed by other government officials regarding the flow of scientific information such as weather data.

Mr. Uhlir then reported on the NSF workshop he convened on January 14-15. It was designed to get a better understanding of the needs for scientific and technical data in a real world context. Providers from the public and private sector in a variety of disciplines were invited to establish groundwork that would allow a look at science information policy in the real world. The working groups at the meeting looked at the needs and issues of various disciplines as well as various policy regimes for databases: restrictive, misappropriation, or looser models. There will be a final report which will address these topics as well as other issues raised, such as government data and the use of data for educational purposes. The intent is to help advise Congress and the science agencies who funded the study. The results will be published in late May.

Since the current legislative process is unlikely to conclude before the end of the year, the May report may provide input. Regarding a bill, the House and Senate have different views. It is hoped that some credible options will be introduced. The Commerce Committee may take action. The bottom line is that the discourse has been broadened and more Congressional Committees are involved. This added complexity might drive the outcome toward a minimal approach. Also, the European reciprocity issue isn't playing out as originally proposed. The Europeans are backing off dates and there are significant questions about the definition of equal protection.

The final area of OMB activity is Printing Reform. Mr. Weiss indicated that it is not clear what will happen this year. Senator Warner is no longer the head of the Committee and Senator Ford has retired. Representative Thomas, the new chair of the House Oversight Committee, is more concerned with the Congress's own printing than GPO's activities with the agencies. Librarians are likely to focus on the transition plan for the Federal Depository Library Program.

Discussion

Mack Strauss of the Defense Automated Printing Service indicated that he considers GPO to be a good Contracting Office for printing. However, he would hope for OMB support for independence, so that agencies can make the most cost effective choice.

Mr. Finch described a pilot project between NTIS and the FDLP. User IDs and passwords have been distributed to 22 of the approximately 1,400 FDLP libraries. A variety of different types of libraries are included in the pilot. They will have access to full text online of the 35,000-40,000 documents that NTIS has received in electronic form since October 1987. There has been no subsetting of the information by subject as there is with the paper documents. Currently, the service is free during the pilot period; NTIS will need to see what the volume is and what it will cost. It has been suggested that the cost should be borne by GPO since it has been funded to provide FDLP access. Under the terms of the pilot, NTIS has asked the FDLP libraries to keep the full text within their library building or on the campus. Access to the bibliographic files must be done in the library. However, the user may write down the confirmation number and then re-access the system to download the full text to his own PC.

Discussion

Finally, in response to a question about changes in office personnel, Mr. Weiss noted that Bruce McConnell will spearhead the Y2K efforts on behalf of the President's Council out of an office at the World Bank. The Bank is developing an International Cooperation framework and will be making loans to assist countries in this area.

Scientific Data in the 21st Century - NIST Standard Reference Data and CODATA
John Rumble, Director of the Standard Reference Data Program, NIST, and International President, CODATA

The U.S. National Measurement Laboratory was founded in 1901 as part of the National Bureau of Standards. It was renamed in 1988. There are 3,200 employees at sites in Gaithersburg, MD and Boulder, CO. There are four major components to the NIST mission: the Advanced Technology Program shares funding with industry on high-risk research; the Manufacturing Extension Program, modeled after the Agricultural Extension program, provides help to small manufacturers through local centers; the Baldrige National Quality Award Program; and the Measurement and Standards Laboratory. Much of the $450-500M budget goes to measurement activities in electronics, physics, and the technical services. NIST does not create standards, but performs pre-standard research.

Standard Reference Data is a key resource within the sciences. It provides critical evaluation of measurement data, giving a reliable basis for the work of scientists, engineers and the general public. Data users are not experts on how the data are generated, and they do not know the quality of the data that have been published. While the Data Program was established by Congress in 1968, the work actually dates back to the 1920s. The realization of the importance of this type of scientific data came during and after World War I, when the U.S. no longer had access to German research data and standards.

The Standard Reference Data Program has subject experts who collect and evaluate data and issue them with quality indicators. The subject experts collect data from the published literature, review and evaluate it, design databases and publications to disseminate this information, and work to disseminate this information widely to industry, academia and government. Key questions in the development of standard reference data are 1) have all relevant factors been controlled (all variables), 2) has the study followed the known laws of nature, 3) how does this result compare to other measurements of the same phenomenon, and 4) does it adhere to the fundamentals of science. Another key activity following the evaluation and the development of the database is to write articles in learned journals concerning the evaluation. This provides a feedback loop to the data generators. The Data Program adds value to the original research results by evaluating data and making them more known and accessible.

While the main information that is stored at NIST is the data, the key papers are also retained as references in the database. They do not yet have the electronic connection between the bibliographic references and the data as they have envisioned but they are working on it. They try to have good metadata for the datasets, but it is only as good as the information in the original material.

The Data Program is made up of long term data centers, primarily at NIST. There are also short term data projects that draw on outside expertise as necessary. Joint projects with industry, national and international groups are also important. The Data Program provides coordination to minimize duplication among these efforts. The total NIST data budget is about $15-18M. Support for the program comes from research funds, government funds, outside contracts and in-kind contributions from partners.

The Data Program is moving from print publications to computer databases. The Data Program is currently responsible for the dissemination of over 70 database titles (some subsets have been created for educational purposes and so it is difficult to determine how to actually count the number of databases produced). These databases are updated every 1-3 years. Over 6,000 copies of the databases were distributed last year. Much of the distribution is through third-party distributors. In some cases, these distributors are incorporating the Data Program's evaluated data in their analytical instruments, in self-contained PC packages, as part of online information systems, and as part of larger commercial software packages.

The Data Program has become involved in foreign activities. It has been successful in bringing key datasets from Japan and Russia to the U.S. that previously were not available. Many of these involved data on superconductivity.

The work of the Standard Reference Data Program has been authorized to recover the cost of disseminating and building the databases, but not the cost of the evaluation. NIST also has copyright authorization under 15 USC 290. This covers properties and well defined substances.

Discussion

Dr. Rumble was asked how the Data Program decides what new measurement evaluations to fund. He indicated that emphasis is placed on the maturity of the data, whether NIST has researchers to perform the evaluation, and the impact of the evaluation on the users of the data. Dr. Rumble was asked how much of the data collected is from the U.S. and how much is foreign. Dr. Rumble indicated that they do not really know, but he would estimate that it matches the general trend for scientific publishing between the U.S. and foreign sectors.

There are several new driving forces in science that are resulting in new data activities. Advances in modeling and simulation may decrease the need for experiments, but put more pressure on the data quality. There is an increased ability to examine, measure and control individual atoms and molecules. The Web increases the need for data evaluation. You can find almost anything on the Web, except good information. There is an increasing need for long term datasets.

These driving forces must result in new data activities and changes in the way that the Data Program works. This has been complicated by the fact that there has been no new money over the last several years, resulting in the need to end old activities in order to start new ones. There is more emphasis on materials design, bioinformatics (they are taking over the Brookhaven Lab's Protein Data Bank) and engineering data. Staffing needs have changed and it is difficult, for example, to match private sector salaries for informatics Ph.D.s. With the introduction of Web dissemination they have seen 100,000 times more use of their products. A big question they are addressing is what their web charging policies should be. Currently, the NIST data is free over the Web. (NIST is currently recovering $2.4M from an expenditure of $18M on data activities.) The politics and economics of federated databases and programs are unclear.

Dr. Rumble also described the activities of CODATA, for which he is the new International President. CODATA is an interdisciplinary scientific committee of the International Council for Science (previously the International Council of Scientific Unions). The mission is to improve the quality, reliability, management and accessibility of data of importance in all fields of science and technology. CODATA is involved in scientific data interchange world-wide, with representatives from 21 countries and Taiwan, and association with 14 scientific unions. The main emphasis has been on quantitative data and the commonality of handling data across disciplines. Fundamental constants is probably the last such activity. There has been significant activity in the physical, biological, geological and astronomical sciences. CODATA has two major outputs: many regional or national groups are spawned from CODATA discussions, and its international meetings bring data-focused people together across disciplines, which also results in bilateral and multilateral working relationships.

CODATA is working to reposition itself through a strategic planning process. A new scientific data management journal will be started to support the sharing of experiences. A series of millennia data meetings are being planned. The meeting on bioinformatics, scheduled for the UK, will address nomenclature problems, focusing on the need to link down or up from genomics to species-level biodiversity.

The information revolution has also revolutionized the understanding of electronic information and we haven't gotten a handle on this yet. The Cold War dynamic is gone. It is ironic that just as data is becoming more important, the value of international and other data organizations such as CODATA is being questioned. Dr. Rumble's evaluation of the U.S. position with regard to data management is that by comparison to other countries the U.S. is holding on or getting better. Many other countries' activities are falling off. Funders think you can get everything you need off the Internet. The real question is why we aren't investing more in the next generation of data management. It isn't just bits and bytes. The payoff to society is great.

NIST Information Services in the Information Era
Bill Trefzger, Chief of Electronic Information and Publications Program, NIST Office of Information Services

The mission of the Office of Information Services (OIS) is to support and enhance the research activities of the NIST community (including over 1200 visiting researchers each year) through a comprehensive program of knowledge management. OIS has existed in one form or another since 1901. The library and publications groups are uniquely combined. The library has over 200,000 items in its collection, which is open to the public. Within Technology Services there are 49 staff including 8 computer specialists and 17 other professionals including librarians, writers and editors. Technology Services does not charge for its support. They do both electronic publication and provide support for others who choose to do their own.

Strategic planning determined a need to rebuild the group to meet the information era that is already here. As part of re-inventing itself, the Office of Information Services views knowledge as a continuum running from the research activity through the research conclusions and their dissemination to continuing research discovery. The aim is to support this process in the most efficient way possible, and in a way that supports the environment of the NIST researchers.

Mr. Trefzger then described various aspects of "what is hot and what is not" within the NIST environment. Highlighted changes include the introduction of a First Point of Contact Desk which is staffed by support staff rather than professionals. This allows the professionals to spend more time on research activities. The Reference Librarians have been realigned as Research Consultants to work more closely with particular research groups and projects. They are developing custom Intranets to support these groups. These Intranets will include dynamic web pages that are constantly updated, replacing bibliographies which were more static in their content. There is increased emphasis on document delivery and on providing full text online via virtual library concepts. NIST has made agreements and pays for access to data that the scientists need. In some cases there are fee for service sources and these are tagged so that the scientists are aware of the cost.

Desktop access is taking the place of mediated searching. User profiles are key and there is a large project to create a user database to support dynamic web design, desktop access customization, etc. Information is currently being pulled from a variety of databases, but there is no customization at the user level. The core information that is presented to the user is based on searches to databases of the presentations and events at NIST, and in the DC area by NIST personnel and to databases of NIST publications. The content will eventually include outside sources and the library catalog.

Discussion

Mr. Molholm raised the privacy issue on the profiles. Mr. Trefzger acknowledged this issue, but indicated that they were not directly addressing it at this time. It has not come up as a problem.

The ultimate key to knowledge management within the organization is attention sharing. This will include the building of systems that allow for re-use and entry by the creator, electronic message boards, e-mail delivery, etc. The creation of an Information Space is more vital than simply saying the information is on the Intranet.

Several technologies and areas of expertise are key to the continued re-use of information. NIST is moving from PDF format to more structured documents with XML. This removes the proprietary nature of the format and should work better into the future.

The group is also involved in the publications process. Rather than redacting information for 2,500 publications per year, including 1,800 for outside journals, they are getting involved earlier in the process and actually doing more writing than editing.

Mr. Trefzger believes that they have a lot of good things going but they are not as far along as they would like to be. They have developed a two year strategic plan with phased implementations. This includes infrastructure development, particularly the replacement of the library catalog. It also includes staff development and realignment. Fifty percent of the staff are new since the summer of 1998.

The evaluation of the program is also key. There is a Quarterly Program Evaluation. It includes not only an evaluation of the usage of library services, but the evaluation of workers and supervisors. It will evaluate the growth in the resources that are made available and the tasks that are being performed.

NASA Systems and Technologies Update
George Roncaglia, NASA Langley Research Center

Roland Ridgeway introduced the changes in NASA STI. He indicated that the program is split among policy, which he has at HQ, and operations under the direction of George Roncaglia at Langley, with a Contractor site in Maryland for production operations.

Mr. Roncaglia put the changes at NASA in the context of declining resources and ever increasing volatility of the environment. The key for the NASA STI Program is to integrate the program with the day-to-day operations of the centers. This fits with the fact that NASA itself is now very heavily matrixed. The products and services are heavily dependent on partnerships with the centers and other organizations, reflecting the fact that the NASA environment is extremely distributed. There are separate funding organizations, each with its own STI program. Of the four delivery systems, only one is solely funded by the STI program.

The NASA Image Exchange system (NIX) is a distributed photographic archive. The umbrella search system that provides cross-database searching is WAIS-based. There are over 500,000 photographs in the system. Video and sound clips are now being added. They are using NIX to test how to provide access to non-traditional science output. For example, the wind tunnel data is in real-time video capture.

The Center for AeroSpace Information facility moved to its new offices in March 1998. The facility has been downsized. A replacement system for RECONplus, called the Aeronautics and Space Access Page, is in prototype. It is redesigned to work in a Web environment with lower maintenance costs and platform independence. Small-scale beta testing is being done. They are working with some secure systems within NASA which have provided some ideas for inclusion in this effort. Additional services will be provided with this system. They recently beta tested the Systran machine translation program and are about to release it to all of NASA.

NASA is moving to a full cost-accounting environment. This raises issues about who will be doing STI (how much will it be distributed to the users versus centralized). How much value does a central organization add? The strategy of the STI Program is to cooperate with the computer staff rather than get too competitive. They are able to respond to opportunities more quickly.

One of the main issues for NASA is the proliferation of web pages. There are at least 24 web addresses for significant parts of the NASA operation, and this is likely to increase. The role that the central organization can play is to coordinate these separate activities so that users can be guided to the whole.

Discussion

Mr. Roncaglia was asked about the relationship between the STI Program at LARC and the LARC library. The STI Program at LARC is not with the library. However, the chief librarians still look to the STI program for guidance. Records management is also handled differently across the organization. As the bottom line, Mr. Roncaglia said his strategy is to distribute as fast as possible and align closely with computer operations.