CENDI PRINCIPALS AND ALTERNATES MEETING
National Science Foundation
Ballston, VA
January 9, 2007
Minutes
Cyberinfrastructure for the Sciences
Cyberinfrastructure and Implications for the Future
NSF Showcase
NSF Supercomputer Centers and NSF’s High Performance Computing Goals
NARA/SDSC (San Diego Supercomputer Center) Partnership
Cyberinfrastructure and Implications for the Future
Dr. Daniel Atkins, Director, Office of Cyberinfrastructure, National Science Foundation
A collective vision of cyberinfrastructure (CI) has developed globally over the last four to five years. Europeans started with e-science, or, more generically, e-research, undergirded by e-infrastructure. (“e” can stand for electronic, enhanced, or enabled.) In Asia, it is called Cyberscience. The US uses cyberinfrastructure or CI and talks about science or research as being CI-enhanced or CI-enabled. CI enables research and development in the social and technical sciences and has implications for both. The Atkins Report had an impact on the humanities that resulted in a recently completed ACLS-Mellon Study on Cyberinfrastructure for the Humanities. CI enables collaboration, which requires new commitments among collaborators. It also enables learning and education, resulting in, and simultaneously requiring, new workforce development.
Dr. Atkins believes that the Atkins Report resonated because of the convergence of technologies at the time the report was being written. Scientific fields were beginning to see an increase in complexity and in the multidisciplinary and multi-scale (both time and space) nature of the scientific enterprise. The report served as the basis for approximately 40 workshops on CI and some internal NSF studies and teams focused on how to more specifically respond to the findings. This led to the decision to form the Office of Cyberinfrastructure (OCI), which will build on previous CISE work, under the guidance of an external advisory committee.
The CI vision for 21st century discovery is about to be released. It is more a plateau of the vision than a plan. All NSF directorates and offices support CI in support of integrated research and education and broadened access and participation. CI is a cross-foundational activity with an emphasis on science research and education drivers.
There are several key science drivers that impact the CI vision: the inherent complexity and multi-scale nature of today’s scientific challenges; the requirement for multi-disciplinary, multi-investigator, and multi-institutional approaches that may also be multi-national; and simulations, digital instruments, etc. that increase the data intensity of the environment. In addition, the value of data and the demand for curation and preservation of access has increased. There is a need to exploit infrastructure sharing to achieve better stewardship of research funds. Finally, there is a strategic need to engage more students in high quality, authentic science and engineering education.
Addressing these drivers requires synergy among three types of activities. Transformative applications are needed to enhance discovery and learning. Provisioning includes the creation, deployment, and operation of advanced CI. Finally, R&D is needed to enhance the technical and social effectiveness of the CI. The goal of OCI is to be a catalyst.
The vision space for the new NSF Strategic Plan is between extrapolation, or doing current activities better and faster, and real innovation. The vision space is very ambitious and the question is how to actually achieve it. Rhetoric only succeeds in creating a reality gap. There is an emerging OCI coordination structure to address these issues.
Four major areas form the CI vision framework: 1) high performance computing; 2) data, data analysis and visualization; 3) virtual organizations; and 4) learning and workforce development.
High Performance Computing is an increasingly important tool for all the sciences. NSF’s four-year strategy is to increase to teraflops. However, sustained use of the infrastructure will likely require petaFLOPS.
Data, data analysis and visualization are also important. The challenges include increased scale, heterogeneity, and the re-use value of digital data. NSF is taking the initial steps to catalyze the development of a federated, global system of science and engineering data collections that is open, extensible, evolvable, appropriately curated, and long-lived.
The Association of Research Libraries recently produced a report called “To Stand the Test of Time” (http://www.arl.org/info/events/digdatarpt.pdf). Dr. Greer, as the lead NSF Program Officer for Data Initiatives, was involved in this effort.
Virtual Organizations, also known as co-laboratories, grids, networks, portals, or virtual research environments, are the third component of the vision framework. The goal is to integrate both physical and CI assets and services and to dramatically relax the issues of space and time. A high emphasis should be placed on allowing people to create their own interfaces and provision of services while providing the underpinnings of secure, efficient, persistent and interoperable services. Physical systems must be richly augmented with virtual organizations that provide end-to-end CI systems for and across all science and engineering fields, both nationally and internationally.
Dr. Atkins shared a few examples of Grid Science Gateways, but emphasized that there are many more. The NEESGrid provides real-time access to earthquake shake table experiments at remote sites. BIRN (Biomedical Informatics Research Network) is a portal that provides access to data grid files, computation, and a variety of collaboration tools for biomedical researchers. LEAD predicts and models severe weather conditions. Nanohub supports both research and education in nanotechnology.
NSF also participates in or monitors similar programs internationally. Globus.org facilitates grid projects through the development of similar middleware across national grids. The European Union’s 7th Framework is aimed at putting the best brains and resources together to produce the best science. They believe that to be a genuinely competitive knowledge economy, Europe must be better in producing knowledge through research, diffusing it through education, and applying it through innovation.
The fourth piece of the new strategic plan’s vision framework is Learning and Workforce Development. The goal is to develop a workforce that can use and create CI, and to bring the opportunity for broadened participation to people who have historically been excluded because of physical capabilities, location, or history. CI can support the vision of integrated research and education.
CI brings many new opportunities for collaboration. Realizing this potential will, however, require a new wave of commitment to collaboration among the various stakeholders. This goal is similar to CENDI’s mission, which is based on interagency collaboration.
In the spirit of the CENDI mission, Dr. Atkins asked two strategic questions of the CENDI community. First, to what extent are knowledge-intensive federal agencies making routine, effective use of modern knowledge management tools and emerging Web 2.0 services, processes, and ways of working? Second, should there be a greater differentiation between the ‘prescriptive’ IT systems that support the core business functions of an agency and those that support the development and sharing of knowledge and collaboration? The second question refers to barriers to making social software, such as wikis and blogs, available internally to the NSF Project Managers, when these services are easily obtained on the outside, particularly in academic environments.
Net neutrality was discussed briefly. OCI is aware of the issue, and some partners are participating in the debate. Dr. Atkins believes that Internet2 may insulate science and engineering from this issue, but not the larger society.
There are more and more collaborations between professionals and amateurs. The best example is in the area of astronomy. Finding new ways to merge the two sectors, rather than just moving a technology from one sector to the other, is increasingly important.
NSF Showcase (Dr. Christopher Greer)
NSF Supercomputer Centers and NSF’s High Performance Computing Goals (Steve Meacham)
The High Performance Computing (HPC) Program is a relatively small portion of the NSF budget. It includes procurement, support, and maintenance. It is very visible, though, because HPC, for general science and engineering, is supported by OCI. HPC is augmented through the Geosciences directorate for atmospheric and some ocean science.
The HPC is composed of a spectrum of systems. At the bottom level, there are systems in research groups funded through special programs. Providing more capabilities are university supercomputers, typically acquired with their own or non-NSF funds. The major NSF HPC involvement is in the two upper tracks of this spectrum. Track 2 involves the replacement of mid-range supercomputers with large, powerful systems. Track 1 involves the implementation of longer-range systems (about four years out) that provide a large amount of memory to solve a range of problems, the provision of leadership resources, and the development of new programming environments. This year, they are leveraging the resources to which universities already have access, the biggest challenge being the necessary human resources.
TeraGrid is the delivery mechanism. Its capabilities will continue to expand. They will be tackling increasingly complex questions, including distributed collection of resources and workflows. The TeraGrid provides storage allocations, computer cycles, and human expertise in the areas of user support and consulting. Resources are allocated using peer review and advisory committees.
The TeraGrid offers a common user environment, pooled support, science gateways, modern knowledge management and collaboration software, a portfolio of HP architectures, a meta-scheduler and automatic resource discovery, and workflow orchestration. It currently supports approximately 20 science gateways and continues to provide increasing access.
A challenge in the near future will be the complexity of the systems. Improved approaches to software engineering are needed. They are working to simplify identity management and software development, and they have other research calls that deal with usability issues. Input/Output is also a big challenge. It isn’t possible to disentangle the “Big Data” from “Big FLOPS”.
NSF welcomes the use of this environment by researchers from other agencies in return for shared support. In FY06, the HPC was used by DOE’s National Energy Technology Laboratory, NARA, and DOE’s Advanced Scientific Computing program. Memoranda of Understanding can be created to establish this relationship.
NARA/SDSC (San Diego Supercomputer Center) Partnership (Robert Chadduck, NARA)
Beginning in 1998, NARA joined NSF and other federal agencies in sponsoring collaborative research with NSF's supercomputer centers, investigating how developments in the then-emerging computational infrastructure could contribute to digital preservation environments. The technical findings and results of this seminal collaborative research with NSF and their supercomputer centers served in 1998, and today continue to serve, as a catalyst for technology advances contributing to CyberInfrastructure. This unprecedented research collaboration between NARA and NSF through these years provides NARA with an opportunity to work with NSF leadership to address underlying problems in computer science and engineering contributing to the shared challenge to manage, preserve and support sustainable access to federal electronic records, e-science data, and other digital information over time.
While NARA recognized that aspects of archives and electronic records management challenges were unique, NARA also understood that many of the underlying technical challenges were shared by much of the nation's CI and e-science communities. These core shared unsolved problems in computer science and engineering constitute the foundation for continuing collaborative technology research, especially as applied as catalysts for technology innovation that may be broadly leveraged as CyberInfrastructure.
Joining and continuing in collaboration with NSF enables NARA to receive a very significant return on investment of a relatively small amount of available research funds, both by leveraging the much larger NSF and other coordinated NITRD federal investments in the NSF supercomputer centers and by harnessing the pace of developing CyberInfrastructure to support evolving technology testbeds and continuing developments in highly relevant technologies. Additionally, NARA's research collaboration with NSF has, from its onset, presented NARA with an unprecedented opportunity to pursue research in complex issues responsive to requirements assigned to electronic records collections, with findings backed by a sound empirical basis and rigorously testable results.
Recognizing the traditional role of technology research as a factual basis for subsequent engineering developments, NARA's Electronic Records Archives (ERA) Program is presently engaged in a contemporary engineering translation of earlier research findings and understanding into operations.
NARA's collaborative "kick the tires and drive the car" research approach has enabled NARA, in collaboration with NSF and their supercomputer centers, to help fuel continuing technological innovation contributing to CyberInfrastructure, and has made it possible to view what was considered in 1998 to be a highly intractable technical problem in a rigorous, tangible way.
NARA's research collaborations with NSF and their supercomputer centers have received, and continue to receive, unprecedented recognition. In 2006 NARA research, in collaboration with the NSF Office of CyberInfrastructure and the San Diego Supercomputer Center, was very privileged to receive a competitively awarded international Internet2 "IDEA" engineering award for the collaborative "Transcontinental Persistent Archives Prototype". This engineering award, unprecedented for NARA, was one of four awards presented worldwide by the Internet2 consortium, selected competitively from over two hundred research activities nominated by international sources. The citation for this research is: "the Transcontinental Persistent Archives Prototype represents advanced collaborative research. This technology demonstrates how shared knowledge can be managed and distributed across multiple institutions and platforms. The prototype is the Nation's window onto the electronic records archives of the future."
Additional information concerning this award may be found at: http://www.internet2.edu/idea/2006/transcontinental_persistent_archives_prototypes.html
Additionally, NARA research was elevated by the Executive Office of the President for formal inclusion in the President's Networking and Information Technology Research and Development (NITRD) program. These recognitions were all based on the strength of NARA's collaborations, including those with NSF and the NSF supercomputer centers.
The fundamental threads of these activities have often extended beyond the original technology focus. NARA's unique affiliation with the San Diego Supercomputer Center (SDSC) has extended into the area of sustainability at an institutional level with corporate planners and university administration. The Board of Regents of the University of California, the SDSC, NARA, and NSF are collaborating on shared learning and research contributing to a "fabric of relationships" enabled by CyberInfrastructure toward the preservation of, and sustained access to, scientific data collections over time.