CENDI PRINCIPALS AND ALTERNATES MEETING

National Library of Medicine
Bethesda, MD
June 24, 1997

Minutes

Hot Topics and Issues
CIC and G-7 Updates
PUBMED: Interface and Retrieval System
WIPO and Database Protection
Revision of Title 44

WELCOME

Tom Pedtke, Chair, began the meeting at 9:10 a.m. He thanked NLM for hosting the meeting. Introductions were made.

Hot Topics and Issues
Dr. Brian Kahin, Consultant/Information Infrastructure, Office of Science and Technology Policy

Dr. Kahin is a full-time consultant to the Office of Science and Technology Policy (OSTP) http://www.whitehouse.gov/WH/EOP/OSTP/html/OSTP_Home.html. He will take on a full-time position with OSTP in September. Previously, Dr. Kahin was with Harvard's Kennedy School, where he worked on the issues of the Internet and its impact. At OSTP he will be covering areas related to the information infrastructure. He will not spend much time on telecommunications policy, because others at the White House are handling it. One initial effort is to identify where the information infrastructure activities within OSTP are headed over the next two years.

In looking back after four years of the Clinton Administration, one sees a devolution back to twin roots: telecommunications regulatory policy and strategic technology investment. The telecommunications policy component is moving toward international electronic commerce policy. Ira Magaziner's group is overseeing this; the report will be released formally in early July [it was released on July 1].

Laurie Perrine at OSTP is handling issues of strategic technology investment, including the next generation Internet.

Dr. Kahin is involved in areas of competition and private sector/government interaction. Under these broad issues, Dr. Kahin is specifically addressing issues of domain names and network numbers, metadata (labeling), and copyright.

Internet Domain Names

About 60 percent of his time has been spent on domain names. He is the co-chair (with Bruce McConnell) of the interagency working group established as a result of the Magaziner effort.

A proposal has been developed by a group of two international organizations, the World Intellectual Property Organization (WIPO) and the International Telecommunication Union (ITU), and two non-governmental organizations, the Internet Society and the International Trademark Association. Their authority to promote the regulation of domain name changes has been questioned, as have some of the recommendations. The proposal is to set up non-exclusive registries. There are seven new domains proposed, with more possible in the future. Examples include ".firm", ".info", ".rec", and ".nom". Since there is no limit on the number of registries, organizations must meet financial thresholds to become a registrar. The proposal includes an arbitrated appeal process in which the use of second-level names by more than one entity can be resolved without lawsuits.

Some groups are trying to add new domain names independently. However, unless the root servers pick up the new domains, they will not be disseminated throughout the Internet infrastructure. These sites may only be accessible to 0.5 percent of the Internet.
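The dependence on the root servers can be illustrated with a toy Python sketch. It is not a real resolver: the set of top-level domains is an illustrative subset, and the function names are hypothetical. It only shows that a name is reachable when its top-level domain is present in the root zone that a user's resolver consults.

```python
# Toy model of why an independently added domain stays invisible.
# Assumption: ROOT_ZONE_TLDS is an illustrative subset, not the real root zone.
ROOT_ZONE_TLDS = {"com", "net", "org", "edu", "gov", "mil", "int"}


def top_level_domain(hostname: str) -> str:
    """Return the last label of a hostname, e.g. 'com' for 'example.com'."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def is_resolvable(hostname: str, root_tlds=ROOT_ZONE_TLDS) -> bool:
    """A name can be followed only if its top-level domain is in the root zone."""
    return top_level_domain(hostname) in root_tlds


if __name__ == "__main__":
    print(is_resolvable("example.com"))    # served by the standard root
    print(is_resolvable("example.firm"))   # proposed domain, not in this root
    # A user whose resolver points at an alternative root that carries ".firm":
    print(is_resolvable("example.firm", ROOT_ZONE_TLDS | {"firm"}))
```

Only users whose resolvers consult a root that carries the new domain can reach names under it, which is why independently added domains reach so small a share of the Internet.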

In terms of the domain issues, Dr. Kahin is trying to live up to the philosophy of Magaziner's draft report:

1) Government should play a restrained role; any regulatory regime should be simple, transparent and consistent.

2) The Internet is unique; we can't assume that the old ways will work.

3) An international approach must be taken.

However, the private sector has not developed a consensus on the problem of domain names. The proposal from the ITU and the other organizations noted above lacked a public process, which made it very controversial. There is no consensus that changes must be made to top-level domains, although there is considerable interest in new domains as a way of eliminating the shortage in .com and as an alternative to Network Solutions' perceived monopoly on .com, .net, and .org.

The current environment has created the expectation of proprietary rights for second-level domain names (i.e., those under .com). Some have argued for random assignment of domains within .com, but this is unlikely to happen now.

As work on these issues proceeds, the constituencies are expanding. A Notice of Inquiry has been put out through the Commerce Department asking for public input. As more voices are heard, it seems to only complicate the issue.

The government is still playing a major administrative role, but it is trying to divorce itself from it. The need for a transition plan is being addressed. OSTP is working with NTIA to get a policy development agency to carry the domain name and network number issues forward.

Another issue is that a monopoly for ".com" may have been granted to Network Solutions. A $50 fee must be paid to Network Solutions every year for registration of a .com name. Until the cooperative agreement between NSF and Network Solutions expires, 30% of the receipts must go into an "intellectual infrastructure" fund. This amounts to about $30 million now and will grow to $50 million by the time the cooperative agreement expires in March. Some have suggested that this money be used to fund the Next Generation Internet.
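As a rough check on these figures, the short Python sketch below works through the arithmetic. It assumes, for illustration only, that the fund consists solely of the 30% share of $50 annual fees, with no interest or other income; the function names are hypothetical.

```python
# Back-of-the-envelope check of the fund figures quoted in the minutes.
# Assumption (not from the minutes): the fund holds only the 30% share
# of $50 annual fees, with no interest or other income.

ANNUAL_FEE = 50.0   # dollars per .com registration per year (from the minutes)
FUND_SHARE = 0.30   # fraction of receipts set aside (from the minutes)


def fund_share_per_fee(fee: float = ANNUAL_FEE, share: float = FUND_SHARE) -> float:
    """Dollars of each annual fee that go into the fund."""
    return fee * share


def implied_fee_payments(fund_total: float) -> float:
    """Number of annual fee payments needed to build a fund of this size."""
    return fund_total / fund_share_per_fee()


if __name__ == "__main__":
    print(f"Per-fee contribution: ${fund_share_per_fee():.2f}")
    print(f"Payments behind $30 million: {implied_fee_payments(30e6):,.0f}")
    print(f"Payments behind $50 million: {implied_fee_payments(50e6):,.0f}")
```

Under that assumption each fee contributes $15, so a $30 million fund corresponds to about two million annual fee payments.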

Metadata and Labels

Dr. Kahin's time is also spent working on the infrastructure for metadata, which raises a number of policy issues. In connection with the Communications Decency Act, he is looking at the labeling of WWW sites for various content characteristics such as violence, sex, etc. The Platform for Internet Content Selection (PICS) is the overall architecture for this because it allows for an infinite variety of rating systems. Labels related to content have not been widely implemented on the WWW. At present, only a small number of sites have labels. This limits the ability of parents to restrict their children to labeled sites. The White House is considering requiring federal sites to at least use a "government" label.

PICS and other technologies can also be used to manage privacy expectations and intellectual property. The Open Privacy Standard (being developed by Netscape) allows users to set their own privacy standard on their own browsers. The Privacy Preference Platform is a web site standard for privacy.

Also related is the use of metadata for electronic contracting and enabling access to information through indexing and searching.

OSTP is interested in bringing all the pieces together to deal with the metadata as an infrastructure issue. Dr. Kahin is giving a presentation to the GITS Board in July, which will look at both labeling of federal web sites for content and the larger issues.

Copyright and Intellectual Property

Dr. Kahin has not been involved in many intellectual property issues to date. It is unlikely that he will be involved in the WIPO issues until treaty implementing legislation is forwarded from the Commerce Department. He may attend the committee of experts meeting in September.

Software patents are another intellectual property issue. Since there are special penalties for willful infringement of software patents, patent attorneys discourage developers from searching patent databases for fear they will find that they are infringing something. The software patents issue may be the subject of a study by the Critical Technology Institute that could help better explain why patents have been problematic in the software industry.

CIC and G-7 Updates
Dr. Donald Lindberg, Director, National Library of Medicine

Dr. Lindberg is the U.S. representative/national coordinator to the G-7 Global Healthcare Applications Project. The G-7 is a consortium of the seven wealthiest nations in the world (now a committee of eight with the addition of Russia at the latest meeting). The European Union (EU) also has a separate representative. The Secretariat is staffed by the EU. There is no single chair of the Project or its subprojects but, at the subproject level, a lead country is identified. Each country has one national coordinator.

Dr. Lindberg reported that since his last briefing to CENDI on G-7, the Global Healthcare Applications Project held a meeting. The group eliminated completed projects from its list and added others. Dr. Lindberg distributed a list of the remaining subprojects. Two projects proposed by the U.S. were approved by the G-7. NLM proposed that Phase 2 of the Visible Human project be broadened to a multi-language anatomical digital database. This involves the definition of words to label the data and objects in the anatomical data and image sets for the Visible Man and Visible Woman. Linguists will be brought together to define the terms in multiple languages.

The second project is an extension of Fred Wood's study on the use of the Internet and the purported problems with connectivity speed. The original study, which provided good data on actual delays and various-sized packets, revealed that the delays are locally caused. For example, the speed decreased in a particular region of the country when that region came online, not when the U.S. came online. This effort was added to an existing subproject on enabling mechanisms for the Global Healthcare Applications Project, and will seek overseas collaborators to participate in the study.

NLM also discussed its participation in an NIH project to set up a network of research centers/laboratories in Mali related to malaria in Africa. This project was not actually proposed at the G-7 meeting, but is likely to move forward. Ultimately, up to two dozen sites may be linked to share research and public health information.

The Committee on Computing, Information and Communication (CIC) of the National Science and Technology Council (NSTC) was also described. While there have been changes in membership, the Advisory Council has finally been established. Dr. Lindberg reported that this group should be quite active. There are a number of good people on this group, including one librarian and people from many disciplines. With the Advisory Council there is finally the ability to include industry.

The High Performance Computing and Communications (HPCC) authorization will end at the end of 1997. However, the administration does not believe that it needs authorization for the program's continuance. The priority action here is with the Next Generation Internet (NGI). Applications and needs are pushing the technology that must then cycle back.

Dr. Lindberg recommended that the CENDI members read an NRC report, "For the Record: Protecting Electronic Health Information", published by the National Academy Press (1997). The research was partially funded by NIH. While it addresses privacy issues specifically for medicine, it has many broader applications.

PUBMED: Interface and Retrieval System
Dr. David Lipman, Acting Director, National Center for Biotechnology Information

The National Center for Biotechnology Information (NCBI) http://www.ncbi.nlm.nih.gov/ is charged with collecting, storing and maintaining molecular and structure data. The largest of its databases is GenBank, a database of genetic sequences. More than 30,000 IP addresses per day access the NCBI site. This is an estimated 500-600 queries. The PubMed system was developed by NCBI to integrate biotechnology journal information with other information.

PubMed will be publicly announced in the near future. The URL is http://www.ncbi.nlm.nih.gov/PubMed. Included in the PubMed system are numerous databases created and sponsored by NLM and NCBI, including genetic maps, taxonomic databases, and full text databases like "Mendelian Inheritance in Man".

Online journals wanted to link their full text into the genetic databases. NLM suggested a connection to the references as well. In order to be included in the project, the publisher must submit structured header records tagged in SGML (Standard Generalized Markup Language) in NLM's format. The full text of the journals continues to reside on the publishers' systems. The high impact biology journals are already connected. Another 200 smaller journals are expected to be added to PubMed over the next six months.

There are many different payment models used by the publishers. This type of architecture, where the payment is in the hands of the copyright owners, makes different payment models possible. Some have a pay-per-view model, others are free, and others require a subscription (and therefore require a password) before allowing access.

Dr. Lipman described the search engine used with PubMed. It was developed internally, based on statistical text similarity algorithms developed by NCBI. NCBI originally intended to procure a commercial off-the-shelf product. However, the commercial off-the-shelf products proved to be very "thin" and required significant customization to suit NCBI's needs. This made a commercial product more costly than developing a search engine internally.

NCBI also determined that basing searches on related records ("find me another one like this one") was easier for the end-user to understand. However, future versions of the engine will provide additional Boolean search capabilities. There does not appear to be a scalability problem. The system handles 12-15 intellectual queries/second.
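The "find me another one like this one" idea can be sketched in Python as follows. This is not NCBI's algorithm, which the minutes do not describe in detail; it swaps in plain cosine similarity over word counts as one simple form of statistical text similarity. The sample records and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample citations, keyed by a made-up record identifier.
SAMPLE_RECORDS = {
    "A": "Sequence analysis of the human globin gene",
    "B": "Globin gene sequence variation in human populations",
    "C": "Surgical outcomes after knee replacement",
}


def term_vector(text: str) -> Counter:
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_records(seed_id: str, records: dict, top_n: int = 3) -> list:
    """Rank the other records by similarity to the seed, dropping zero-score ones."""
    seed = term_vector(records[seed_id])
    scored = [
        (cosine_similarity(seed, term_vector(text)), record_id)
        for record_id, text in records.items()
        if record_id != seed_id
    ]
    # Highest score first; ties broken by identifier for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for score, record_id in scored[:top_n] if score > 0]


if __name__ == "__main__":
    print(related_records("A", SAMPLE_RECORDS))
```

With the sample data, record B shares several words with record A and is returned, while record C shares none and is dropped. The user never has to compose a query; picking one relevant record is enough.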

The user submits a query and a list of citations is presented in ranked order. Along with the citation information, link buttons indicate whether full text or genetic information is linked to that citation. If the user selects the genetic link, the sequence structure or other relevant information is presented. If the user selects the full text link, the user is taken to the publisher's site. The publisher's site may present the full text for that article, depending on the publisher's payment requirements and the status of that user. Alternatively, the publisher provides information about how to obtain the article via document delivery or how to subscribe to the journal.

The system has three components: a relational database to track the flow of records from the publishers; "neighbors", a batch-oriented system that determines related citations and is now being moved to an interactive environment; and the search system itself. The system is based on the ASN.1 data format. NCBI had a lot of data compliant with this format. (Chemical Abstracts Service also uses many of the tools for ASN.1.) The general indexing software creates the indexes based on the ASN.1 data definition.

There is software and hardware redundancy behind one IP address. Machine-to-machine communication is used across multiple platforms. Tools are provided to the publishers to match MEDLINE citations to their full text archives.

Dr. Lipman indicated that while electronic publication can be done for one-quarter to one-half the price of paper, cost control is still an issue. Historically, databases and publications were purchased by libraries. They acted as cost control centers and as collection mechanisms. How will this work in the new model? Who will help the parent organization to control the costs?

Dr. Lipman sees the PubMed system as a collection development organization. PubMed will bring together quality sites and make them available from the mass of sites that the users would normally have to traverse.

Discussion

The types of journals available in electronic form and how they equate to paper versions were discussed. Dr. Lipman indicated that many of the electronic journals do not have paper equivalents. Some, like Pediatrics, have papers that appear in the electronic journal only and papers that appear in both. Some journals use the SICI (Serial Item and Contribution Identifier) as the unique identifier, while others use the DOI (Digital Object Identifier).

WIPO and Database Protection
Dr. Harold Schoolman, Deputy Director, National Library of Medicine

There are four important events that must be recalled in order to understand the history of the database protection treaty: 1) The Supreme Court decision on Feist that ruled that the sweat of the brow no longer justifies ownership, 2) the European Community (EC) Directive on database protection, 3) the World Intellectual Property Organization (WIPO) Treaty draft, and 4) the Patent and Trademark Office (PTO) White Paper on Intellectual Property in the Information Age.

The WIPO Treaty was never considered at the meeting in Geneva in December 1996. In September 1997, a Committee of Experts will meet to identify issues to be considered by treaty authors. The U.S. representatives to the meeting have not been announced. Essentially, the process is starting all over. This provides one to two years of time for input from the various stakeholders.

The scene then shifts to Congress. Legislation is likely to be introduced this year.

The two Congressmen likely to present legislation are Howard Coble in the House and Orrin Hatch in the Senate. Earlier this year, the Register of Copyrights was asked to prepare a document of fact to state issues and opinions, but not to make recommendations. The document is scheduled for release in July.

Dr. Schoolman identified several key issues:

Is there a need for additional database protection? The language of WIPO is designed to protect investment. Isn't the lead time of the originator creating a market advantage of its own?

If there is a need for database protection, the language should protect against piracy and the creation of an electronic product. However, the language is so broad that it grants control over the data itself, not just the expression of it. Copyright only gives the right of expression, but database protection broadens this to control over the data itself. "Insubstantial" is not defined. If the authors are really trying to protect against piracy, then the language should be written specifically for that purpose. The usual language for fair use should be followed.

Two additional copyright provisions were in the original legislation introduced in the last session. The first makes it a felony to create an instrument whose sole purpose is to break the privacy and security of the material. However, every fair use of the material would require decryption software or hardware. Every felonious use has a good use as well, which puts the burden of proof on the fair user. This legislation is unenforceable and potentially dangerous.

The second was an amendment to make it against the law to alter copyright management material/information. This is a beneficial amendment. However, it should be amended further to make it a felony to copyright material when it is not valid to do so, as with government material or material for which the copyright has expired. For example, publishers often claim copyrights on the compendium but then also copyright the individual articles.

A side issue, but equally important, is the privacy issue resulting from the information that publishers and others can keep about what individuals are accessing.

Revision of Title 44
Eric Peterson, Staff Director, Joint Committee on Printing

The House Oversight and Senate Rules committees govern the procurement of printing and dissemination services through the Government Printing Office (GPO) and the Superintendent of Documents. The last rewrite of the printing act occurred in 1895. Numerous attempts to rewrite Title 44 have been made over the last twenty years.

Hearings were held last year, and then the Joint Committee drafted a no-authored bill to get the ball rolling. Hearings have been completed and everyone has been contacted to perfect the draft proposal.

Taxpayers expect government information to be available without additional unwarranted fees. The major issue being addressed is non-compliance with Title 44. The requirement is two-fold. The printing services must be procured through GPO, and the material must be deposited with the Federal Depository Library Program (FDLP) regardless of form and format. Of course, there are copyright-like restrictions, including issues of collaboration between the government and the private sector, inclusion of copyrighted information within a government document, and the licensing of CD search engines used with government data.

This draft wording includes penalties of dismissal and a $5000 fine for a federal employee who gives away copyright of a work based on government research. However, they are now working on revised wording with a different type of compliance mechanism that has both a carrot and a stick.

Mr. Peterson reviewed the major changes proposed to Title 44. Chapter 1 includes the constitutional separation of powers question, which results in the transfer of the responsibility to the Public Printer or the Superintendent of Documents. Chapter 5 includes requirements for the agencies to use GPO. Chapter 19 includes changes to the Superintendent of Documents' authority and the FDLP. Chapters 3, 5, 11 and 17 will have archaic language changed. Chapter 19 contains definitions that are based on the Paperwork Reduction Act and Chapter 35. These chapters are being discussed but changes have not been finalized. Mr. Peterson believes they will settle on the term "public information" and that it will fall to the agencies to decide what to release to the public.

A discussion followed in which CENDI members gave their feedback on the draft wording presented and also raised related issues not addressed by Title 44 directly.