|
Network Working Group Request for Comments: 144 NIC: 6729 |
A. Shoshani SDC 30 April 1971 |
The enclosed is an introductory paper for the meeting which will be held in Atlantic City as part of the ARPA Network meetings. The schedule for the meeting will be published soon by Steve Crocker.
The Agenda of the meeting will include:
If you have interest in the subject please plan to attend.
One of the benefits expected from the use of Computer Networks is the sharing of data among users of the system. This paper is an attempt to classify the issues involved, discuss some approaches that might be taken to achieve the goal of facilitating data sharing and to point out some advantages and disadvantages of these approaches.
In the process of selecting an approach one has to consider the following issues:
This approach is consistent with the idea that a Computer Network eventually will evolve into a collection of specialized service nodes, where each node would perform a specific function well. Users will use services on nodes according to their needs. For example, one node could be a PL/I machine (possibly a microprogrammed machine to perform PL/I compilation efficiently), another node could be a "number cruncher" for parallel-structured problems (ILLIAC IV), etc. In the same way there will be a node responsible for all data management needs for the network.
Depending on the assumptions made one of two ways can be chosen:
In this approach a particular data management system is adopted to be implemented on all nodes. This provides for a standardized data management language as well as an identical logical data structures. Alternatively, one can choose a set
of data management systems to be implemented on all nodes, then be able to share information manipulated by the same data management system on different nodes. This approach has many drawbacks as will be discussed later.
This approach suggests the integration of local (to the node) data management systems and local data (files) through the use of appropriate interfaces and a common data management language.
Under this category there may be different approaches depending on the function of the interfaces:
This approach suggest the use of a standard interface which is to be part of every data management system on the Network. The interface has three ends. One to the user language, one to the particular physical system used and one to the Network. The interface should be global enough to permit separation of system decisions from user language decisions. If this interface is standardized on a Network, it will facilitate communication between local data management systems in a unified way, while permitting the development and evolvement of different local data management systems. (This is a rough description of the approach taken by Barry Wesseler in Utah.)
It is well known that the design of a language involves a compromise between the ease of use of the language and its capability to express the functions desired. A try to merge two languages usually results in the worsening of one or both of these considerations.
For the purpose of having a common language for data management it
may be desirable to separate between the above mentioned
considerations. Use natural-language for ease of use, and a formal
intermediate language powerful enough to express any functions
desired. This is the approach taken in the development of CONVERSE
in SDC [1]. The intermediate language can be as complex as one likes
since it is invisible to the user.
Predictions for future use of computers (and therefore computer networks) point out that "in 1975 we will process mostly data" [2]. Therefore, the problem of sharing data on a computer Network, as well as accessing data from remote nodes in some common language are extremely important.
If all that is desired is the sharing of data in a file by more than one user, then the CDMS approach is appropriate. Approach la is impractical, but lb can provide a valuable service. Selecting this approach does not permit the sharing existing data which was created with existing data management system, unless a restructuring of the data for the CDMS is performed. This approach does not easily permit the development of new data management systems since the CDMS should stay stable for the Network use. It does not involve translation of data or languages and therefore should provide good access speed.
The SDMS approach has many drawbacks. Selecting it implies the imposition of a particular data management system on all nodes. It inhibits further development. It does not permit the sharing of existing information. The main advantage would be the modularized structure so that the failure of one node cannot cause the failure of the entire system. Also, because of the standardized approach sharing of data from different nodes does not involve any translation.
The main advantage of the IDMS approach is that it permits the
continued use of existing data management systems with existing data
bases associated with them while permitting the sharing of data among
the network community of users. Since it permits the continued use
of local data management systems it is the most evolutionary approach
and most likely to be accepted by a user of an existing data
management system. There are applications where users on each node
on the Network perform mostly local access of data, and less often
find it desirable to be able to share data with other nodes. For
example, if hospitals are connected to nodes of a Computer Network,
then most of the data about patients is accessed locally, but
sometimes it is necessary to access information from other hospitals,
such as global statistical information. The same situation exists
for criminal files, local branches of banks, credit bureaus,
warehouses, etc. Approach 3a permits the advantages of
modularization, but 3b is easier to implement since no additional
interfaces are necessary in the different nodes. Approach 3c seems
hard to implement and can introduce inefficiencies since it involves
translation from one data structure (which might be designed for
efficiency) to another data structure (which may not be as
sophisticated). It also involves the shipment of large amounts of
data across the network.
The UDMS approach permits the continued development of local systems while facilitating a unified way for Network communication of data requests. It is not clear at this point whether this approach is practical.
Other important issues concerning sharing of data on a Computer Network, and which are mentioned in [3] are overlap of information in different files and the possibility of the same information to be contradictory, security and privacy problems, sponsors of a file vs users of a file, and others.
Discussions with the following people were very valuable: Al Vorhus, Peggy Karp and others in MITRE, Barry Wesseler in Utah, Gerald Levitt, N. Cohen and others in RAND, Clark Weissman, and Charlie Kellogg in SDC, Richard Winter of CCA.
[ This RFC was put into machine readable form for entry ] [ into the online RFC archives by Ryan Kato 6/01]