A Parallel Program Execution Model Supporting
Modular Software Construction
Jack B. Dennis
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139
[email protected]
Abstract

A watershed is near in the architecture of computer systems. There is overwhelming demand for systems that support a universal format for computer programs and software components so users may benefit from their use on a wide variety of computing platforms. At present this demand is being met by commodity microprocessors together with standard operating system interfaces. However, current systems do not offer a standard API (application program interface) for parallel programming, and the popular interfaces for parallel computing violate essential principles of modular or component-based software construction. Moreover, microprocessor architecture is reaching the limit of what can be done usefully within the framework of superscalar and VLIW processor models. The next step is to put several processors (or the equivalent) on a single chip.

This paper presents a set of principles for modular software construction, and describes a program execution model based on functional programming that satisfies the set of principles. The implications of the principles for computer system architecture are discussed together with a sketch of the architecture of a multithread processing chip which promises to provide efficient execution of parallel computations while providing a sound base for modular software construction.

1. Introduction

The ideas put forth here are mostly old; they have been developed by the author and others over the past thirty years.¹ What is new in this presentation is showing how the nature of Model X—the program execution model advocated here as a guide for computer system design—follows from basic requirements for supporting modular software construction. The fundamental theme of this paper is:

The architecture of computer systems should reflect the requirements of the structure of programs. The programming interface provided should address software engineering issues, in particular, the ability to practice the modular construction of software.

The positions taken in this presentation are contrary to much conventional wisdom, so I have included a question/answer dialog at appropriate places to highlight points of debate. We start with a discussion of the nature and purpose of a program execution model. Our Parallelism Hypothesis is introduced in Section 3. In Section 4 we present a set of six principles a programming environment should honor to support true modular software construction. In Section 5 these principles and the parallelism hypothesis are used to derive an ideal program execution model, Model X, step by step. Section 6 comprises a description of a computer system architecture inspired by Model X and the state-of-the-art in computer technology. We conclude with some remarks on the suitability of other models of computation for supporting modular (or component-based) software construction.

Footnote 1: One inspiring event for this work was the conference on Modular Programming put together in 1968 by Larry Constantine [5].

2. Program execution models

Models of computation have several important uses relating to the design of computer systems. A model may serve as a pattern for structuring and analyzing programs. The data parallel model and the "message-passing interface" (MPI) are examples. Some models are intended as a basis for assessing performance of systems or applications. Here
we are concerned with models that specify the behavior of a computer system to the extent that it is relevant to correct execution of application programs. We call these program execution models because they explicitly describe the actions involved in the execution of a program by the specified computer system. In a sense a program execution model is a formal specification of the application program interface (API) of the computer system. It defines the semantics of the target representation for the compilers of high-level languages; and it provides a clear design goal for the computer system architect.

3. Parallelism

Parallelism is important for several reasons:

Distributed computing involves parallelism.

Single thread processor architecture is running out of steam.

Parallelism is an essential aspect of certain computations.

Nevertheless, parallelism introduces the possibility of nondeterminate behavior: a software component can produce different results on different runs with the same input data when the data arrive asynchronously or there are variations in the timing of internal events. It is widely appreciated that the possibility of nondeterminate behavior makes complex software difficult to design, develop and debug in a distributed or parallel computing environment. I believe in the programming principle that nondeterminate behavior should be avoided whenever it is not essential to the computation being specified. Furthermore, the semantics of a computation should be the same whether the program is run in a sequential or a parallel/distributed environment.

These ideas are captured in the following two properties of Model X which constitute our Parallelism Hypothesis:

Program modules in Model X must be determinate unless nondeterminate behavior is desired and explicitly introduced by the programmer.

Model X must permit the parallel execution of two modules whenever there is no data dependence between them that requires sequential operation.

4. Principles to support modular software construction

A primary concept in modular programming is the interface of a component—the manner in which the component interacts with its users. The most common kind of interface in software construction is the procedure call. Execution of a procedure call supplies a module with a set of input values and requests that the module use those values in constructing result values to be made available to the caller. To attain the full benefits of modular programming the component interface supported by the computer system should meet the following requirements:

Information Hiding Principle: The user of a module must not need to know anything about the internal mechanism of the module to make effective use of it.

Invariant Behavior Principle: The functional behavior of a module must be independent of the site or context from which it is invoked.

Data Generality Principle: The interface to a module must be capable of passing any data object an application may require.

Secure Arguments Principle: The interface to a module must not allow side-effects on arguments supplied to the interface.

Recursive Construction Principle: A program constructed from modules must be useable as a component in building larger programs or modules.

System Resource Management Principle: Resource management for program modules must be performed by the computer system and not by individual program modules.

Some of these principles have been generally appreciated for many years: The Information Hiding principle was advocated by David Parnas [29]; it is acknowledged in the encapsulation of objects as data representations together with sets of operations, as in the abstract data types of the programming language ML [27].

Appreciation for the Resource Management principle is evident in the success of virtual memory and paging, which relieve the programmer of the (anti-modular) use of memory overlays and the consequent ambiguity of addresses. Nevertheless, the continued practice of partitioning data into "memory" and "files" seriously impedes the practice of modular programming, and constitutes violation of the Data Generality principle because files (as opposed to file names) cannot be used as arguments or results of a program module.

On the other hand, violations of some of these principles are acclaimed as important features of popular programming environments. One of these is the Invariant Behavior principle. It is often pointed out that the ability of a module to have different behavior when invoked in differing contexts can be used to provide convenient testing facilities. However, this "feature" also permits disastrous false bindings to
occur, leading to "code rot". It is better to make all intended behavior of a module selectable by parameter choices and satisfy the principle of Invariant Behavior.

The Secure Arguments principle is particularly important for our derivation of Model X. Figure 1 shows two modules where the arguments of both modules include a shared data structure. According to our Parallelism Hypothesis, it must be permissible to execute both modules concurrently. If either module can make any change to the value of the shared argument that is observable by the other, nondeterminacy will be exhibited whenever the results of executing a module can be affected by the change.

Figure 1. Illustration of the Secure Arguments principle.

Our principles for modular software construction are violated in one way or another by the popular imperative programming languages. Yet achieving the full benefits of modular programming is not merely a matter of programming language design because large programs and application packages generally depend on operating system services for their correct operation. Using a semantic model that spans all functions and services needed by application programs offers the possibility of making the benefits of modularity in software widely available, and encouraging the use of component-based software in computing practice.

5. Derivation of Model X

In the following sections we develop a program execution model guided by our six principles for modular software construction. Then in Section 6 we discuss a system architecture based on a multithread chip that could implement the model.

5.1. Module representation

Model X must include a representation of the program to be executed—a target for a compiler of a high-level user programming language. First we need an object that will serve as the representation of a program module. For Model X we use a representation of a function from a vector of input types to a vector of result types. We use this abstraction instead of the conventional procedure abstraction because allowing side effects would be in violation of the Secure Arguments principle.

Q: If you disallow side-effects then how do you accommodate programs that manipulate "state"?

A: I divide programs that use "state" into two classes:

(a) Those programs in which the present output of a program module may depend on the entire history of its inputs rather than just the present input. These programs use "history-sensitive" modules but are still functional if viewed as operating on streams of data. They are determinate. The use of stream data types to support history-sensitive computations is discussed in Section 5.7.

(b) Programs in which a nondeterminate choice is made between alternative courses of action: for example, a reservation system where several agents compete for common resources. Support for nondeterminate computations is discussed in Section 5.8.

In Model X a function module is represented by an object we call a text; just as a function is a fixed value, a text is not modified during function evaluation.

5.2. Invocation of component modules

The execution of (or evaluation of) a function generally calls for the computation of intermediate values which must be held temporarily between steps of evaluation. In Model X these intermediate values are held in a data structure we call an activation frame or just a frame. A frame exists for each activation of a function from the moment argument values are supplied to the time all results have been produced.

A function may invoke other modules (functions) as required by the Recursive Structure principle. By the Parallelism Hypothesis, the order in which several independent function invocations are made must not be restricted. Therefore Model X provides for the concurrent evaluation of functions invoked by a given function. This leads to the parallel hierarchical structure of texts and frames illustrated in Figure 2.

Figure 2. Parallel trees of texts and frames.
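As a rough illustration of this hierarchy (Python threads standing in for Model X activations; none of these names come from the paper), each invocation below gets its own activation frame, and the two independent child invocations of a tree recursion may be evaluated concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

# Each application of the function text creates a fresh activation frame
# (here, the locals of the call); independent child invocations are
# submitted for concurrent evaluation, forming a tree of activations.
def fib(n):
    if n < 2:
        return n
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(fib, n - 1)    # one child activation
        right = pool.submit(fib, n - 2)   # an independent child activation
        return left.result() + right.result()

print(fib(10))  # 55
```

The frames form a tree rather than a stack because both children may be live at once, which is exactly the parallel hierarchical structure shown in Figure 2.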
5.3. Software component libraries and sharing

In Model X a library of modules is a data structure having the modules as structure components. The structure is passed as an extra argument to each function module that may make use of library elements, as shown in Figure 3. In this way the binding to library elements is not subject to the fortuitous false bindings that may be created by use of user-specified directory paths and search rules, which violate the Invariant Behavior principle.

Figure 3. Use of a library of function modules. (A function module applies library components such as lib.math.sin and lib.io.print through the library data structure.)

Two modules may make use of the same component module in their construction. Hence the "tree" of texts is not really a tree, but a DAG (directed acyclic graph).

5.4. Recursion

The principle that any module should be usable in the construction of higher level modules implies the ability of a module to make use of itself as a component. Such recursive calls are supported in Model X by passing a function as an argument to itself. As shown in Figure 4 this scheme generalizes to groups of mutually recursive functions.

Figure 4. Use of a text structure to access a group of mutually recursive function modules. (A function f applies rec.f or rec.g to (rec, ..), where rec is the record of function texts.)

Q: But recursion is so simple: just let the backward reference link to the function text, forming a cycle in the graph of texts.

A: In Model X the existence of cycles in data structures leads to problems we wish to avoid. Specifically, building a cyclic structure requires either that the cycle be created in a single step, or that a node in the graph of texts be modified after it has been created. Because handling cycles adds complexity to the model and leads to inefficient memory management, we prefer the chosen scheme.

Recursion is the natural way of representing many algorithms calling for repeated application of a group of steps. Moreover, tree recursions expose the parallelism of "divide-and-conquer" algorithms, as required by the Parallelism Hypothesis. As the following section shows, there are also good reasons to choose simple cases of recursion to model and express conventional iteration schemes.

5.5. Iteration

Iteration occurs where a sequence of results are computed, each obtained from the previous by the same computation. Iterations are naturally expressed as tail recursions, so Model X does not include any specific means for representing conventional iteration schemes.

Q: Why represent iterations by recursion? Iteration is, of course, more efficient, in general.
A: For the purposes of this paper it is beneficial to use the simplest model that provides sufficient generality. If necessary, a smart compiler can convert tail recursions into iterations. With a hardware design (see Section 6) that makes memory allocation a trivial step, run-time allocation for each iteration cycle (recursive call) may actually be the more efficient choice because it permits easier exploitation of parallelism.

With recursion the tree of activations may grow indefinitely.

Q: Isn't a tree much more difficult to allocate memory for than a stack that can grow in a segment of address space?

A: For general purpose parallel computation with generality equal to the best sequential programming environments, support for an unbounded tree of activations is essential. System architectures must provide efficient support for trees of activations, as has been done in several experimental systems [28, 33].

5.6. Data structures

The Data Generality principle requires that any data object of a computation may be passed as an argument or result of a function. For Model X we must choose a general representation for computation data. To have any chance of permitting an efficient implementation our choice must acknowledge the sharing of data. (A data object may be part of the input data of several functions.) Of course, we insist that components of a data object may be processed in parallel because this is a major source of parallelism in computation.

The most important feature shared by data structure types is their description of a data object as having components: records and arrays are familiar examples. Modern programming languages permit data structures to be declared as nests of record and array structures having arbitrary depth. These data structures are trees. To permit sharing, we enlarge the class of trees to the class of DAGs. To eliminate any possibility of violating the Secure Arguments principle, Model X has no update operation on data structures: structures can be constructed from their components, accessed, and released (by deleting an arc of the heap graph). With only these operations it is impossible to construct a cycle in the heap. We exclude cycles because creating a cycle involves updating a node, and opens the door to Secure Argument violations.

Q: But list structures are very important data structures for objects such as queues, and they typically involve cyclic data structures.

A: Queues are a low-level mechanism that can be supported using stream data types and I-structure mechanisms (Section 5.7). Other uses of cyclic structures in lists serve to implement various forms of tree structures which are the basic data structures of Model X.

Q: But what about programs that operate on graphs? Surely graphs in their full generality need cyclic structures for their representation.

A: Graphs have several representations that do not use cyclic data structures. One is an array of elements each representing an arc of the graph by a pair of integers interpreted as node identifiers. Such representations are often used because they are more efficient than representing a graph by an isomorphic data structure.

5.7. Streams and incremental data structures

A kind of data object having increasing importance is an unending sequence of values such as the successive time samples of an audio signal [14]. The designers of signal processing equipment have used the "stream of" abstraction for many years,² and there are several specialized programming tools based on streams.

Footnote 2: The BLODI block diagram language was in use at Bell Laboratories in the early 1960s [23].

The Sisal functional programming language [26] includes support for programming with stream data types. Figure 5 shows how a typical stream processing module may be written in Sisal. This function produces a result stream containing four elements for each group of three elements in the input stream. Each value in a group of four is defined by interpolation between appropriate values in the input stream. The notation D[i] denotes the ith element of stream D; the symbol "||" denotes concatenation of streams; the clause stream integer [ ... ] denotes a stream constant having the integer elements given within the brackets; the special function Tail (k, D) denotes the stream formed by removing the first k elements of the stream D.

Stream processing modules such as FourForThree can be cascaded by function composition to form networks of modules in producer/consumer relationships, all of which can be operating concurrently in pipeline fashion, as required by the Parallelism Hypothesis.

To support pipelined operation of stream-processing modules, Model X represents a stream by an incomplete list structure (or chain of records), as illustrated in Figure 6. A stream element is represented by a "hole" until its value is filled in by the producer, a function module that defines a data stream. A consumer function is delayed when it attempts to access a stream element and finds a hole. It continues once the hole has been filled [17, 12]. Semantically this model is similar to the I-structures of Arvind, Nikhil, and Pingali [3].³

Footnote 3: We can preclude the formation of cycles that could arise from filling a hole as follows: Whenever a shell containing a hole is created we require that (1) the new shell is placed in a hole of a pre-existing shell node (or a root shell node of the heap); and (2) the new shell does not have any structure components except for holes.
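A write-once stream element of this kind can be sketched with Python threads (the class and method names are ours, not the paper's):

```python
import threading

# An I-structure-style stream element: a "hole" until the producer fills
# it; a consumer that reads too early simply waits for the value.
class IStructureCell:
    def __init__(self):
        self._filled = threading.Event()
        self._value = None

    def fill(self, value):          # single assignment: never updated again
        assert not self._filled.is_set(), "cells are write-once"
        self._value = value
        self._filled.set()

    def read(self):                 # blocks while the cell is still a hole
        self._filled.wait()
        return self._value

cell = IStructureCell()
results = []
consumer = threading.Thread(target=lambda: results.append(cell.read()))
consumer.start()                    # the consumer finds a hole and waits
cell.fill(42)                       # the producer fills the hole
consumer.join()
print(results)  # [42]
```

Because a cell is written at most once, the value a consumer sees does not depend on how the two activations are scheduled, which is what keeps stream processing determinate.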
type Signal = stream [integer];

function FourForThree (
      D: Signal
    returns Signal )
  let
    n1 := D[1];
    n2 := ( D[1] + 3 * D[2] ) / 4;
    n3 := ( D[2] + D[3] ) / 2;
    n4 := ( 3 * D[3] + D[4] ) / 4;
  in
    stream integer [ n1, n2, n3, n4 ]
      || FourForThree ( Tail (3, D) )
  end let
end function

Figure 5. A rate changer function written in Sisal.

Figure 6. Producer and consumer functions and a data stream.

5.8. Nondeterminate computation

The need for nondeterminate computation arises when events can occur asynchronously that require use of a shared resource. In some cases the competing events arise in an ongoing computation within the computer system, for example, when independent jobs enter print requests in the queue for a printing device. In other cases, the events occur outside the computer system, as when airline agents independently request bookings for the same flight.⁴

Footnote 4: It is also true that asynchronously proceeding function activations in Model X compete for allocation of system memory. This involves nondeterminate choice, but this mechanism is part of the system itself, and not explicit in the application program.

Q: Concurrent object-oriented languages [1, 21, 31] make this simple. The nondeterminate selection among competing input messages is built into the object model.

A: The problem is that in concurrent O-O languages you have to use the mechanism whether you need to or not. Nondeterminacy is introduced as a consequence of concurrency, whether or not it is desired by the programmer. Object-oriented languages are generally unsatisfactory because of violation of the Secure Arguments principle, and the difficulty of practicing object encapsulation at a depth greater than one due to inadequate support of the Recursive Structure principle.

Generally, each case of nondeterminacy in application code may be viewed as requests arising from function modules and entering a queue to obtain service. The nondeterminacy exists only in entering the requests in the queue. In Model X each source of requests generates a sequence of request messages which are formed into a stream and merged nondeterminately with other request streams. The basic operation is the nondeterminate merge (ND-Merge) [10], which arbitrarily interleaves the elements of two input streams to yield its output stream.

Q: But once you have introduced a mechanism for nondeterminacy, haven't you opened Pandora's box and permitted cycles to be created?

A: As the ND-Merge is used in Model X, it merely takes elements of given streams and appends them to a single output stream. The interleaving of stream elements does not introduce any new mechanism that could create cycles in the heap.
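An ND-Merge can be sketched with a shared queue (Python threads again; the function name is ours, not the paper's). The interleaving across sources is arbitrary, but each source's own order is preserved and every element appears exactly once:

```python
import threading
import queue

# ND-Merge sketch: several request streams feed one queue; arrival order
# is an arbitrary interleaving of the inputs.
def nd_merge(*streams):
    out = queue.Queue()

    def pump(stream):
        for item in stream:
            out.put(item)           # the only nondeterminacy is enqueue order

    threads = [threading.Thread(target=pump, args=(s,)) for s in streams]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [out.get() for _ in range(out.qsize())]

merged = nd_merge(["a1", "a2"], ["b1", "b2"])
print(sorted(merged))  # ['a1', 'a2', 'b1', 'b2']
```

Note that the merge only appends elements to one output stream; it cannot manufacture references back into either input, so no cycles can arise.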
6. Architectural considerations

Here we consider the architecture of a highly parallel computer system for executing computations expressed in Model X. First we note an important distinction between two classes of computations based on their need for resource management during execution. Then we discuss some important architectural features that allow the achievement of superior performance as well as supporting modular software construction. A brief discussion of the organization of a high performance single chip multithread processor for Model X computations concludes this section on architectural considerations.

6.1. Static and dynamic computations

It is helpful to distinguish two classes of computations according to how processing and memory resources are managed during program execution:

Class A (Static). These computations require no resource (processor/memory) allocation decisions during program execution.

Class B (Dynamic). These computations require dynamic management of resources (load balancing, memory management) during execution.

For static computations, the function activation tree in Model X is effectively fixed and each of its nodes can be statically mapped to the processing resources of the target system. The heap can be implemented as a single segment of address space. Each recursion must be analyzed and shown to be an iteration (tail recursion); each such iteration may produce a series of data structure values which can be assigned to a fixed number of (reused) memory regions. Producer/consumer relationships can be identified and modules assigned to processors so as to achieve a balanced load on processing elements and communication links. This class of computations is a generalization of the data parallel class which has been successfully used (in the context of HPF [24] and Fortran 90) as the basis for programming many scientific computations for massively parallel computers [30]. The program analysis required to identify data parallel opportunities is much easier within the framework of a functional programming language such as Sisal, and an extension of compile-time analysis to stream-based computations in signal processing has been developed [15]. Although the class of static computations can be mapped readily to conventional multiprocessor systems, the benefits of multithread processor architecture will yield significantly better performance for these applications.

In the class of dynamic computations (Class B) the control structures involved in program execution are highly data-dependent. It is very difficult to code such problems for effective parallel execution using current systems: The coding is intricate and error-prone, and mechanisms must be used that violate the principles of modular programming. This class includes such programs as compilers for high-level programming languages, symbolic information processing, graphical interface code, as well as much of commercial data processing and many "irregular" scientific applications. The class of dynamic computations can benefit immensely from the support of Model X for modular software and parallel computation.

6.2. Features to support Model X

An attractive computing environment can be created by designing a system so that the principles of modular software construction are honored by an architecture in which resource allocation is flexible and cheap. We suggest that some powerful ingredients for achieving this are:

Use multithreaded processing architecture [16].

Support dynamic use of DAG data structures by a hierarchical memory system with a fixed size allocation unit.

Associate a permanent unique identifier with every data object.

A multithreaded architecture in which successive instructions executed may be from different threads allows several important benefits:⁵

High utilization of functional units, for example, floating point units.

Latency tolerance for memory read operations.

Reduction of context-switching overhead.

Footnote 5: Some of the early dataflow proposals, for example [11], may be viewed as multithreaded processors in which each instruction is a thread by itself. Several workers soon realized that utilizing some register context (and a notion of short instruction sequences) would yield a more efficient processing architecture, for example [20, 13].

Q: Why is it necessary to use permanent unique identifiers for all data objects? Can't the same effect be accomplished through emulation or address translation?

A: Dynamic resource management is required when data objects of various sizes are created, accessed, and released, or their intensity of use varies widely during computation. For efficient memory management, it must be possible to use any available location for an object without having to change the identifier used to reference it. Maintaining access to an object that is susceptible to being moved is complex and inefficient, especially in highly concurrent systems.

Q: Why is it important to use a fixed size unit of allocation?
A: When a fixed size unit is used the allocation of memory for a new data object can be made a trivial operation. Otherwise, complex data structures and search algorithms must be used to keep records of storage state. Use of a fixed-size memory allocation unit has several important benefits, especially in the framework of a highly parallel system: Storage management operations can be implemented in hardware yielding very inexpensive and efficient allocation of memory. It is never necessary to move a data unit for "logical" reasons. (For performance reasons data units may be duplicated at higher memory levels (caching).) It is never necessary to change the address at which a data unit may be found. It supports universal addressing.

Figure 7. Highly-parallel computer system to support Model X. (MTP: Multithread Processor; MU: Memory Unit; the MTPs share a Processor Interconnect and the memory units a Memory Interconnect.)

Q: What memory model would you implement to support Model X? How would you ensure memory consistency?
A: The purpose and function of the hardware is entirely specified by the program execution model. The purpose of the hardware is to execute programs, not to implement a memory model. Regarding memory consistency, it is not necessary for a distributed memory supporting Model X to be consistent.⁶

Footnote 6: Memory consistency models appear to be necessary in conventional multiprocessor systems to ensure that the synchronization primitives used for conventional process synchronization work correctly and are ordered properly with respect to reads and writes of data. In Model X, "synchronization" is combined with data transfer as in the I-structure mechanism for passing stream elements, or the implementation of the ND-Merge.

6.3. System architecture

These arguments suggest the system architecture in Figure 7. The overall organization is similar to that of conventional shared-memory multiprocessors. A group of Multithread Processors (MTP) communicate with each other over the Processor Interconnect. The MTPs have associated Main Memory Units (MMU) which intercommunicate through the Memory Interconnect.

A program to be executed on the proposed system consists of many separate functions. We assume that execution of each instantiation of a function is carried out on a single multithread processor. This is the same policy that has been found to be effective on the Monsoon multiprocessor [28]. In the case of tail-recursive stream-processing functions, it is likely that successive instantiations of a function will be performed by the same processor. We assume further that all data objects are constructed in the local memory of the processor where the function is being executed. There are no remote create operations, only remote data structure access requests. Under these assumptions, four message types are used for interprocessor communication in direct support of program execution:

1. Initiate a function instance at a specified node with a given argument (scalar or structure).

2. Return the result (scalar or structure) of a function application to the site of invocation.

3. Request access to a data structure component at a remote node.

4. Return a structure component value to the requestor.
each MTP uses a fast Associative Memory Cache (AMC) organized in chunks of about 32 bytes, each chunk being identified by a key and accessed using hardware associative search. The program execution scheme is designed to take advantage of the fast and efficient memory allocation and deallocation that may be implemented in hardware with this organization.

[Figure 8. Organization of the multithread processor chip. Labels in the figure: Processor Interconnect; Memory System; II: Instruction Interpreter; FU: Functional Unit; AMC: Associative Memory Cache.]

Q: But conventional microprocessors have the advantage of high-volume, low-cost production, and massive engineering effort.

A: I believe contemporary ASIC fabrication technology is capable of achieving performance levels close to what can be achieved using massively engineered full-custom design, sufficiently close that the difference will be small compared to the architectural advantages and general benefits from supporting Model X. The unit cost will be higher initially, but keep in mind that the processor is just one part of system cost, and the benefits can include improving the effective utilization of other components such as memory devices and interconnection structures.

Q: But the industry is not willing to move to a new programming model. There is too great an investment in software that only runs on conventional APIs.

A: The industry has reached a turning point. Soon, moving to parallel computing and multiprocessing on a chip will be the only possibility for advancing system performance. Available models for parallel computing are lacking in every dimension: performance, programmability, generality. Hopefully it will be possible to successfully introduce a new and powerful programming model in a domain, such as signal processing and real-time systems, where the importance of legacy software is relatively low and the benefits of modular software construction will be appreciated.

The main memory will also be organized in chunk-sized blocks. Each data object has its home in a chunk of main memory, and is brought into an AMC unit automatically
when called for by computing activity. A chunk will never hold parts of unrelated semantic objects, so there will be no problems of false sharing. The AMCs serve to reduce the
time needed to access heavily referenced objects.
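A minimal sketch of the chunk discipline described above: key-identified chunks holding at most one object's data, with a home in main memory and on-demand copies in an AMC. Class and method names are assumptions, not the paper's:

```python
# Sketch (assumption, not the paper's design): chunk-based memory with a
# main-memory home per chunk and an associative cache (AMC) per processor.
CHUNK_BYTES = 32

class ChunkMemory:
    def __init__(self):
        self.main = {}  # key -> chunk contents (home location in main memory)
        self.amc = {}   # key -> chunk contents (associative memory cache)

    def allocate(self, key, data: bytes):
        # A chunk never holds parts of unrelated semantic objects,
        # so there is no false sharing between objects.
        assert len(data) <= CHUNK_BYTES
        self.main[key] = data

    def access(self, key) -> bytes:
        # Hardware associative search of the AMC; on a miss the chunk is
        # brought in automatically from its home in main memory.
        if key not in self.amc:
            self.amc[key] = self.main[key]
        return self.amc[key]
```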
7. Conclusion

Model X is essentially the Graph/Heap model described by the author in 1974 [9], which evolved from [7, 8]. The "unravelling interpreter" model of Arvind [2] has similar characteristics. The benefits of using unique identifiers for data objects and a fixed unit of allocation were argued by the author in 1968 [7]. The same reasoning was used to justify use of segmentation together with paging in the architecture of the Multics time-sharing system [4, 6]. The incorporation of streams and incomplete data structures in the model stems from ideas of Ken Weng [17], and our recent proposals for supporting nondeterminate computations [18] have evolved from [10].

Q: Why not build the proposed system from commercial off-the-shelf microprocessors?

A: Commercial microprocessors (RISC, Superscalar, VLIW) are designed for single-processor workstation/PC use, and are not optimal for multiprocessor use. They are not latency tolerant, and cannot exploit the sharing of functional units. Their context switching cost is too high, and they do not support efficient memory allocation.

We hope to be able to construct an experimental system that implements the model described in this paper in

the near future. We expect that its application programming language will be ObjectSisal [18].

Among the early proposals for modeling formally the complete semantics of a programming language or computer system are the Church lambda calculus and Johnston's contour model. The lambda calculus has major significance as an inspiration to programming language designers, through the work of John McCarthy and Peter Landin and the efforts of the functional programming community. In contrast to Model X, the lambda calculus provides no direct way of modeling data structures and sharing, and offers little guidance regarding the expression of concurrency or nondeterminate computation. The contour model [22] focused on explaining the operational semantics of concurrent processes in an Algol-like setting, using conventional primitives for creating and synchronizing processes. The Vienna Definition Language (VDL) [25] used tree structures in a formal model for the PL/I programming language (with numerous violations of our principles for modular software). VDL evolved into the Vienna Definition Method (VDM), which incorporates some of the basic principles and notation of "Scottery" [32], and could be used to formally express the operational semantics of Model X. Linda [19] is a very different kind of computing model, in which independent agents (processes) communicate through their shared access to a global "tuple space". In its original form, Linda is in violation of all six principles of modular software construction.

There are other important computational models, mostly intended as paradigms for analyzing performance issues; their formulation does not address the program structure issues considered here.

Q: There has been a lot of work on semantic models for computer programs. Has it had any influence on the architecture of practical computer systems? If not, then why?

A: The potential benefits of modular software construction have not been recognized. Complaints about the complexity and cost of software are heard frequently. The fact is that the availability of large memories, paging (virtual memory), and improved programming languages has made programming easy enough that the genuine practice of modular software construction has not become an important issue. The necessity of introducing concurrency in application software is changing this picture.

References

[1] G. Agha. Concurrent object-oriented programming. Communications of the ACM 33, 9:125–141, September 1990.

[2] Arvind, K. Gostelow, and W. Plouffe. The (preliminary) Id report: An asynchronous programming language and computing machine. Technical Report 114, Department of Information and Computer Science, University of California, Irvine, September 1978.

[3] Arvind, R. S. Nikhil, and Keshav Pingali. I-structures: data structures for parallel computing. ACM Transactions on Programming Languages and Systems 11, 4:598–632, October 1989.

[4] A. Bensoussan, C. T. Clingen, and R. C. Daley. The Multics virtual memory. In Proceedings of the Second Symposium on Operating System Principles, ACM, 1969, pages 30–42.

[5] Larry Constantine, editor. Modular Programming: Proceedings of a National Symposium. MA: Information and Systems Press, 1968.

[6] R. C. Daley and Jack B. Dennis. Virtual memory, processes and sharing in Multics. Communications of the ACM 11, 5:306–312, May 1968.

[7] Jack B. Dennis. Programming generality, parallelism, and computer architecture. In Information Processing 68. Amsterdam: North-Holland, 1969, pages 484–492.

[8] Jack B. Dennis. On the design and specification of a common base language. In Proceedings of the Symposium on Computers and Automata, Brooklyn, NY, 1971, pages 47–74.

[9] Jack B. Dennis. First version of a data flow procedure language. In B. Robinet, editor, Lecture Notes in Computer Science, Volume 19: Programming Symposium. Berlin: Springer-Verlag, 1974, pages 362–376.

[10] Jack B. Dennis. A language design for structured concurrency. In Design and Implementation of Programming Languages, Lecture Notes in Computer Science, No. 54. Berlin: Springer-Verlag, 1977, pages 231–242.

[11] Jack B. Dennis. Dataflow supercomputers. IEEE Computer 13, 11:48–56, November 1980.

[12] Jack B. Dennis. An operational semantics for a language with early completion data structures. In Lecture Notes in Computer Science, Volume 107: Formal Description of Programming Concepts. Berlin: Springer-Verlag, 1981, pages 260–267.

[13] Jack B. Dennis. The evolution of 'static' data-flow architecture. In J.-L. Gaudiot and L. Bic, editors, Advanced Topics in Data-Flow Computing, chapter 2. Prentice-Hall, 1991.