From: Joshua Cranmer on 27 Oct 2009 10:09

On 10/26/2009 06:22 PM, Richard Harter wrote:
> The antithesis is that it isn't turtles all the way up, or at
> least it shouldn't be turtles all the way up. That is, the kinds
> of languages and programming technologies that we use in
> large scale programming should be quite different from the kind
> of languages used for programming in the small.

I think most people would agree that methodologies for small-scale programs are different from those for large-scale programs, although there might be considerable disagreement over the dividing line. I certainly am not going to use a full-blown OOP paradigm for writing even something like an automatic code generator; on the other hand, I shudder at the thought of writing something as complex as an operating system in a functional paradigm.

An interesting project would be to take a large corpus of various mature open-source programs of varying size and see if there is a correlation between size and the use of certain language features or patterns.

> One difference between traditional programs and data flow
> programs is that traditional programs use LIFO semantics whereas
> data flow programs use FIFO semantics. That is, a traditional
> program puts data on a stack and gets data back on the same
> stack. In data flow programs each element gets data from a queue
> and puts data to other queues.

One nit to point out: at this stage, many programs don't follow a procedural model of programming. OOP is the dominant paradigm, and I don't see the sequence of data flow as being LIFO. Yet you continually refer to procedural programming as what makes up "traditional programs."

> Another difference is that the connectivity of traditional
> programs is deeply embedded in the code. To pass data from A to
> B, A calls B. That is, the caller has to specify where the data
> goes. In data flow programs the connectivity can be separate
> from the code.
> A doesn't pass data directly to B; instead it
> passes data to the run time system which in turn passes the data
> to B. The caller does not have to specify where the data goes.

Two words: function pointers. You can also go with virtual or dynamic method dispatch, but "function pointers" is shorter and sounds better.

> * Concurrency and parallelism are natural. Code can be
> distributed between cores and even across networks. Many of the
> problems associated with threads disappear.

They don't disappear; they're pushed into the runtime system. From practical experience, that means they probably bubble up and annoy you in the programming language.

> * Data flow networks are natural for representing process. This
> is particularly true for transaction processing.

Maybe I'm just being a stupid idiot here, but I don't see how data flow is natural for representing some common processes. For example, the HTML5 tokenization process. I suppose you can flow current-state output back around and into itself as next-state input, but that's not exactly natural. This also brings up the question of how the language deals with loops in the dependency graph. The solutions I see either bring back the problems with threading or could create unreliability.

> * Message passing gets rid of the problems associated with shared
> memory and locks.

Just as credit default swaps and other financial machinations got rid of risk :-). From what I can tell, similar problems arise, but in a different form (data flow dependency graphs, particularly the fact that they're typically not acyclic).

> * Data flow programs are more extensible than traditional
> programs. Elements can be readily composed into composite
> elements from the bottom up rather than top down.

Even if you consider good old procedural programming, I would say that the exact same thing is easily doable in traditional programs.
I can take code that does asynchronous socket reads and turn it into a MIME decoder by creating a module that calls the asynchronous socket read module appropriately. It's also an example of your "bottom-up composition."

> Some significant disadvantages:
>
> * Using data flow programming in the large pretty much requires
> that it be used from the start. That is, converting traditional
> programs into data flow programs is difficult because the
> structuring is so different.

I don't see it that way. Ultimately, most libraries are merely "pass X in as input, get Y out as output" -- it shouldn't be too hard to make that an atomic data-flow program block.

--
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
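The "pass X in, get Y out" point above can be sketched in a few lines: an ordinary library function wrapped as an atomic dataflow element that drains an input queue and feeds an output queue. This is an illustrative sketch, not code from the thread; the names `make_block`, `inq`, and `outq` are invented here, and a real runtime (rather than a direct call) would schedule the element.

```python
# Sketch: wrap a plain "X in, Y out" function as a dataflow element.
# make_block/inq/outq are illustrative names, not any real runtime's API.
from queue import Queue

def make_block(fn, inq, outq):
    """Wrap a plain function as a dataflow element: drain inq, feed outq."""
    def step():
        while not inq.empty():
            outq.put(fn(inq.get()))
    return step

inq, outq = Queue(), Queue()
double = make_block(lambda x: x * 2, inq, outq)
for v in (1, 2, 3):
    inq.put(v)
double()  # in a real system the runtime would fire this when input arrives
print([outq.get() for _ in range(3)])  # [2, 4, 6]
```

The element never names its neighbours; the wiring lives entirely in which queues are handed to `make_block`, which is the separation of connectivity from code that the thread discusses.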
From: gremnebulin on 27 Oct 2009 13:46

On 26 Oct, 22:22, c...(a)tiac.net (Richard Harter) wrote:
> SHOULD IT BE TURTLES ALL THE WAY UP?
>
> In the famous anecdote, the little old lady replies to the noted
> philosopher, "It's turtles all the way down." When it comes to
> writing software many writers on software design and many
> programming language creators seem to believe that it is turtles
> all the way up.
>
> What do I mean by "turtles all the way up"? By this I mean the
> thesis that the techniques and programming language concepts that
> are used in the small can be extended indefinitely to programming
> in the large. In other words if we use language X for 100 line
> programs and 1000 line programs we can also use it for 1000000
> line programs. We may have to add some extensions and new
> features along the way, but we can increase the size of the
> programs we write indefinitely using the same ideas.
>
> The antithesis is that it isn't turtles all the way up, or at
> least it shouldn't be turtles all the way up. That is, the kinds
> of languages and programming technologies that we use in
> large scale programming should be quite different from the kind
> of languages used for programming in the small.
>
> There are many propositions as to what those large scale
> technologies should be, and many such technologies in use. Here
> I am going to look at data flow languages and data flow
> structuring of programs.
>
> WHAT ARE DATA FLOW LANGUAGES?
>
> There are two wikipedia articles that give useful answers:
> http://en.wikipedia.org/wiki/Dataflow_programming
> http://en.wikipedia.org/wiki/Flow-based_programming
>
> The distinction wikipedia makes between data flow programming and
> flow based programming is obscure. The following definition is
> an edited version of definitions used in the two articles.
> Data flow languages structure applications as networks of
> "black box" elements that exchange data across predefined
> connections by message passing. Elements execute when they
> get messages; they send messages asynchronously. Data flow
> based applications are inherently parallel.
>
> There are a wide variety of data flow languages, varying from
> spreadsheets, to Labview, to Erlang. Many are graphical;
> programming is done by altering flow diagrams. One thing they
> all have in common is that they have a run time system.
>
> Traditional imperative programs are composed of routines that
> call each other; i.e., when a call is made the caller constructs
> a data packet (calling sequence) and transfers control and the
> data packet to the called routine. When the called routine is
> done it constructs a data packet to pass back to the caller and
> transfers control back to the caller.
>
> In data flow programs the "routines" do not call each other.
> Instead they are activated by the run time system when there is
> input for them; when they create outputs the run time system
> takes care of moving the output to the destination input areas.
> When the "routines" are done they transfer control back to the
> run time system.
>
> One difference between traditional programs and data flow
> programs is that traditional programs use LIFO semantics whereas
> data flow programs use FIFO semantics. That is, a traditional
> program puts data on a stack and gets data back on the same
> stack. In data flow programs each element gets data from a queue
> and puts data to other queues.

Hmm. Trad languages aren't stack-based in the sense FORTH is, with explicit PUSH and POP instructions. Parameters and return values sit on the stack for implementational convenience. You could re-implement a trad language with mini-queues that are only one message long (thus making the FIFO/LIFO distinction irrelevant). Indeed, OO languages often see method calls as message passing.
(http://c2.com/cgi/wiki?AlanKaysDefinitionOfObjectOriented)

The stacking seems to come in with a rule that replies/returns are always to the sender/caller. Such a rule would, I think, lead to emergent stack-like behaviour even without a low-level stack in the implementation. It is interesting to consider how recursion could be implemented in a dataflow language.

> * Using data flow programming in the large pretty much requires
> that it be used from the start. That is, converting traditional
> programs into data flow programs is difficult because the
> structuring is so different.

Counterexample: ls | sort
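The recursion question above can be sketched by having each request message carry its own reply-to queue: the "replies always go back to the sender" rule then produces exactly the emergent stack-like nesting described. This is an illustrative sketch only; the element drives itself directly where a real dataflow runtime would do the scheduling, and all names are invented here.

```python
# Sketch: recursion as message passing. Each request carries a private
# reply-to queue, which plays the role of a stack frame. Illustrative
# names; not a real dataflow runtime.
from queue import Queue

requests = Queue()

def factorial_element():
    # One dataflow element: consume factorial requests, emit replies.
    while not requests.empty():
        n, reply_to = requests.get()
        if n <= 1:
            reply_to.put(1)
        else:
            sub = Queue()                # private reply queue = "stack frame"
            requests.put((n - 1, sub))
            factorial_element()          # stand-in for the runtime's scheduling
            reply_to.put(n * sub.get())  # reply goes back to the sender

out = Queue()
requests.put((5, out))
factorial_element()
print(out.get())  # 120
```

The stack never appears explicitly, yet the chain of pending reply queues nests in LIFO order, which is the emergent stack-like behaviour the post predicts.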
From: gremnebulin on 27 Oct 2009 13:54

On 27 Oct, 00:04, user923005 <dcor...(a)connx.com> wrote:
> On Oct 26, 3:22 pm, c...(a)tiac.net (Richard Harter) wrote:
> > SHOULD IT BE TURTLES ALL THE WAY UP?
> >
> > In the famous anecdote, the little old lady replies to the noted
> > philosopher, "It's turtles all the way down." When it comes to
> > writing software many writers on software design and many
> > programming language creators seem to believe that it is turtles
> > all the way up.
> >
> > What do I mean by "turtles all the way up"? By this I mean the
> > thesis that the techniques and programming language concepts that
> > are used in the small can be extended indefinitely to programming
> > in the large. In other words if we use language X for 100 line
> > programs and 1000 line programs we can also use it for 1000000
> > line programs. We may have to add some extensions and new
> > features along the way, but we can increase the size of the
> > programs we write indefinitely using the same ideas.
>
> I've never seen a convincing argument to show that this is wrong.
>
> We can use a 26 letter alphabet to make little words.
> We can use a 26 letter alphabet to make bigger words.
> We can use a 26 letter alphabet to make little paragraphs.
> We can use a 26 letter alphabet to make bigger paragraphs.
> We can use a 26 letter alphabet to make little books.
> We can use a 26 letter alphabet to make bigger books.
> We can use a 26 letter alphabet to make entire libraries.
>
> Why isn't the same thing true of programming languages?

What do you make libraries into? At each stage in your list some extra organising structure is introduced. A library has shelves and indexes; it is not a single gigantic book or a trillion-word sentence. So the answer to "why not" is "we have reached the highest organising principle allowed by the language, and we still have more to organise". (What do you make libraries into?)

Currently we are struggling to cope with applications spread across multiple nodes and processors, a situation which was not foreseen in most traditional HLLs.
From: Ray on 27 Oct 2009 13:50

Richard Harter wrote:
> There are many propositions as to what those large scale
> technologies should be, and many such technologies in use. Here
> I am going to look at data flow languages and data flow
> structuring of programs.

I think dataflow programming deserves some more exploration than it's had. Right now, piping things between shell commands is all the dataflow programming most of us do, and that's pretty trivial. Also, the languages involved are pretty heavily constrained by what they do and the very limited uses to which they are put. The dataflow parts of shell programming, for example, are effectively monotyped, with the datatype they work on being the "line of text."

We haven't really seen examples of "modern" dataflow languages (with, e.g., user-defined types, good tools, proper debugging interfaces, an asynchronous execution model, queue-length and queued-data age monitoring, function types, encapsulation, etc.). It's worth a research project or two, certainly. The semantic primitives we use to describe type systems, for example, take on different meanings, whose theorems and corollaries have not yet been fully examined, in the absence of function calls as we understand them in OO, functional, and imperative languages.

But I think I need to take exception to some of your claims.

> * Data flow networks are natural for representing process. This
> is particularly true for transaction processing.

And particularly untrue for naturally-recursive processes such as formal language parsing, etc. Also? The reason they're very natural for transaction processing is because they are BUILT ON TOP OF transaction processing systems. All those operations on queues are data commits that the runtime has to manage. A typical dataflow language is mostly a way of customizing the transaction processing system on which it's built to the needs of a particular application.
> * Message passing gets rid of the problems associated with shared
> memory and locks.

Sort of. In practice the runtime has to manage the queues, and it has to mediate contention between processes that want to write to them and between processes that want to read them. This may allow you to not notice the locks etc., but it doesn't remove them. On the other hand, good dataflow design can often limit queues to a single producer and a single consumer.

> * Data flow programs are more extensible than traditional
> programs. Elements can be readily composed into composite
> elements from the bottom up rather than top down.

Got any experimental results? Frankly I don't see one as being superior to the other, and I don't see how "bottom-up" or "top-down" designs are more or less favored by either paradigm. If you're going to make a claim like this, you should do the study to try and prove it. "Top-down" vs. "bottom-up" IME are usually more influenced by programming tools (like, e.g., a REPL or otherwise interactive environment as opposed to a static compiler).

Bear
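The single-producer / single-consumer point above can be sketched with a thread-safe queue: user code contains no explicit locks, but, as noted, they have not disappeared; they live inside the queue implementation, managed by the runtime. This is an illustrative sketch, not code from the thread.

```python
# Sketch: single producer, single consumer, one bounded queue.
# No explicit locks in user code; Queue's internal lock does the work.
import threading
from queue import Queue

q = Queue(maxsize=8)   # bounded queue gives natural back-pressure
results = []

def producer():
    for i in range(5):
        q.put(i)
    q.put(None)        # sentinel marks end of stream

def consumer():
    while (item := q.get()) is not None:
        results.append(item * item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4, 9, 16]
```

Keeping each queue to exactly one writer and one reader is what makes the contention the post describes tractable for the runtime to mediate.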
From: gremnebulin on 27 Oct 2009 13:57
On 27 Oct, 08:05, Mok-Kong Shen <mok-kong.s...(a)t-online.de> wrote:
> To be able to naturally deal with concurrency and parallelism
> is excellent. But wouldn't it be, on the other hand, somewhat
> inconvenient to program processes that require sequentiality?
> Should or could there be an adequate interface between data flow
> and conventional programming languages?

The transforms that operate on flows are black boxes and could be written in PP, OO, or functional languages. The DF system just has to know what they can handle.
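The interface described above can be sketched as a registry: each transform is an opaque callable, written in whatever paradigm its author likes, and the dataflow system records only what it can handle. The names `register` and `run` and the boundary type checks are invented for illustration; this is not any particular product's API.

```python
# Sketch: a dataflow system that knows only each black box's contract.
# register/run are illustrative names, not a real runtime's API.
registry = {}

def register(name, in_type, out_type, fn):
    # the DF system stores the black box plus its input/output contract
    registry[name] = (in_type, out_type, fn)

def run(name, value):
    in_type, out_type, fn = registry[name]
    assert isinstance(value, in_type), "type check at the element boundary"
    result = fn(value)          # the black box itself: any paradigm inside
    assert isinstance(result, out_type)
    return result

# the transform is ordinary procedural/functional code
register("shout", str, str, lambda s: s.upper() + "!")
print(run("shout", "hello"))  # HELLO!
```

Sequential stages then reduce to wiring one element's output type to the next element's input type, which is all the DF side ever needs to inspect.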