From: George Neuner on 14 Dec 2009 13:00

On Fri, 11 Dec 2009 07:24:53 -0500, Kenneth Tilton
<kentilton(a)gmail.com> wrote:

>George Neuner wrote:
>
>> ... I don't consider [Kenny's] favorite analogy - spreadsheets - to be
>> representative of dataflow.
>
>Actually Mark Guliano (sp?) the COSI presenter at LUGM 99 preferred the
>same analogy. Where do you see it breaking down? I try to avoid it
>because almost everyone misses that it is an analogy and says something
>like, "Hunh? I do not want to write Lotus 1-2-3!"

YMMV, but IMO the defining feature of dataflow is that it is a
feed-forward broadcast/multicast model. The goal of the original
hardware model was to sequence computations implicitly, advancing each
one as its inputs became ready rather than as explicitly directed by
the programmer, in order to use parallel hardware efficiently and, in
particular, to enable speculative execution whenever possible. I don't
believe any software model can really claim to be dataflow unless it
addresses these original goals.

I concede that it is reasonable to view dataflow as a network of
active observers rather than a network of active providers, but if you
take the position that all the players are active agents, I believe
the observation model is somewhat less efficient when there are many
linkages, and that it makes more sense to have agents be
activated/scheduled upon receipt of data. The applicable hardware
analogy is "asynchronous rendezvous logic", which is what the original
model was based on.

As for my issue with spreadsheets: their workings have traditionally
been explained in terms of the observer model (though obviously they
could be implemented either way - a spreadsheet is not a model itself
but an example). In any event, most people understand them in terms
of observation and, because of that, I believe the analogy is, at
best, misleading: it does not (I think) adequately illustrate the
feed-forward nature of the computation.

>> I do consider that Cells itself is in the
>> spirit of dataflow although, when I tried it, my opinion was that the
>> programmer has to do too much manual fussing with the linkage web
>
>Really? There is not even a mechanism for manual fussing. Do you mean
>sometimes you created circularities and had to rethink your rules?
>
>One thing that gets crazy is doing a SETF to cell X in an observer on
>cell Y. It is not illegal, and it is useful in keeping big hairy
>applications efficient, but it does make for some harder thinking
>approaching "manual" in that one really needs to think about the
>dependency graph at hand. Still a net win, though.
>
>> (which is also a problem in other attempts at dataflow languages like
>> SISAL, Lucid, Oz, etc.). IMO, execution linkages should be determined
>> automatically by the compiler.
>
>No, linkages must be done at run-time because the dependency graph
>changes at run-time. And they /are/ determined automatically in Cells.

I'm not arguing "compile time" vs. "run time" - I'm saying that the
linkages should, IMO, be implicitly specified by usage rather than
directed by the programmer. I don't think there should be a need for
explicit rules on how to handle particular data - all the data should
be "compute on demand" or "compute as ready", determined by whichever
condition is reached first.

I know that Cells really was not an attempt at a cohesive dataflow
language - I make allowances for it being an extension to Lisp rather
than a replacement for it.

>Maybe you are thinking of some other package?

If I have confused Cells with something else, I apologize ... it has
been quite a while since I played with it.

George
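[The push-versus-pull distinction George draws is easy to lose in
prose. Below is a minimal Common Lisp sketch of the feed-forward
model he describes; every name in it is invented for illustration and
is not taken from Cells nor from any spreadsheet or dataflow system
mentioned in this thread.]

;; Feed-forward ("push") propagation: each node forwards its new value
;; to its dependents the moment it changes, so computation advances as
;; inputs become ready rather than when someone asks for a result.
(defstruct node
  value
  (dependents nil))   ; list of functions to call when VALUE changes

(defun set-value (node new)
  "Write NEW into NODE and push the change downstream."
  (setf (node-value node) new)
  (dolist (fire (node-dependents node))
    (funcall fire new)))

;; Wiring: a node that always holds double whatever *X* holds.
(defparameter *x*  (make-node :value 1))
(defparameter *2x* (make-node :value 2))

(push (lambda (v) (set-value *2x* (* 2 v)))
      (node-dependents *x*))

(set-value *x* 21)  ; *2x* now holds 42 -- nothing polled for it

[In the observer/pull model the arrow reverses: *2X* would hold a
thunk that reads *X* whenever *2X* is asked for its value. That
reversal is the distinction at issue in the rest of the thread.]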
From: Kenneth Tilton on 14 Dec 2009 14:06

George Neuner wrote:
> On Fri, 11 Dec 2009 07:24:53 -0500, Kenneth Tilton
> <kentilton(a)gmail.com> wrote:
>
>> George Neuner wrote:
>>
>>> ... I don't consider [Kenny's] favorite analogy - spreadsheets - to be
>>> representative of dataflow.
>>
>> Actually Mark Guliano (sp?) the COSI presenter at LUGM 99 preferred the
>> same analogy. Where do you see it breaking down? I try to avoid it
>> because almost everyone misses that it is an analogy and says something
>> like, "Hunh? I do not want to write Lotus 1-2-3!"
>
> YMMV, but IMO the defining feature of dataflow is that it is a
> feed-forward broadcast/multicast model. The goal of the original
> hardware model was to sequence computations implicitly, advancing each
> one as its inputs became ready rather than as explicitly directed by
> the programmer, in order to use parallel hardware efficiently and, in
> particular, to enable speculative execution whenever possible. I don't
> believe any software model can really claim to be dataflow unless it
> addresses these original goals.

That is how Cells works. One /programs/ a rule for X that happens to
use Y and Z (same as a spreadsheet), but the runtime action is more
like cause and effect: Y or Z changes, so X changes.

At first I did not think of it that way -- cells felt more like those
computed slots of C++ -- but after a while I came to see that Cells is
more about push (feed-forward) than it is about pull (observing).

Unless of course one uses only lazy cells. :)

By the way, there is no explicit direction (since v1 or 2). One simply
writes a nice functional rule that uses whatever other values one
wants to use, and dependencies are identified and tracked
automatically. It matters that this happens at runtime because of
branching code (different dependencies over time for the same
rule/instance) and because the values accessed by the same rule may or
may not be ones that will change at runtime, depending on what
specific other things the rule runs against. Knowing this allows a
/big/ optimization.

> I concede that it is reasonable to view dataflow as a network of
> active observers rather than a network of active providers, but if you
> take the position that all the players are active agents, I believe
> the observation model is somewhat less efficient when there are many
> linkages, and that it makes more sense to have agents be
> activated/scheduled upon receipt of data. The applicable hardware
> analogy is "asynchronous rendezvous logic", which is what the original
> model was based on.
>
> As for my issue with spreadsheets: their workings have traditionally
> been explained in terms of the observer model (though obviously they
> could be implemented either way - a spreadsheet is not a model itself
> but an example). In any event, most people understand them in terms
> of observation and, because of that, I believe the analogy is, at
> best, misleading: it does not (I think) adequately illustrate the
> feed-forward nature of the computation.

Actually, a spreadsheet certainly does feed forward: I change one cell
holding the prime rate and the rest of the model is updated. It does
not feel like dataflow because the only manifestation is screen
updates, but I think I heard about spreadsheets getting triggers which
would let them kick off more interesting effects. Imagine lights
flashing and bells ringing as you play what-if with a financial model.

>>> I do consider that Cells itself is in the
>>> spirit of dataflow although, when I tried it, my opinion was that the
>>> programmer has to do too much manual fussing with the linkage web
>>
>> Really? There is not even a mechanism for manual fussing. Do you mean
>> sometimes you created circularities and had to rethink your rules?
>>
>> One thing that gets crazy is doing a SETF to cell X in an observer on
>> cell Y. It is not illegal, and it is useful in keeping big hairy
>> applications efficient, but it does make for some harder thinking
>> approaching "manual" in that one really needs to think about the
>> dependency graph at hand. Still a net win, though.
>>
>>> (which is also a problem in other attempts at dataflow languages like
>>> SISAL, Lucid, Oz, etc.). IMO, execution linkages should be determined
>>> automatically by the compiler.
>>
>> No, linkages must be done at run-time because the dependency graph
>> changes at run-time. And they /are/ determined automatically in Cells.
>
> I'm not arguing "compile time" vs. "run time" - I'm saying that the
> linkages should, IMO, be implicitly specified by usage rather than
> directed by the programmer.

And that is how Cells has worked since version 1 or 2. I just write my
code; the Cells engine notices when X uses Y and thereafter updates X
as needed when Y changes.

> I don't think there should be a need for
> explicit rules on how to handle particular data - all the data should
> be "compute on demand" or "compute as ready", determined by whichever
> condition is reached first.

And there should be a lazy option, so "on demand" is the only compute.

> I know that Cells really was not an attempt at a cohesive dataflow
> language - I make allowances for it being an extension to Lisp rather
> than a replacement for it.

Nope, it wasn't an attempt at anything. I was just trying to get my
application written. :)

kt

-- 
http://www.theoryyalgebra.com/index.html
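[For readers who have not seen Cells, here is roughly what Kenny means
by "just write a rule". This is a sketch from memory of the Cells
surface syntax (defmodel, c-in, c?); the exact API spelling may not
match the released library, and LOAN and its slots are invented for
the example.]

(defmodel loan ()
  ((prime-rate :accessor prime-rate
               :initform (c-in 5/100))   ; an input cell, changed by SETF
   (margin     :accessor margin
               :initform (c-in 2/100))
   (rate       :accessor rate
               ;; a ruled cell: merely *reading* PRIME-RATE and MARGIN
               ;; inside the rule is what records the dependencies --
               ;; nothing is declared, and a branch not taken on this
               ;; run contributes no dependency
               :initform (c? (+ (prime-rate self) (margin self))))))

;; (setf (prime-rate some-loan) 6/100) then feeds forward: RATE is
;; recomputed, and its observers (if any) run, without anyone polling.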
From: George Neuner on 14 Dec 2009 17:54

On Mon, 14 Dec 2009 14:06:07 -0500, Kenneth Tilton
<kentilton(a)gmail.com> wrote:

>George Neuner wrote:
>
>> YMMV, but IMO the defining feature of dataflow is that it is a
>> feed-forward broadcast/multicast model. The goal of the original
>> hardware model was to sequence computations implicitly, advancing each
>> one as its inputs became ready rather than as explicitly directed by
>> the programmer, in order to use parallel hardware efficiently and, in
>> particular, to enable speculative execution whenever possible. I don't
>> believe any software model can really claim to be dataflow unless it
>> addresses these original goals.
>
>That is how Cells works. One /programs/ a rule for X that happens to
>use Y and Z (same as a spreadsheet), but the runtime action is more
>like cause and effect: Y or Z changes, so X changes.
>
>At first I did not think of it that way -- cells felt more like those
>computed slots of C++ -- but after a while I came to see that Cells is
>more about push (feed-forward) than it is about pull (observing).
>
>Unless of course one uses only lazy cells. :)

Ok, I'll have to look at it again. I had the impression that it was an
observer model. Perhaps I played with an early version?

>By the way, there is no explicit direction (since v1 or 2). One simply
>writes a nice functional rule that uses whatever other values one
>wants to use, and dependencies are identified and tracked
>automatically. It matters that this happens at runtime because of
>branching code (different dependencies over time for the same
>rule/instance) and because the values accessed by the same rule may or
>may not be ones that will change at runtime, depending on what
>specific other things the rule runs against. Knowing this allows a
>/big/ optimization.

The direction does make a difference if you consider active agents. It
is much more efficient to push a value to a bunch of suspended agents
than for those agents to poll for it.

>> I concede that it is reasonable to view dataflow as a network of
>> active observers rather than a network of active providers, but if you
>> take the position that all the players are active agents, I believe
>> the observation model is somewhat less efficient when there are many
>> linkages, and that it makes more sense to have agents be
>> activated/scheduled upon receipt of data. The applicable hardware
>> analogy is "asynchronous rendezvous logic", which is what the original
>> model was based on.
>>
>> As for my issue with spreadsheets: their workings have traditionally
>> been explained in terms of the observer model (though obviously they
>> could be implemented either way - a spreadsheet is not a model itself
>> but an example). In any event, most people understand them in terms
>> of observation and, because of that, I believe the analogy is, at
>> best, misleading: it does not (I think) adequately illustrate the
>> feed-forward nature of the computation.
>
>Actually, a spreadsheet certainly does feed forward: I change one cell
>holding the prime rate and the rest of the model is updated. It does
>not feel like dataflow because the only manifestation is screen
>updates, but I think I heard about spreadsheets getting triggers which
>would let them kick off more interesting effects. Imagine lights
>flashing and bells ringing as you play what-if with a financial model.

I would say that depends. The formula cell pulls the value. You can
make a reasonable analogy to the dataflow hardware model wherein the
value cell is a latched register and the formula cell is triggered
logic, but to be valid that presupposes that modifying the value cell
directly causes (re)evaluation of the formula cell. In an actual
spreadsheet implementation, that may or may not be the case.

Actual spreadsheet code I have seen over the years has uniformly used
backward-chained evaluation: there is a list of formula cells that
refer to values. I have seen implementations keep forward links from
values to referencing formulas but, AFAIHS, those forward links were
only used as starting points for recomputing dependencies for a new
round of evaluation. The evaluation itself walked the
dependency-ordered list of formulas. I've never encountered an
implementation I would definitely classify as forward chained.

>> I don't think there should be a need for explicit rules on how to
>> handle particular data - all the data should be "compute on demand"
>> or "compute as ready", determined by whichever condition is reached
>> first.
>
>And there should be a lazy option, so "on demand" is the only compute.

There certainly are times when lazy operations and structures are
useful, but I'm not a fan of general lazy computation (a la Haskell).
Mainly I see laziness as a lame attempt to squeeze performance out of
serial code. As many programmers can't write correct threaded code to
save their own lives, I believe the only hope the average developer
has of effectively using lots of cores will come from direct compiler
intervention to exploit inherent parallelism and speculatively execute
code.

The general dearth of good language and runtime support for parallel
processing has bothered me for a long time. After too many years I've
learned enough about language implementation to finally do something
about it ... but my own opus isn't yet ready for the critics.

George
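[To make the backward-chained scheme George describes concrete, a toy
version in Common Lisp. All names here are invented; nothing is drawn
from any real spreadsheet's source.]

;; Backward-chained ("pull") evaluation: formulas read whatever they
;; reference, and a recalc pass just walks the formulas in dependency
;; order. Changing a value fires nothing by itself.
(defstruct sheet
  (values (make-hash-table))   ; cell name -> plain value
  (formulas nil))              ; ((name . thunk) ...) in dependency order

(defun cell (sheet name)
  (gethash name (sheet-values sheet)))

(defun recalc (sheet)
  "One round of evaluation over the dependency-ordered formula list."
  (loop for (name . thunk) in (sheet-formulas sheet)
        do (setf (gethash name (sheet-values sheet))
                 (funcall thunk))))

;; Usage: poke a value cell, then rerun the whole list.
(defparameter *s* (make-sheet))
(setf (gethash 'prime (sheet-values *s*)) 5/100)
(setf (sheet-formulas *s*)
      (list (cons 'rate  (lambda () (+ (cell *s* 'prime) 2/100)))
            (cons 'daily (lambda () (/ (cell *s* 'rate) 365)))))
(recalc *s*)   ; RATE, then DAILY -- each formula pulls its inputs

[Forward links from PRIME to RATE, if kept, would be used only to
rebuild the ordered formula list when formulas change, which is the
behavior George reports having seen.]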
From: Kenneth Tilton on 14 Dec 2009 22:22

George Neuner wrote:
> On Mon, 14 Dec 2009 14:06:07 -0500, Kenneth Tilton
> <kentilton(a)gmail.com> wrote:
>
>> George Neuner wrote:
>>> YMMV, but IMO the defining feature of dataflow is that it is a
>>> feed-forward broadcast/multicast model. The goal of the original
>>> hardware model was to sequence computations implicitly, advancing
>>> each one as its inputs became ready rather than as explicitly
>>> directed by the programmer, in order to use parallel hardware
>>> efficiently and, in particular, to enable speculative execution
>>> whenever possible. I don't believe any software model can really
>>> claim to be dataflow unless it addresses these original goals.
>>
>> That is how Cells works. One /programs/ a rule for X that happens to
>> use Y and Z (same as a spreadsheet), but the runtime action is more
>> like cause and effect: Y or Z changes, so X changes.
>>
>> At first I did not think of it that way -- cells felt more like those
>> computed slots of C++ -- but after a while I came to see that Cells
>> is more about push (feed-forward) than it is about pull (observing).
>>
>> Unless of course one uses only lazy cells. :)
>
> Ok, I'll have to look at it again. I had the impression that it was
> an observer model. Perhaps I played with an early version?

No, it has been like this from the beginning. But you can see an
observer model if you want; that is just not how it works when
Something Happens. I'll say it again: one /codes/ declarative,
functional rules, but the engine reacts to change and propagates
efficiently to dependents.

>> By the way, there is no explicit direction (since v1 or 2). One
>> simply writes a nice functional rule that uses whatever other values
>> one wants to use, and dependencies are identified and tracked
>> automatically. It matters that this happens at runtime because of
>> branching code (different dependencies over time for the same
>> rule/instance) and because the values accessed by the same rule may
>> or may not be ones that will change at runtime, depending on what
>> specific other things the rule runs against. Knowing this allows a
>> /big/ optimization.
>
> The direction does make a difference if you consider active agents.
> It is much more efficient to push a value to a bunch of suspended
> agents than for those agents to poll for it.

No one is polling.

>> Actually, a spreadsheet certainly does feed forward: I change one
>> cell holding the prime rate and the rest of the model is updated.
>> It does not feel like dataflow because the only manifestation is
>> screen updates, but I think I heard about spreadsheets getting
>> triggers which would let them kick off more interesting effects.
>> Imagine lights flashing and bells ringing as you play what-if with a
>> financial model.
>
> I would say that depends. The formula cell pulls the value.

No, that is the way it is coded. The engine then propagates a change
to, say, the prime rate throughout the model.

> You can
> make a reasonable analogy to the dataflow hardware model wherein the
> value cell is a latched register and the formula cell is triggered
> logic, but to be valid that presupposes that modifying the value cell
> directly causes (re)evaluation of the formula cell. In an actual
> spreadsheet implementation, that may or may not be the case.

How Lotus 1-2-3 is programmed I do not know, but I doubt it does not
build a two-way dependency graph, meaning cells know what they use and
used cells know who uses them. It's pretty simple code, after all.
IIRC in the early days one could not use a Cell higher up in the grid,
or maybe it was just that everything always got recalculated
regardless. But those early days did not last long.

> Actual spreadsheet code I have seen over the years has uniformly used
> backward-chained evaluation: there is a list of formula cells that
> refer to values. I have seen implementations keep forward links from
> values to referencing formulas but, AFAIHS, those forward links were
> only used as starting points for recomputing dependencies for a new
> round of evaluation.

There is no way around a certain amount of "polling", because a rule
can be forced to run, and then it cannot assume that everyone besides
the triggering value is current. Again, branching code makes it
impossible to know. Cells uses a generational counter to decide
currency, so the cost is negligible: we still recalculate only rules
that are both out of date and dependent on something that changed;
otherwise we simply sweep the dependency graph looking for out-of-date
dependents of changed Cells. The tree gets visited exactly once,
because a Cell determined not to need recalculation gets its
generation set to the current one. Other tools apparently run a rule
hoping everyone is current and then rerun it if it turns out something
it used was itself rerun. It could take some fancy analysis to decide
which is more efficient and when (I suspect the answer varies
depending on the dependency graph).

> The evaluation itself walked the dependency
> ordered list of formulas. I've never encountered an implementation I
> would definitely classify as forward chained.

The correct algorithm walks only the part of the dependency graph that
/might/ be affected, and at each node it is simply comparing two
integers. And again there is no way to avoid this -- well, the
alternative is the other model I mentioned: don't check, just run, and
re-run if it turns out you used an old value. That could be fast, in
that in my experience re-runs would rarely be needed; I just have not
stopped to figure out what the housekeeping would cost (identifying
the rules that ran against obsolete values).

>>> I don't think there should be a need for explicit rules on how to
>>> handle particular data - all the data should be "compute on demand"
>>> or "compute as ready", determined by whichever condition is reached
>>> first.
>>
>> And there should be a lazy option, so "on demand" is the only
>> compute.
>
> There certainly are times when lazy operations and structures are
> useful, but I'm not a fan of general lazy computation (a la Haskell).

No, it just needs to be an option for particular oddball cases.

> Mainly I see laziness as a lame attempt to squeeze performance out of
> serial code. As many programmers can't write correct threaded code to
> save their own lives, I believe the only hope the average developer
> has of effectively using lots of cores will come from direct compiler
> intervention to exploit inherent parallelism and speculatively
> execute code.
>
> The general dearth of good language and runtime support for parallel
> processing has bothered me for a long time. After too many years I've
> learned enough about language implementation to finally do something
> about it ... but my own opus isn't yet ready for the critics.

I'll keep my eyes open for the ANNC.

kt

-- 
http://thelaughingstockatpngs.com/
http://www.facebook.com/pages/The-Laughingstock/115923141782?ref=nf
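[Kenny's generational-counter scheme is easy to sketch in Common Lisp.
This is a reconstruction from his description above, not the actual
Cells internals; every name is invented.]

(defvar *pulse* 0)  ; global generation counter; a SETF opens a new one

(defstruct (gcell (:conc-name gc-))
  value
  rule                ; nil for input cells
  (inputs nil)        ; cells this rule read the last time it ran
  (changed-at 0)      ; pulse at which VALUE last actually changed
  (verified-at 0))    ; pulse at which currency was last confirmed

(defun ensure-current (cell)
  "Bring CELL up to date for *PULSE*. Each cell does this work at most
once per pulse; any later visit costs one integer comparison."
  (unless (= (gc-verified-at cell) *pulse*)
    (dolist (in (gc-inputs cell))        ; make the inputs current first
      (ensure-current in))
    (when (and (gc-rule cell)
               ;; recalculate only if out of date AND dependent on
               ;; something that actually changed this pulse
               (find *pulse* (gc-inputs cell) :key #'gc-changed-at))
      (let ((new (funcall (gc-rule cell))))
        (unless (eql new (gc-value cell))
          (setf (gc-value cell)      new
                (gc-changed-at cell) *pulse*))))
    ;; stamp even when no recalc was needed -- this is what makes the
    ;; sweep visit each cell exactly once
    (setf (gc-verified-at cell) *pulse*)))

(defun write-input (cell new)
  "SETF on an input cell opens a new pulse. A full engine would then
walk CELL's users (forward links, omitted here), calling
ENSURE-CURRENT on each."
  (incf *pulse*)
  (setf (gc-value cell)       new
        (gc-changed-at cell)  *pulse*
        (gc-verified-at cell) *pulse*))

[The "don't check, just run" alternative Kenny mentions would drop the
ENSURE-CURRENT walk and instead record which input generations each
run read, rerunning any rule whose recorded generations turn out
stale.]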
From: W. James on 17 Dec 2009 20:39
Kenneth Tilton wrote:
> I was clever enough to stipulate "an intelligent question", meaning
> one demonstrating one has given some thought to what one read and
> worked through to a synthetic consequence evidencing their thought.

I was clever enough to stipulate "an intelligent question", meaning
one demonstrating he has given some thought to what he read and
worked through to a synthetic consequence evidencing his thought.

-- 
"A woman's place is in the home." --- Wise Old Saying