From: Will Duquette on
I'm writing a model that bears some resemblance to a spreadsheet
model: cells with formulas that depend on the values of other cells.
I'm trying to determine a natural order for evaluating the cells so as
to minimize the number of iterations required to recompute the model.

You can think of the model as a directed graph, where the nodes are
the cells and the arcs are the dependencies of a cell's formulas on
other cells: a node points at the nodes it depends on.

There are two problems: when the model contains no cyclic
dependencies, and when it does.

1. Given an acyclic directed graph, produce a list of the nodes such
that each node depends only on nodes prior to it in the list. If
there are no cycles, this order should recompute the model in one
pass.

2. Given an acyclic directed graph *with* cycles, produce a similar
list of the nodes such that to the extent possible the cycles are
pushed to the end and th enumber of iterations is minimized.

I've come up with an algorithm for #1 and a heuristic for #2, but they
are both kind of brute force; and this is clearly a solved problem.
Can anyone point me in the right direction?
From: Alexandre Ferrieux on
On Jan 20, 8:04 pm, Will Duquette <w...(a)wjduquette.com> wrote:
> I'm writing a model that bears some resemblance to a spreadsheet
> model: cells with formulas that depend on the values of other cells.
> I'm trying to determine a natural order for evaluating the cells so as
> to minimize the number of iterations required to recompute the model.
>
> You can think of the model as a directed graph, where the nodes are
> the cells and the arcs are the dependencies of a cell's formulas on
> other cells: a node points at the nodes it depends on.
>
> There are two problems: when the model contains no cyclic
> dependencies, and when it does.
>
> 1. Given an acyclic directed graph, produce a list of the nodes such
> that each node depends only on nodes prior to it in the list.  If
> there are no cycles, this order should recompute the model in one
> pass.
>
> 2. Given an acyclic directed graph *with* cycles, produce a similar
> list of the nodes such that to the extent possible the cycles are
> pushed to the end and th enumber of iterations is minimized.
>
> I've come up with an algorithm for #1 and a heuristic for #2, but they
> are both kind of brute force; and this is clearly a solved problem.
> Can anyone point me in the right direction?

You do know that #1 is called "topological sort", don't you ?
A handy tool doing that is called 'tsort' in unix. ISTR it is the
naive O(N^2) implementation:
- find a minimal element (O(N))
- remove it
- repeat

For #2, you just need a bit if refinement of the "find a minimal
element" step: mark the visited elements. If you run on a marked
element, you've found a cycle and you can put it aside (e.g. by
"fusing" it into a macro-node so that the rest of the algorithm can
proceed).

However, for the problem at hand, I don't quite grok what you mean by
"minimizing the number of iterations". A cycle in formula dependencies
is in the general case a set of equations, and apart from the purely
linear case you're out of luck for a general solver...

-Alex
From: Will Duquette on
On Jan 20, 1:00 pm, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Jan 20, 8:04 pm, Will Duquette <w...(a)wjduquette.com> wrote:
>
>
>
>
>
> > I'm writing a model that bears some resemblance to a spreadsheet
> > model: cells with formulas that depend on the values of other cells.
> > I'm trying to determine a natural order for evaluating the cells so as
> > to minimize the number of iterations required to recompute the model.
>
> > You can think of the model as a directed graph, where the nodes are
> > the cells and the arcs are the dependencies of a cell's formulas on
> > other cells: a node points at the nodes it depends on.
>
> > There are two problems: when the model contains no cyclic
> > dependencies, and when it does.
>
> > 1. Given an acyclic directed graph, produce a list of the nodes such
> > that each node depends only on nodes prior to it in the list.  If
> > there are no cycles, this order should recompute the model in one
> > pass.
>
> > 2. Given an acyclic directed graph *with* cycles, produce a similar
> > list of the nodes such that to the extent possible the cycles are
> > pushed to the end and th enumber of iterations is minimized.
>
> > I've come up with an algorithm for #1 and a heuristic for #2, but they
> > are both kind of brute force; and this is clearly a solved problem.
> > Can anyone point me in the right direction?
>
> You do know that #1 is called "topological sort", don't you ?
> A handy tool doing that is called 'tsort' in unix. ISTR it is the
> naive O(N^2) implementation:
>    - find a minimal element (O(N))
>    - remove it
>    - repeat
>
> For #2, you just need a bit if refinement of the "find a minimal
> element" step: mark the visited elements. If you run on a marked
> element, you've found a cycle and you can put it aside (e.g. by
> "fusing" it into a macro-node so that the rest of the algorithm can
> proceed).
>
> However, for the problem at hand, I don't quite grok what you mean by
> "minimizing the number of iterations". A cycle in formula dependencies
> is in the general case a set of equations, and apart from the purely
> linear case you're out of luck for a general solver...
>
> -Alex

Having written my original post, I visited Wikipedia, and found out
about topological sorts; I'd run across the concept before, but not
the term.

As for #2: there are a surprising number of cyclic models that will
iterate to convergence, even if they are not quite linear. Iterating
over the cells is then equivalent to the Gauss-Seidel algorithm for
solving systems of equations. I figure that if I can sort the cells
in as near to a topological sort as I can get, it should reduce the
number iterations somewhat.
From: Alexandre Ferrieux on
On Jan 20, 11:09 pm, Will Duquette <w...(a)wjduquette.com> wrote:
>
> As for #2: there are a surprising number of cyclic models that will
> iterate to convergence, even if they are not quite linear.  Iterating
> over the cells is then equivalent to the Gauss-Seidel algorithm for
> solving systems of equations.  I figure that if I can sort the cells
> in as near to a topological sort as I can get, it should reduce the
> number iterations somewhat.

If you're confident that you don't run across a majority of divergent
systems, I guess it makes sense :}

Then glueing together cycle members sounds like a reasonable way of
isolating them without disrupting the global order.

-Alex
From: Andrew Mangogna on
Will Duquette wrote:

> I'm writing a model that bears some resemblance to a spreadsheet
> model: cells with formulas that depend on the values of other cells.
> I'm trying to determine a natural order for evaluating the cells so as
> to minimize the number of iterations required to recompute the model.
>
> You can think of the model as a directed graph, where the nodes are
> the cells and the arcs are the dependencies of a cell's formulas on
> other cells: a node points at the nodes it depends on.
>
> There are two problems: when the model contains no cyclic
> dependencies, and when it does.
>
> 1. Given an acyclic directed graph, produce a list of the nodes such
> that each node depends only on nodes prior to it in the list. If
> there are no cycles, this order should recompute the model in one
> pass.
>
> 2. Given an acyclic directed graph *with* cycles, produce a similar
> list of the nodes such that to the extent possible the cycles are
> pushed to the end and th enumber of iterations is minimized.

My first impulse here would be to perform a depth first search of the graph to
compute a spanning tree and then sort the tree. Some advanced compiler books
might be a good resource since flow graphs have many of these properties and
compiler optimizations do a lot of graph manipulations.

>
> I've come up with an algorithm for #1 and a heuristic for #2, but they
> are both kind of brute force; and this is clearly a solved problem.
> Can anyone point me in the right direction?

--
Andrew Mangogna