Polymorphism sucks [Was: Paradigms which way to go?] [OOP]

From: topmind on 10 Jul 2005 00:58

> Trees aren't always the right design solution.
> OO is not primarily about making trees.
> You *can* make trees with OO, but you don't have to; and you often
> shouldn't.
>

I have said over and over that even a fair number of OO fans agree
with me about trees and subtypes, and so they use other,
non-hierarchical OO techniques instead.

People complain about my repetition, yet the message still does not get
across.

-T-

From: Chris Sonnack on 29 Jul 2005 21:10

topmind writes:

>>> It's not necessarily about OOD. [...] it was talking about trees in
>>> general, not necessarily related to polymorphism or OO. The context
>>> was about the apparent popularity of trees among computer users
>>> IIRC, not just developers.
>>
>> In fact--as your own task break down showed--it's a natural form for
>> any intelligent person when breaking down something comprised of many
>> parts. Or more properly, of parts and sub-parts...as all tasks are.
>
> Like I keep saying, small trees are fine in some cases. When routines
> get bigger, one has to form sub-routines to avoid duplicating
> code (ie, duplicate nodes) in most cases. I rarely encounter routines
> that grow more than about 200 lines before candidates for
> duplication-factoring start to appear.

You're confusing several issues here. By your own quote at the top,
we're talking about the popularity of trees (which I'd say goes beyond
computer *users* to all intelligent minds).

Trees--as an analytic tool--are *extremely* valuable, and for good
reason. You demonstrated this yourself when you described a task as
an outline. And, contrary to your statements about size, they become
even more valuable when the task is large.

I'm moving a later section up here, because it applies:

>>>> The simple fact is, you broke a common task into levels. Just about
>>>> any set of tasks naturally breaks down that way. (In fact, I've
>>>> spent this week working with a project leader for a coming project
>>>> doing just that. The project is very large, and without a
>>>> hierarchical breakdown, it'd be beyond the capacity of any human
>>>> to deal with.)
>>>
>>> I bet you start to have cycles (or node duplication) past a certain
>>> point.
>>
>> Nope. There are *no* duplicate tasks. Each is distinct.
>
> I don't believe you. I would have to see the code. I think
> there is probably a misunderstanding somewhere here.

Yes. Yours. Nowhere above am I talking about code. I'm talking about
breaking down a large project into separate tasks. Exactly what you
did a few weeks ago with a very small task. In this case, we're talking
about a three-month project involving about a dozen people (more if you
count involved customers and testers).

I would submit that an outline (aka tree) is the ONLY viable way to
do such a breakdown. It's not a "convenient lie", it's a reality.
Large tasks are made of small tasks. It's natural.

WARNING: NOW WE'RE TALKING ABOUT DATA....

>> If all we're talking about is the taxonomy, then one can use as
>> many trees as needed to express different views of the information
>> depending on what one is looking for.
>
> Yes, but IMO sets are superior to multiple trees in most cases.

For my money, if I had a situation with a pile of data and I wanted
different (hierarchical) views, I'd very likely put the data in a
database, table or whatever (aka set) and *VIEW* it with a tree tool.

If the situation were such that the data was naturally hierarchical
and had a predominant tree structure to it, I might very well store
it in a tree--at least while in memory. As I've said before, if I
need to persist data (that is, store it on disk), then typically I
use some sort of table, flat file or database. It all depends.

WARNING: NOW WE'RE TALKING ABOUT CODING...

>>> Just like a record from a "Drink" entity.
>>
>> Nope. A record is dumb. It requires code to process it.
>
> [...snip non-responsive answer...]
>
>>>> Imagine I have a collection of Drink objects and I want to list
>>>> them in order by the amount of caffeine. simple, since they all
>>>> have a common interface that lets me ask for the caffeine content.
>>>
>>> Just like a record from a "Drink" entity.
>>
>> Nope. Records are dumb. They have no interface.
>
> They have no "interface"? Data queries.

Records can't be queried. DATABASES can. Records are just dumb
"things"....a collection of fields.

>> Database design is a pretty well-explored domain. There are some very
>> fundamental limitations here. You either need to look at each record
>> to determine if it matches your query, OR you need to--da da--look at
>> a subset....a child of the full dataset.
>
> Perhaps, but there are potentially other indexing schemes that
> are non-tree. But it is a moot point anyhow. Just because the
> underlying machine uses 1's and 0's does not nec. mean that
> programmers should also.

All programmers ultimately do use 1s and 0s. We just have tools that
usually shield us from the messy details. More to the point, what the
DB indexes demonstrate is that, yet again, hierarchical organization
can be a huge win.
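
A minimal sketch of the "look at each record, or look at a subset" point, in Python. The records, the function names, and the use of a sorted key list as a stand-in for a real index are all assumptions made for illustration; actual database indexes (B-trees, and non-tree schemes such as hashing) are more involved.

```python
# 1000 made-up records, already ordered by id.
records = [{"id": i, "name": f"item{i}"} for i in range(1000)]
index_keys = [rec["id"] for rec in records]  # simplest possible "index"


def full_scan(target_id):
    """Examine records one by one; return (record, records examined)."""
    examined = 0
    for rec in records:
        examined += 1
        if rec["id"] == target_id:
            return rec, examined
    return None, examined


def indexed_lookup(target_id):
    """Binary search: each probe discards half of the remaining subset."""
    lo, hi, examined = 0, len(index_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if index_keys[mid] == target_id:
            return records[mid], examined
        if index_keys[mid] < target_id:
            lo = mid + 1
        else:
            hi = mid
    return None, examined


scan_rec, scan_cost = full_scan(999)
idx_rec, idx_cost = indexed_lookup(999)
print(scan_cost, idx_cost)
```

Each probe in `indexed_lookup` narrows the search to a child subset of the previous one, which is the partitioning being described; the scan has no such shortcut.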

>> When it comes to dealing with large datasets, partitioning is a win.
>> Database designers know this, hence the indexing technology.
>
> Philosophers have long known there is rarely One Right Taxonomy/
> Partitioning for a given thing.

So you use multiple partition schemes where needed. No biggie.
The fact remains, without partitioning *somehow*, you have a mess.

>>> One could use SQL, another relational query language, and/or
>>> Query-By-Example to find stuff.
>>
>> Not seeing anything "relational" about this. You do understand the
>> term, right? (SQL, for example, works just fine in non-relational
>> databases.)
>
> Hmmmm. I don't think I agree with this, but will have to think about
> that one.

I thought you were a "database guy"? Think about a database with only
one table. Think you can write an SQL query to return a subset of
the records in that table? Of COURSE you can.
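
A minimal sketch of that single-table query, using Python's built-in sqlite3 module. The table name, columns, and values are invented for illustration. It shows only the mechanical point (SQL returning a subset of one table) and takes no side on whether such a database counts as "relational".

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE drinks (name TEXT, caffeine_mg REAL)")
conn.executemany(
    "INSERT INTO drinks VALUES (?, ?)",
    [("coffee", 120.0), ("tea", 0.0), ("cola", 36.0)],
)

# One table, no joins: select the subset with a noticeable caffeine content.
rows = conn.execute(
    "SELECT name FROM drinks WHERE caffeine_mg > 10 ORDER BY caffeine_mg"
).fetchall()
conn.close()
print(rows)
```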

WARNING: TALKING ABOUT VIRTUAL FILE SYSTEMS NOW...

>> Numerical identifiers for folders? Very dumb idea!
>> The example shows:
>>
>> } westsrvr:4251/slides.shw
>>
>> "4251" has no connection to anything real. How can anyone think
>> users can remember "4251"? I have thousands of files in hundreds
>> of directories. No human could remember distinct numbers for all
>> those folders. I can't believe anyone sane could think this was
>> a viable option.
>
> There may be other ways to label folders, and each may also have
> a description attribute, and perhaps even make it a primary key.

Of course, and your page mentions that. I was commenting there on
the stupidity of subjecting human users to numerical identifiers.

Consider a large--very large--file system. How many attributes does
it take to uniquely identify a file, to enable you to locate it?

Consider that these attributes are essentially AND'd together. If
you were writing a query, your WHERE clause would have many sub-clauses
connected with ANDs.

Which, logically speaking, is EXACTLY what a file path is.
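
A rough sketch of that equivalence in Python. The attribute names (`dept`, `owner`, `year`), the fixed meaning of each path segment, and the file list are hypothetical; only `slides.shw` comes from the example under discussion.

```python
from pathlib import PurePosixPath

# Assumed meaning of each path segment, in order.
LEVELS = ("dept", "owner", "year")

files = [
    {"dept": "sales", "owner": "bob", "year": "2005", "name": "slides.shw"},
    {"dept": "sales", "owner": "alice", "year": "2005", "name": "forecast.xls"},
    {"dept": "audit", "owner": "bob", "year": "2004", "name": "notes.txt"},
]


def path_to_conditions(path):
    """Turn 'sales/bob/2005' into {'dept': 'sales', 'owner': 'bob', ...}."""
    return dict(zip(LEVELS, PurePosixPath(path).parts))


def select(conditions):
    """Equivalent of WHERE a = x AND b = y AND ...: every condition must hold."""
    return [f["name"] for f in files
            if all(f[k] == v for k, v in conditions.items())]


in_folder = select(path_to_conditions("sales/bob/2005"))
whole_dept = select(path_to_conditions("sales"))
bobs_files = select({"owner": "bob"})  # not expressible as a path prefix here
print(in_folder, whole_dept, bobs_files)
```

A path fixes both the attributes and the order in which they are narrowed; the last query shows the kind of lookup that needs the attribute form instead.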

The difference is that a file path is a lot easier to browse.
I looked at your--whatchamacallit--finder window. Considering that
in a day's work I reference hundreds of files, having to use some
GUI search tool each time I wanted to locate a file.....NFW.

> For example, when companies move articles around in
> web URL paths, often it busts existing browser bookmarks.
> The same thing can happen with "meaningful" names.
> A "dumb" key is safer from such because it carries no
> external meaning.

What makes you think a dumb key prevents files from being moved?
The exact same problem exists. The only problem is that with a
"dumb" key, there's no logical handle. At least "Bobs Sales Project"
is a sensible thing to search for.

>> Further, all it's done is create an "abbreviation" for a location
>> (but a very difficult abbreviation to remember). The same issue
>> of changes applies. Change the location, and everyone's links
>> are wrong.
>
> Whaaaaaat?

You seem to be assuming that files in your system never move. I
have no reason--quite the contrary, in fact--to think that's true.
Files move for lots of reasons.

What you don't seem to understand is that a regular hierarchical file
system is logically equivalent to your system. You have attributes,
a regular FS has path parts. If you change the attributes/path parts,
the file "moves" and people's static references to it are no longer
valid.

The only real difference is that your virtual system requires a lot
more overhead. And, ironically, a real implementation of your imaginary
system would probably use a hierarchical FS on the back end anyway.
If you throw all the files in one BIG directory, performance tends to
drop a lot. (It's exactly the same as how indexes are hierarchical.)

>> Later on the page is the idea of associating properties to a file and
>> later searching for it by properties. Which is fine until you forget
>> what properties you used, make up too many to handle, or change them.
>> How do you browse through the data to find the lost file??
>
> How is this worse than forgetting a giant path???

It's approximately the same, I'd say. The difference is that your system
uses a lot more overhead and--depending on how it's implemented--may be
a lot harder to browse quickly.

>> It also requires a big database in which to store all this.
>
> So? Higher abstractions require more horse-power. I don't
> think 30,000 desktop files requires that much horse-power.

I thought we were talking WANs and big companies. I likely have more
than 30K files--way more--on just my own machine. My company must have
millions.

WARNING: TALKING ABOUT WORDS NOW...

>>>> It's simple and undeniable: sets have less structure than trees. EOS.
>>>
>>> Prove it!
>>
>> I don't have to, you just admitted it.
>
> It depends entirely on how one defines "structure".

Let's move this later bit to here:

>>>> Higher: above, superior.
>>>> Order: degree.
>>>> Structure: a complex construction or entity.
>>>
>>> Define "superior". Define "complex".
>>
>> My guess is you know what "superior" and "complex" mean.
>
> Those terms tend to be relative and vague.

They may be somewhat relative--many things are--but I can't say
I find them in any way vague.

> It is too imprecise for our purposes.

So propose an alternative.

> For example, Bill Gates may be "superior" from a money tally
> standpoint, but if he ends up in hell when he dies (as a
> hypothetical example), then he is not superior from a
> religious standpoint.

Why do you think an attribute applies to all contexts? From a
financial point of view, Gates is *way* superior. Period. There
is nothing vague or relative about that.

So, getting back to the point, structure is well-defined and we HAVE
a context. Let's talk set theory. You're big on sets, hopefully
you have some grasp of the underlying theory of sets. Let's find
out.

Here's a set: {{red} {blue} {green} {yellow} {thursday} {28} {}}

What does it mean? What is its structure?

Here's another: {{1} {3} {2} {1.5} {34} {99} {0}}

What does it mean? Is it the same as the other set? If so, why?
If not, why not?

Here's another: {{} {} {} {} {} {} {}}
And another: {{{}}}

Are these the same or different from the first two? Why or why not?

What--if any--structure exists in any of the above sets?
What--if anything--can you say about the above sets?

Now consider this data structure:
{red}
{green}
{yellow}
{28}
{blue}
{thursday}
{}

What--if any--structure exists in the above data structure?
What--if anything--can you say about the above data structure?
(Does the fact that it's called a data STRUCTURE suggest anything?)

We await your answers--no copying your neighbor's, now.
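
A side note for readers who want to check the set-theory mechanics: Python's `frozenset` obeys the same rules as the sets above (no order, no duplicates), while a list keeps position. The helper `S` is just shorthand introduced here; this is a sketch of the mechanics, not an answer to what "structure" should mean.

```python
def S(*elems):
    """Shorthand (made up for this note) for an immutable set."""
    return frozenset(elems)


first = S(S("red"), S("blue"), S("green"), S("yellow"),
          S("thursday"), S(28), S())
reordered = S(S(), S(28), S("thursday"), S("yellow"),
              S("green"), S("blue"), S("red"))

seven_empties = S(S(), S(), S(), S(), S(), S(), S())  # {{} {} {} {} {} {} {}}
nested = S(S(S()))                                    # {{{}}}

same_set = first == reordered          # order is not part of a set
collapsed = seven_empties == S(S())    # duplicates collapse to one element
differ = seven_empties != nested       # {} and {{}} are different elements

# A list (one possible "data structure") does record position.
as_list = ["red", "green", "yellow", 28, "blue", "thursday", None]
order_matters = as_list != list(reversed(as_list))
print(same_set, collapsed, differ, order_matters, len(first))
```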

WARNING: BACK TO HIERARCHIES AND TREES IN GENERAL...

>>> Having 2 bosses in a really small company. Who is the "root"?
>>
>> The CEO.
>
> I worked at a company that had 3 owners and I had 3 bosses.

Can the bosses give you orders? Can they give the owners orders?

What happened when each of the three bosses asked you to be at a
different meeting scheduled for the same time?

>> To the point: the situation is *entirely* hierarchical.
>
> No, that is not a pure tree.

You keep saying that like it means something. It doesn't.

>>> A GUI page A where you launch page B, but click a link which opens
>>> another instance of page A. This is common during web-browsing, I
>>> would note.
>>
>> ?? What does browser history have to do with anything? That doesn't
>> in any way speak to recursion in the software.
>
> If A calls B, B calls C, and C calls A, it *is* recursion.

The browsing *history* follows a recursive path. The browser software
does not. Your claim that GUI software is recursive is incorrect.

GUI **usage** may recurse, but so what?

>>>>> But a pure tree has no duplicate nodes.
>>>>
>>>> That's BS. Show me one authority that agrees.
>>>
>>> Connect the subroutine calls on the paper. Don't take my word for it,
>>> get your pen out.
>>
>> So,... when you can't respond to a point, you just throw in something
>> totally irrelevant? Okay. We'll just assume capitulation.
>
> No, I just cannot easily describe it without a visual.

Please read carefully. Please read the first quoted line above. Then
read the response--second quoted line above. Your response--third line
above--is non-responsive, and the point is lost. Let's make it more
clear:

>you> But a pure tree has no duplicate nodes.
>>
>me> That's BS. Show me one authority that agrees.

Your turn again.

>>>> Totally False. The call tree represents reality.
>>>
>>> And lots of duplications of parts of reality.
>>
>> If a routine is entered multiple times, that **IS** the reality.
>
> And it is duplication.

What part of "that IS the reality" don't you get? If a routine is
entered multiple times, the call graph **MUST** have duplicates.
(You DO know what a call graph is, don't you?--a(n imaginary) tree
listing the thread of execution of a running program.)

Once more: the call graph is a "pure" tree--no node links back, no
branches connect--but does indeed have "duplicate" nodes. That is,
the name of the node is duplicated. In the reality of the running
program, each visit to a given routine is *contextually* different.

Consider a parse tree. Most programs use the language statements many
times, so there are duplicate, say, "IF/ELSE" nodes. Probably lots
and lots of them. But the parse tree itself is still a "pure" tree,
because each "IF/ELSE" is contextually different.
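
One way to make the two readings of "duplicate" concrete, sketched in Python. The routine names and the `Node` class are hypothetical. The first structure records each *visit* as its own node (the call tree described above); the second keeps one node per routine and links every caller to it, which is what you get by connecting subroutine calls on paper.

```python
from collections import Counter


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def is_pure_tree(root):
    """True if no node object is reached twice (no shared nodes, no cycles)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            return False
        seen.add(id(node))
        stack.extend(node.children)
    return True


def name_counts(root):
    """Count how often each routine name appears (assumes no cycles)."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.name] += 1
        stack.extend(node.children)
    return counts


# Call tree: two separate visits to open_table, each in its own context.
call_tree = Node("main", [
    Node("load_config", [Node("open_table")]),
    Node("load_data", [Node("open_table")]),
])

# Who-calls-whom graph: a single open_table node shared by both callers.
shared = Node("open_table")
call_graph = Node("main", [
    Node("load_config", [shared]),
    Node("load_data", [shared]),
])

print(is_pure_tree(call_tree), name_counts(call_tree)["open_table"])
print(is_pure_tree(call_graph))
```

The name `open_table` is duplicated in the first structure, yet it is still a tree; only the second, with its cross-branch link, is not.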

Get it? Keep at it until you do.

>>>> I DO event-driven programming, and my programs definitely have
>>>> high level routines and low level routines (and many medium level
>>>> routines).
>>>
>>> Show me the hierarchy here in out-line form then.
>>
>> Let's consider the most recent project I've worked on:
>>
>> Main
>> Initialize_Common_Globals()
>> Initialize_Program()
>> Load_Program()
>> LoadProperties()
>> Get_Special_Target_List()
>> LoadList(SpecialTargetList)
>> OpenTable("[SpecialTargets]")
>> <load loop>
>> Close
>> Load(ApplicationForm)
>>
>> Et cetera. Get the picture?
>
> That looks like an implementation of the event-handling engine,
> not actual events themselves.

What part of "my programs definitely have high level routines and
low level routines" did you fail to apprehend?

>>> Lack of experience? I am a middle-aged developer. Started out on VAX's
>>> and PRIME minicomputers.
>>
>> Time in the saddle doesn't necessarily translate to knowledge. You just
>> don't talk like someone who really understands data structures or OOD or
>> how polymorphism is used.
>
> I similarly feel you don't have experience with relational
> and databases,...

Despite that I've worked heavily with them for 20 years? (-:

> ...such as your complaint about auto-generated folder keys above.

My complaint stands. It's a dumb idea.

> (Some relational fans don't agree with auto-gen keys, but know that
> named primary keys are also a possibility without giving it second
> thought. I had to remind you.)

I'm quite familiar with all this. What you don't seem to appreciate is
that numerical identifiers are, in this case--to be blunt--stupid. No
human will like them or use them. If you replace them with string keys
(which, BTW, is my preference in DB *design* where possible--I only use
auto-gen'd keys where absolutely necessary), then you're no different
than using a file system path.

And incidentally, I will give you this: a virtual file system ON TOP OF
a regular one--a system that attaches searchable attributes and other
meta-data to files--IS a cool thing.

--
|_ CJSonnack <Chris(a)Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|_______________________|

From: topmind on 30 Jul 2005 02:16

Chris Sonnack wrote:
> topmind writes:
>
> >>> It not necessarily about OOD. [...] it was talking about trees in
> >>> general, not necessarily related to polymorphism or OO. The context
> >>> was about the apparent popularity of trees among computer users
> >>> IIRC, not just developers.
> >>
> >> In fact--as your own task break down showed--it's a natural form for
> >> any intelligent person when breaking down something comprised of many
> >> parts. Or more properly, of parts and sub-parts...as all tasks are.
> >
> > Like I keep saying, small trees are fine in some cases. When routines
> > get bigger, one has to form sub-routines to avoid duplicating
> > code (ie, duplicate nodes) in most cases. I rarely encounter routines
> > that grow more than about 200 lines before candidates for
> > duplication-factoring start to appear.
>
> You're confusing several issues here. By your own quote at the top,
> we're talking about the popularity of trees (which I'd say goes beyond
> computer *users* to all intelligent minds).
>
> Trees--as an analytic tool--are *extremely* valuable, and for good
> reason. You demonstrated this yourself when you described a task as
> an outline. And, contrary to your statements about size, they become
> even more valuable when the task is large.

But they are no longer (pure) trees when the task gets large. Tree-ish,
or semi-tree if you will, but not pure trees.

Further, there may be other non-tree ways to specify algorithms, it is
just that there is not a lot of research on that. Certain Expert
Systems are an example, but people generally found them difficult to
program. (Perhaps better query tools would have helped.) The
hierarchical approach to algorithm design is so far the quickest to
learn it appears.

>
> I'm moving a later section up here, because it applies:
>
> >>>> The simple fact is, you broke a common task into levels. Just about
> >>>> any set of tasks naturally breaks down that way. (In fact, I've
> >>>> spent this week working with a project leader for a coming project
> >>>> doing just that. The project is very large, and without a
> >>>> hierarchical breakdown, it'd be beyond the capacity of any human
> >>>> to deal with.)
> >>>
> >>> I bet you start to have cycles (or node duplication) past a certain
> >>> point.
> >>
> >> Nope. There are *no* duplicate tasks. Each is distinct.
> >
> > I don't believe you. I would have to see the code. I think
> > there is probably a misunderstanding somewhere here.
>
> Yes. Yours. Nowhere above am I talking about code. I'm talking about
> breaking down a large project into separate tasks. Exactly what you
> did a few weeks ago with a very small task. In this case, we're talking
> about a three-month project involving about a dozen people (more if you
> count involved customers and testers).
>
> I would submit that an outline (aka tree) is the ONLY viable way to
> do such a breakdown. It's not a "convenient lie", it's a reality.
> Large tasks are made of small tasks. It's natural.

Are you talking about software design, or Gantt-charts and the like?
Even if it works well from an organizational standpoint, that does not
necessarily mean it is applicable directly in software.

Besides, for non-trivial projects there is often repetition of some
tasks such that it is not a pure tree. For example, multiple tasks on
different branches might have a "make photocopies" sub-task(s). We are
right back to the duplication-of-nodes-or-cross-branch-links problem
yet again. You cannot seem to escape DNCBL.

>
>
> WARNING: NOW WE'RE TALKING ABOUT DATA....

Semi-irrelevant. They have been shown to be interchangeable concepts (even
though it may not always be practical).

>
> >> If all we're talking about is the taxonomy, then one can use as
> >> many trees as needed to express different views of the information
> >> depending on what one is looking for.
> >
> > Yes, but IMO sets are superior to multiple trees in most cases.
>
> For my money, if I had a situation with a pile of data and I wanted
> different (hierarchical) views, I'd very likely put the data in a
> database, table or whatever (aka set) and *VIEW* it with a tree tool.
>
> If the situation were such that the data was naturally hierarchical
> and had a predominant tree structure to it, I might very well store
> it in a tree--at least while in memory. As I've said before, if I
> need to persist data (that is, store it on disk), then typically I
> use some sort of table, flat file or database. It all depends.

Tables and "persistence" are orthogonal concepts. We use relational
databases because they help us organize and sift data, not necessarily
because they "store better".

>
>
> WARNING: NOW WE'RE TALKING ABOUT CODING...
>
> >>> Just like a record from a "Drink" entity.
> >>
> >> Nope. A record is dumb. It requires code to process it.
> >
> > [...snip non-responsive answer...]
> >
> >>>> Imagine I have a collection of Drink objects and I want to list
> >>>> them in order by the amount of caffeine. simple, since they all
> >>>> have a common interface that lets me ask for the caffeine content.
> >>>
> >>> Just like a record from a "Drink" entity.
> >>
> >> Nope. Records are dumb. They have no interface.
> >
> > They have no "interface"? Data queries.
>
> Records can't be queried. DATABASES can. Records are just dumb
> "things"....a collection of fields.

So are classes until one "runs" the software. I don't see what you are
getting at.
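
A minimal Python sketch of the distinction under dispute. The class names, the `caffeine_mg` method, and the caffeine figures are invented for illustration; nothing in the thread specifies them. The object version carries its own way of answering "how much caffeine?", while the record version is plain data that outside code has to interpret.

```python
from dataclasses import dataclass


class Drink:
    """Hypothetical common interface: every drink reports its caffeine."""

    def caffeine_mg(self) -> float:
        raise NotImplementedError


@dataclass
class Coffee(Drink):
    shots: int

    def caffeine_mg(self) -> float:
        return 60.0 * self.shots  # illustrative figure only


@dataclass
class HerbalTea(Drink):
    def caffeine_mg(self) -> float:
        return 0.0


@dataclass
class Cola(Drink):
    ounces: float

    def caffeine_mg(self) -> float:
        return 3.0 * self.ounces  # illustrative figure only


def by_caffeine(drinks):
    """Sort any mix of drinks without knowing their concrete types."""
    return sorted(drinks, key=lambda d: d.caffeine_mg())


drinks_sorted = by_caffeine([Coffee(shots=2), HerbalTea(), Cola(ounces=12)])

# Record-style equivalent: the sort key is a stored field, read by outside code.
records = [
    {"name": "coffee", "caffeine_mg": 120.0},
    {"name": "tea", "caffeine_mg": 0.0},
    {"name": "cola", "caffeine_mg": 36.0},
]
records_sorted = sorted(records, key=lambda r: r["caffeine_mg"])
print([type(d).__name__ for d in drinks_sorted])
print([r["name"] for r in records_sorted])
```

Both produce the same ordering; the disagreement is over where the knowledge of how to compute or fetch the value should live.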

>
> >> Database design is a pretty well-explored domain. There are some very
> >> fundamental limitations here. You either need to look at each record
> >> to determine if it matches your query, OR you need to--da da--look at
> >> a subset....a child of the full dataset.
> >
> > Perhaps, but there are potentially other indexing schemes that
> > are non-tree. But it is a moot point anyhow. Just because the
> > underlying machine uses 1's and 0's does not nec. mean that
> > programmers should also.
>
> All programmers ultimately do use 1s and 0s. We just have tools that
> usually shield us from the messy details. More to the point, what the
> DB indexes demonstrate is that, yet again, hierarchical organization
> can be a huge win.

It does not demonstrate anything more than 1's and 0's do. You have not
effectively argued why trees are immune from the binary digit analogy
here, which shows that scaling from under-the-hood outward is not
necessarily applicable.

>
> >> When it comes to dealing with large datasets, partitioning is a win.
> >> Database designers know this, hence the indexing technology.
> >
> > Philosophers have long known there is rarely One Right Taxonomy/
> > Partitioning for a given thing.
>
> So you use multiple partition schemes where needed. No biggie.
> The fact remains, without partitioning *somehow*, you have a mess.

RDBMS partition information into tables, records, and columns.

>
>
> >>> One could use SQL, another relational query language, and/or
> >>> Query-By-Example to find stuff.
> >>
> >> Not seeing anything "relational" about this. You do understand the
> >> term, right? (SQL, for example, works just fine in non-relational
> >> databases.)
> >
> > Hmmmm. I don't think I agree with this, but will have to think about
> > that one.
>
> I thought you were a "database guy"? Think about a database with only
> one table. Think you can write an SQL query to return a subset of
> the records in that table? Of COURSE you can.

If it has one table, it is still relational.

>
>
> WARNING: TALKING ABOUT VIRTUAL FILE SYSTEMS NOW...
>
> >> Numerical identifiers for folders? Very dumb idea!
> >> The example shows:
> >>
> >> } westsrvr:4251/slides.shw
> >>
> >> "4251" has no connection to anything real. How can anyone think
> >> users can remember "4251"? I have thousands of files in hundreds
> >> of directories. No human could remember distinct numbers for all
> >> those folders. I can't believe anyone sane could think this was
> >> a viable option.
> >
> > There may be other ways to label folders, and each may also have
> > a description attribute, and perhaps even make it a primary key.
>
> Of course, and your page mentions that. I was commenting there on
> the stupidity of subjecting human users to numerical identifiers.

Well, I shall refer you to a long debate on this topic:

http://www.c2.com/cgi/wiki?AutoKeysVersusDomainKeys
http://www.c2.com/cgi/wiki?AutoKeysVersusDomainKeysDiscussion

It is mostly moot to this debate anyhow, as described later.

By the way, what would you recommend the US government use
instead of Social Security Numbers to track people?

>
> Consider a large--very large--file system. How many attributes does
> it take to uniquely identify a file, to enable you to locate it?

Uniquely identifying AND locating it are orthogonal issues.

>
> Consider that these attributes are essentially AND'd together. If
> you were writing a query, your WHERE clause would have many sub-clauses
> connected with ANDs.
>
> Which, logically speaking, is EXACTLY what a file path is.

In *some* ways you are correct. However, a tree path is
fragile. You cannot easily rethread it if you don't
like the "attribute" segments of the paths. With
RDB's one can add or subtract attributes without
giant ramifications.

Hierarchical databases have been well-tried in history.
They died off for good reason (file systems being the last
remnants, probably because they need less training
than relational systems and target end-users instead
of just techies).

>
> The difference is that a file path is a lot easier to browse.

ONLY if your browse pattern happens to be tree-shaped. You should note
that Google beat Yahoo partly by abandoning the tree model for info
hunting. Trees lost that round. Yahoo kept its trees, but almost nobody
seems to really use them.

> I looked at your--whatchamacallit--finder window. Considering that
> in a day's work I reference hundreds of files, having to use some
> GUI search tool each time I wanted to locate a file.....NFW.

You can create your own short-cut references, I would note. I do this
in Windows Explorer (with trees) because I hate path clicking.

Further, I won't dictate to you how YOU best work, but I am tired of
hierarchical file systems. If trees float your boat, so be it. I find
them artificial and inflexible. I don't like being married to the
original taxonomy forever and ever because it is too hard/fragile to
change.

>
> > For example, when companies move articles around in
> > web URL paths, often it busts existing browser bookmarks.
> > The same thing can happen with "meaningful" names.
> > A "dumb" key is safer from such because it carries no
> > external meaning.
>
> What makes you think a dumb key prevents files from being moved?

Because there is *no reason* to change them if they are
indeed "dumb". (BTW, what do you mean by being "moved"?
That is a tree concept, not a relational one. See below.)

> The exact same problem exists. The only problem is that with a
> "dumb" key, there's no logical handle. At least "Bobs Sales Project"
> is a sensible thing to search for.

Arrrrg. I did NOT prevent those kinds of searches/attributes by
recommending unique "dumb" keys. You have presented a
false dichotomy here. (Generally "Bob" and "Sales Project"
should be treated as orthogonal attributes. That way
we can get all of Bob's items or all Sales Project
items regardless of who they are assigned to at the
moment if we want.)

>
> >> Further, all it's done is create an "abbreviation" for a location
> >> (but a very difficult abbreviation to remember). The same issue
> >> of changes applies. Change the location, and everyone's links
> >> are wrong.
> >
> > Whaaaaaat?
>
> You seem to be assuming that files in your system never move. I
> have no reason--quite the contrary, in fact--to think that's true.
> Files move for lots of reasons.

In relational theory you generally don't "move" the node, but simply
move/change references or attribs. You just don't seem to "get" the
concept. You are apparently thinking in terms of
physical 3D boxes. Transcend that. Cyberspace lets us.
"Locations" are increasingly obsolete and meaningless
in cyberspace. In cyberspace we can "live" in a
26-dimensional (factor) universe if we want. Trees
tend to be a 2D concept.

>
> What you don't seem to understand is that a regular hierarchical file
> system is logically equivalent to your system. You have attributes,
> a regular FS has path parts. If you change the attributes/path parts,
> the file "moves" and people's static references to it are no longer
> valid.

An attribute shouldn't disappear unless it is no longer valid.
If your reference to it is based on that disappearing
factor, then it is *proper* for the DB to no longer
return it. It is doing its job right by giving you
exactly what you ask for. If you want
a "steady" reference, use the "dumb key". That is
what it is for. You can have your cake and eat it too;
you just have to know what you want. (Yes, I admit
it requires more training/experience than trees to
use well.)
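
A small Python sketch of the "steady reference" idea, under stated assumptions: the `store` dictionary, the attribute names, and the `find` helper are hypothetical, and only the key `4251` and `slides.shw` come from the example being argued over. Attribute searches reflect the current attributes; the opaque key keeps resolving when an attribute changes.

```python
# Opaque ("dumb") key -> attributes. Contents are made up for illustration.
store = {
    4251: {"name": "slides.shw", "owner": "Bob", "project": "Sales"},
    4252: {"name": "budget.xls", "owner": "Bob", "project": "Audit"},
    4253: {"name": "leads.txt", "owner": "Alice", "project": "Sales"},
}


def find(**wanted):
    """Return keys of items whose attributes match every given value."""
    return sorted(
        key for key, attrs in store.items()
        if all(attrs.get(a) == v for a, v in wanted.items())
    )


before = find(owner="Bob", project="Sales")
all_sales = find(project="Sales")      # owner and project are independent

store[4251]["owner"] = "Alice"         # reassign the item; nothing "moves"

after = find(owner="Bob", project="Sales")
still_there = store[4251]["name"]      # the key itself is unaffected
print(before, all_sales, after, still_there)
```

A saved search on `owner="Bob"` stops returning the item after the reassignment, which is the breakage Chris describes; a saved key does not, which is the stability topmind describes. Whether people can live with keys like `4251` is the separate usability question.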

>
> The only real difference is that your virtual system requires a lot
> more overhead. And, ironically, a real implementation of your imaginary
> system would probably use a hierarchical FS on the back end anyway.
> If you throw all the files in one BIG directory, performance tends to
> drop a lot. (It's exactly the same as how indexes are hierarchical.)
>

I already addressed performance in the "higher abstraction requires..."
comment.

>
> >> Later on the page is the idea of associating properties to a file and
> >> later searching for it by properties. Which is fine until you forget
> >> what properties you used, make up too many to handle, or change them.
> >> How do you browse through the data to find the lost file??
> >
> > How is this worse than forgetting a giant path???
>
> It's approximately the same, I'd say. The difference is that your system
> uses a lot more overhead and--depending on how it's implemented--may be
> a lot harder to browse quickly.

Only if we are forever stuck with 90's hardware.

>
> >> It also requires a big database in which to store all this.
> >
> > So? Higher abstractions require more horse-power. I don't
> > think 30,000 desktop files requires that much horse-power.
>
> I thought we were talking WANs and big companies. I likely have more
> than 30K files--way more--on just my own machine. My company must have
> millions.

Oracle and DB2 databases can store hundreds of billions of records
IIRC.
(However, they don't provide dynamic attributes......yet)

>
>
> WARNING: TALKING ABOUT WORDS NOW...
>
> >>>> It's simple and undeniable: sets have less structure than trees. EOS.
> >>>
> >>> Prove it!
> >>
> >> I don't have to, you just admitted it.
> >
> > It depends entirely on how one defines "structure".
>
> Let's move this later bit to here:
>
> >>>> Higher: above, superior.
> >>>> Order: degree.
> >>>> Structure: a complex construction or entity.
> >>>
> >>> Define "superior". Define "complex".
> >>
> >> My guess is you know what "superior" and "complex" mean.
> >
> > Those terms tend to be relative and vague.
>
> They may be somewhat relative--many things are--but I can't say
> I find them in any way vague.

I don't know of any generally agreed-upon algorithm
or formula that can be used to measure "complexity".

>
> > It is too imprecise for our purposes.
>
> So propose an alternative.

I am sorry, but I have none. It is an issue that is
at or beyond the cutting edge of philosophy and math.
"Complexity" and "structure" are not very usable
for these kinds of discussions.

>
> > For example, Bill Gates may be "superior" from a money tally
> > standpoint, but if he ends up in hell when he dies (as a
> > hypothetical example), then he is not superior from a
> > religious standpoint.
>
> Why do you think an attribute applies to all contexts?

Ahah! So you admit there is no One Right Level/View/Taxonomy.
So, why should file systems be any different?

> From a
> financial point of view, Gates is *way* superior. Period. There
> is nothing vague or relative about that.
>
> So, getting back to the point, structure is well-defined and we HAVE
> a context.

Not.

> Let's talk set theory. You're big on sets, hopefully
> you have some grasp of the underlying theory of sets. Let's find
> out.
>
> Here's a set: {{red} {blue} {green} {yellow} {thursday} {28} {}}
>
> What does it mean? What is its structure?
>
> Here's another: {{1} {3} {2} {1.5} {34} {99} {0}}
>
> What does it mean? Is it the same as the other set? If so, why?
> If not, why not?
>
> Here's another: {{} {} {} {} {} {} {}}
> And another: {{{}}}
>
> Are these the same or different from the first two? Why or why not?

I didn't propose a definition of "structure", so I don't
see the point in this exercise. Let's explore some
more realistic scenarios instead of Foo Bar examples,
how about?

>
> What--if any--structure exists in any of the above sets?
> What--if anything--can you say about the above sets?
>
> Now consider this data structure:
> {red}
> {green}
> {yellow}
> {28}
> {blue}
> {thursday}
> {}
>
> What--if any--structure exists in the above data structure?
> What--if anything--can you say about the above data structure?
> (Does the fact that it's called a data STRUCTURE suggest anything?)
>
> We await your answers--no copying your neighbor's, now.

An interesting structure (cough) that can be used to
represent sets is a grid with instances on one
axis and sets on another. The intersection (cell)
represents either the existence (Boolean) or the
weighting (if weighted sets are used) per instance.
It allows one to visually inspect applied sets. If one can
dynamically sort the grid, pattern-hunting is even easier.

This is one (of multiple) techniques that can
provide more visual-ness to sets if that is
what you are truly seeking here.
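
The grid described above can be sketched in a few lines of Python. This
is only an illustration: the file names and set names are made up, and
the cells are plain Booleans rather than weights.

```python
# Hypothetical data: the item and set names are invented for illustration.
memberships = {
    "report.doc": {"work", "2005"},
    "budget.xls": {"work", "finance", "2005"},
    "beach.jpg": {"personal", "2005"},
}

def membership_grid(memberships):
    """Return (columns, rows): one row per instance, one Boolean cell per set."""
    columns = sorted(set().union(*memberships.values()))
    rows = {
        item: [name in sets for name in columns]
        for item, sets in memberships.items()
    }
    return columns, rows

def print_grid(columns, rows, sort_by=None):
    """Print the grid; sort_by moves the members of one set to the top."""
    items = list(rows)
    if sort_by is not None:
        idx = columns.index(sort_by)
        items.sort(key=lambda item: not rows[item][idx])
    width = max(len(item) for item in items)
    print(" " * width, *columns)
    for item in items:
        cells = [("X" if hit else ".").center(len(col))
                 for col, hit in zip(columns, rows[item])]
        print(item.ljust(width), *cells)

columns, rows = membership_grid(memberships)
print_grid(columns, rows, sort_by="work")
```

Sorting on a column is the "dynamic sort" part: members of the chosen
set bunch together at the top, which makes overlaps easier to spot.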

>
>
> WARNING: BACK TO HIERARCHIES AND TREES IN GENERAL...
>
> >>> Having 2 bosses in a really small company. Who is the "root"?
> >>
> >> The CEO.
> >
> > I worked at a company that had 3 owners and I had 3 bosses.
>
> Can the bosses give you orders? Can they give the owners orders?

The owners *were* my bosses in that case.

>
> What happened when each of the three bosses asked you to be at a
> different meeting scheduled for the same time?

I would tell them there was a conflict and suggest they
consult one another to find an agreeable solution.

>
>
> >> To the point: the situation is *entirely* hierarchical.
> >
> > No, that is not a pure tree.
>
> You keep saying that like it means something. It doesn't.

I don't know of any formal metric for tree-ness of
imperfect trees at this time, but
if you picture taking a tree of say 4 levels
deep, and drawing a line from a node in one branch to
a node in another, you have a broken tree. There are
algorithms to determine tree-ness (Boolean) of
graphs; I suggest you study those
if you want a definition/example
of trees going bad. If you take any tree and keep
drawing lines from any given random node to any other
given node, in all probability you will soon
make the tree fail the treeness test.
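
A minimal sketch of one such Boolean test, assuming an undirected graph
given as a node list plus an edge list: the graph is a tree exactly when
it is connected and has one fewer edge than it has nodes. The node names
below are hypothetical.

```python
from collections import defaultdict

def is_tree(nodes, edges):
    """Boolean tree test: connected, with exactly len(nodes) - 1 edges."""
    nodes = list(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    # Walk outward from the first node and check that every node is reached.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        for neighbor in adjacent[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)

# A hypothetical tree four levels deep.
nodes = ["root", "a", "b", "a1", "a2", "b1", "a1x"]
tree_edges = [("root", "a"), ("root", "b"), ("a", "a1"),
              ("a", "a2"), ("b", "b1"), ("a1", "a1x")]

print(is_tree(nodes, tree_edges))                   # True
# One line drawn from a node in one branch to a node in another:
print(is_tree(nodes, tree_edges + [("a2", "b1")]))  # False
```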

>
>
> >>> A GUI page A where you launch page B, but click a link which opens
> >>> another instance of page A. This is common during web-browsing, I
> >>> would note.
> >>
> >> ?? What does browser history have to do with anything? That doesn't
> >> in any way speak to recursion in the software.
> >
> > If A calls B, B calls C, and C calls A, it *is* recursion.
>
> The browsing *history* follows a recursive path. The browser software
> does not. Your claim that GUI software is recursive is incorrect.
>
> GUI **usage** may recurse, but so what?

Recursion busts trees.

>
>
> >>>>> But a pure tree has no duplicate nodes.
> >>>>
> >>>> That's BS. Show me one authority that agrees.
> >>>
> >>> Connect the subroutine calls on the paper. Don't take my word for it,
> >>> get your pen out.
> >>
> >> So,... when you can't respond to a point, you just throw in something
> >> totally irrelevant? Okay. We'll just assume capitulation.
> >
> > No, I just cannot easily describe it without a visual.
>
> Please read carefully. Please read the first quoted line above. Then
> read the response--second quoted line above. Your response--third line
> above--is non-responsive, and the point is lost. Let's make it more
> clear:
>
> >you> But a pure tree has no duplicate nodes.
> >>
> >me> That's BS. Show me one authority that agrees.
>
> Your turn again.

Sigh, we are back to this again. ANY graph
*can* be represented as a tree. However, usually
one has to duplicate nodes to pull it off.
One can represent such "tree breaks" as EITHER
a cross-branch link (line), OR duplicate nodes
on a tree.

You know, it is tough to describe this very well with
words. I can't do it much better than I have tried here.
I may have to get back to you on this
discussion after I build an example image and
have a URL to give you. Words are just plain
failing me.
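
Until there is an image, here is a rough sketch in code of the two
representations. The call graph is hypothetical, and the unfolding only
works for graphs without cycles; a cycle would make it recurse forever.

```python
# Hypothetical call graph: "report" and "export" both call "format_date",
# so the graph has a shared node (a cross-branch link if drawn as one picture).
calls = {
    "main": ["report", "export"],
    "report": ["format_date"],
    "export": ["format_date"],
    "format_date": [],
}

def unfold(graph, node):
    """Expand an acyclic graph into a nested (name, children) tree.

    Every shared node is copied once per path that reaches it.
    """
    return (node, [unfold(graph, child) for child in graph[node]])

def count_nodes(tree):
    """Count the nodes of a nested (name, children) tree."""
    _name, children = tree
    return 1 + sum(count_nodes(child) for child in children)

tree = unfold(calls, "main")
print(len(calls))         # 4 distinct routines in the graph
print(count_nodes(tree))  # 5 nodes in the tree: format_date appears twice
```

The graph keeps one "format_date" node with two incoming links; the tree
keeps no cross links but pays for it with a second copy of the node.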

>
>
> >>>> Totally False. The call tree represents reality.
> >>>
> >>> And lots of duplications of parts of reality.
> >>
> >> If a routine is entered multiple times, that **IS** the reality.
> >
> > And it is duplication.
>
> What part of "that IS the reality" don't you get? If a routine is
> entered multiple times, the call graph **MUST** have duplicates.
> (You DO know what a call graph is, don't you?--a(n imaginary) tree
> listing the thread of execution of a running program.)
>
> Once more: the call graph is a "pure" tree--no node links back, no
> branches connect--but does indeed have "duplicate" nodes. That is,
> the name of the node is duplicated. In the reality of the running
> program, each visit to a given routine is *contextually* different.

No, it has some parts that are the same (same routine called)
and some parts that are different (time stamp and params).
If you break the node into full granularity you
have the dup/cross-branch issue again.

>
> Consider a parse tree. Most programs use the language statements many
> times, so there are duplicate, say, "IF/ELSE" nodes. Probably lots
> and lots of them. But the parse tree itself is still a "pure" tree,
> because each "IF/ELSE" is contextually different.
>
> Get it? Keep at it until you do.
>

I think the problem is that the granularity of your
nodes is too large.

Generally I ignore the IF/ELSE issue here, but would
like to point out that in some languages control
constructs such as IF and WHILE are simply function
calls with some fancy scope management features
used rather than being built into the language.
Thus, we don't really need to make a distinction
between IF and function call duplication.
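
As a loose illustration only (Python is not one of those languages, and
the name if_ is made up here), a conditional can be written as an
ordinary function that receives its two branches as callables:

```python
def if_(condition, then, otherwise=lambda: None):
    """A conditional as a plain function: pick a branch, then call it."""
    branches = {True: then, False: otherwise}
    return branches[bool(condition)]()

# Only the chosen branch runs, because the branches are passed uncalled.
result = if_(3 > 2, lambda: "bigger", lambda: "smaller")
print(result)  # bigger
```

In a call tree, each use of if_ would show up as just another repeated
function node.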

>
> >>>> I DO event-driven programming, and my programs definitely have
> >>>> high level routines and low level routines (and many medium level
> >>>> routines).
> >>>
> >>> Show me the hierarchy here in out-line form then.
> >>
> >> Let's consider the most recent project I've worked on:
> >>
> >> Main
> >> Initialize_Common_Globals()
> >> Initialize_Program()
> >> Load_Program()
> >> LoadProperties()
> >> Get_Special_Target_List()
> >> LoadList(SpecialTargetList)
> >> OpenTable("[SpecialTargets]")
> >> <load loop>
> >> Close
> >> Load(ApplicationForm)
> >>
> >> Et cetera. Get the picture?
> >
> > That looks like an implementation of the event-handling engine,
> > not actual events themselves.
>
> What part of "my programs definitely have high level routines and
> low level routines" did you fail to apprehend?

Remember the context is event-driven programs, not all programs.

Your argument here seems similar to the DB index issue:
the event engine may use trees underneath. I don't
dispute that, but the app developer doesn't give a flying
flip if it is built with a hidden tree or gerbils underneath.

>
>
> >>> Lack of experience? I am a middle-aged developer. Started out on VAX's
> >>> and PRIME minicomputers.
> >>
> >> Time in the saddle doesn't necessarily translate to knowledge. You just
> >> don't talk like someone who really understands data structures or OOD or
> >> how polymorphism is used.
> >
> > I similarly feel you don't have experience with relational
> > and databases,...
>
> Despite that I've worked heavily with them for 20 years? (-:

Sometimes duration != insight

>
> > ...such as your complaint about auto-generated folder keys above.
>
> My complaint stands. It's a dumb idea.

Well, it is moot because one can still do relational
without auto-gen keys (even if it is a kiss of death
in the long run). "Dumb keys" are a useful tool,
but not required for relational. (Early RDBMS
did not even support them.)

>
>
> > (Some relational fans don't agree with auto-gen keys, but know that
> > named primary keys are also a possibility without giving it second
> > thought. I had to remind you.)
>
> I'm quite familiar with all this. What you don't seem to appreciate is
> that numerical identifiers are, in this case--to be blunt--stupid. No
> human will like them or use them. If you replace them with string keys
> (which, BTW, is my preference in DB *design* where possible--I only use
> auto-gen'd keys where absolutely necessary), then you're no different
> than using a file system path.

Attributes are more orthogonal than tree paths. Using the "cup"
example floating around on comp.object, one may start out with
an attribute/set of "coffee". Later one may realize they need
a broader category for the cup so they introduce a "hot drinks"
attribute/set for it. I don't have to touch any other attribute
to add that. However, such a change may require "restringing"
an entire tree taxonomy. Thus:

**********************************************************
* *
* Trees are overly sensitive to early design decisions. *
* *
**********************************************************

It is tough to add new factors and remove old factors from
trees.
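
A small sketch of the "cup" example, with invented item names and a
made-up path layout. It only illustrates the claim as stated; how much
restringing a real taxonomy needs depends on how it is organized.

```python
# Hypothetical catalog: each item carries a set of independent attributes.
attributes = {
    "cup": {"coffee"},
    "mug": {"coffee"},
    "glass": {"juice"},
}

def find(attributes, *wanted):
    """Return the items that carry every requested attribute, sorted by name."""
    return sorted(item for item, attrs in attributes.items()
                  if set(wanted) <= attrs)

# Adding the broader category touches only the item that gains it.
attributes["cup"].add("hot drinks")
print(find(attributes, "coffee"))      # ['cup', 'mug']
print(find(attributes, "hot drinks"))  # ['cup']

# The same idea in a path-based taxonomy: inserting a new level means
# rewriting the path of every item filed under the old branch.
paths = {"cup": "drinks/coffee/cup", "mug": "drinks/coffee/mug"}
restrung = {item: path.replace("drinks/coffee", "drinks/hot drinks/coffee")
            for item, path in paths.items()}
print(restrung["mug"])  # drinks/hot drinks/coffee/mug
```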

>
> And incidentally, I will give you this: a virtual file system ON TOP OF
> a regular one--a system that attaches searchable attributes and other
> meta-data to files--IS a cool thing.

Well, maybe that would be a good compromise so that we don't have
to fight anymore.

>
>
> --
> |_ CJSonnack <Chris(a)Sonnack.com> _____________| How's my programming? |

-T-

From: CBFalconer on 30 Jul 2005 05:23

topmind wrote:
>
.... 749 snipped lines ...

Which is way beyond my attention span. :-)

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

From: topmind on 30 Jul 2005 18:19

CBFalconer wrote:
> topmind wrote:
> >
> ... 749 snipped lines ...
>
> Which is way beyond my attention span. :-)
>

Simple: just treat it as if it were three 250-line replies spread over a
week and deal with them one at a time.

-T-
