From: George Neuner on
On Thu, 04 Mar 2010 18:51:21 +0200, Juan Pedro Bolivar Puente
<magnicida(a)gmail.com> wrote:

>On 04/03/10 16:21, ccc31807 wrote:
>> On Mar 3, 4:55 pm, toby <t...(a)telegraphics.com.au> wrote:
>>>> where you have to store data and
>>>
>>> "relational data"
>>
>> Data is neither relational nor unrelational. Data is data.
>> Relationships are an artifact, something we impose on the data.
>> Relations are for human convenience, not something inherent in the
>> data itself.
>>
>
>No, relations are data. "Data is data" says nothing. Data is
>information. Actually, all data are relations: relating /values/ to
>/properties/ of /entities/. Relations as understood by the "relational
>model" are nothing more than the assumption that properties and entities
>are first-class values of the data system and that they can also be related.

Well ... sort of. Information is not data but rather the
understanding of something represented by the data. The term
"information overload" is counter-intuitive ... it really means an
excess of data for which there is little understanding.

Similarly, at the level to which you are referring, a relation is not
data but simply a theoretical construct. At this level testable
properties or instances of the relation are data, but the relation
itself is not. The relation may be data at a higher level.

George
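One way to make George's distinction concrete, offered as an illustrative sketch rather than anything from the thread: a relation such as "x is less than y" can be given as a rule, which stores nothing, while the tuples that satisfy it over some finite domain are ordinary, testable data. The predicate name less_than and the domain 0..3 are arbitrary choices made for the example.

```python
from itertools import product

# The relation "x is less than y" given as a rule (its intension): nothing is stored.
def less_than(x: int, y: int) -> bool:
    return x < y

# The instances of that relation over a small finite domain (its extension):
# these tuples are data that could be written to disk like anything else.
domain = range(4)
instances = {(x, y) for x, y in product(domain, repeat=2) if less_than(x, y)}

print(sorted(instances))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Each stored tuple is a testable instance of the rule.
assert all(less_than(x, y) for x, y in instances)
```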
From: ccc31807 on
On Mar 4, 11:51 am, Juan Pedro Bolivar Puente <magnic...(a)gmail.com>
wrote:
> No, relations are data.

This depends on your definition of 'data.' I would say that
relationships are information gleaned from the data.

> "Data is data" says nothing. Data is
> information.

To me, data and information are not the same thing, and in particular,
data is NOT information. To me, information consists of the sifting,
sorting, filtering, and rearrangement of data that can be useful in
completing some task. As an illustration, consider some very large
collection of ones and zeros -- the information it contains depends on
whether it's viewed as a JPEG, an EXE, XML, WAV, or some other format by
whatever program processes it. Whichever way it's processed, the
'data' (the ones and zeros) stay the same, and do not constitute
'information' in their raw state.

> Actually, all data are relations: relating /values/ to
> /properties/ of /entities/. Relations as understood by the "relational
> model" are nothing more than the assumption that properties and entities
> are first-class values of the data system and that they can also be related.

Well, this sort of illustrates my point. The 'values' of 'properties'
relating to specific 'entities' depend on how one processes the data,
which can be processed in various ways. For example, 01000001 can be
viewed either as the decimal number 65 or as the alphabetic character
'A', but the decision as to how to view this value isn't inherent in
the data itself; it is only an artifact of our use of the data to turn
it into information.

CC.
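CC's point about interpretation can be checked directly. A minimal sketch in Python (the variable names are arbitrary): the bit pattern 01000001 reads as the integer 65 or, through the ASCII table, as the character 'A', and the same four bytes b"ABCD" yield different integers depending only on the byte order the reader assumes.

```python
import struct

bits = "01000001"
value = int(bits, 2)                    # read the bit pattern as an unsigned integer
print(value)                            # 65
print(chr(value))                       # A  (the character with code point 65)
print(bytes([value]).decode("ascii"))   # A  (the same byte, decoded as ASCII text)

raw = b"ABCD"                           # four bytes: 0x41 0x42 0x43 0x44
as_text = raw.decode("ascii")
as_big_endian = struct.unpack(">I", raw)[0]     # unsigned 32-bit, big-endian
as_little_endian = struct.unpack("<I", raw)[0]  # unsigned 32-bit, little-endian
print(as_text, as_big_endian, as_little_endian)  # ABCD 1094861636 1145258561
```

Nothing in the bytes themselves says which reading is "right"; the choice lives entirely in the code doing the reading.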
From: Juan Pedro Bolivar Puente on
On 04/03/10 19:52, ccc31807 wrote:
> On Mar 4, 11:51 am, Juan Pedro Bolivar Puente <magnic...(a)gmail.com>
> wrote:
>> No, relations are data.
>
> This depends on your definition of 'data.' I would say that
> relationships are information gleaned from the data.
>
>> "Data is data" says nothing. Data is
>> information.
>
> To me, data and information are not the same thing, and in particular,
> data is NOT information. To me, information consists of the sifting,
> sorting, filtering, and rearrangement of data that can be useful in
> completing some task. As an illustration, consider some very large
> collection of ones and zeros -- the information it contains depends on
> whether it's viewed as a JPEG, an EXE, XML, WAV, or some other format by
> whatever program processes it. Whichever way it's processed, the
> 'data' (the ones and zeros) stay the same, and do not constitute
> 'information' in their raw state.
>
>> Actually, all data are relations: relating /values/ to
>> /properties/ of /entities/. Relations as understood by the "relational
>> model" are nothing more than the assumption that properties and entities
>> are first-class values of the data system and that they can also be related.
>
> Well, this sort of illustrates my point. The 'values' of 'properties'
> relating to specific 'entities' depend on how one processes the data,
> which can be processed in various ways. For example, 01000001 can be
> viewed either as the decimal number 65 or as the alphabetic character
> 'A', but the decision as to how to view this value isn't inherent in
> the data itself; it is only an artifact of our use of the data to turn
> it into information.
>

Well, as you said, it depends on the definition of information; actually,
your definition of data fits the information-theoretic definition
of information as a sequence of symbols... But I understand that in other
contexts /information/ can also mean the next level of abstraction on top
of /data/, in the same way that /knowledge/ is the next level of
abstraction on top of information; let's take that as our common ground.

In any case, your definition and George's still support my view
that relations are data: they are stored in the computer as a sequence
of ones and zeroes and are indistinguishable from anything else in the
data space in that sense. Of course, they are key data for recovering
information and especially for adding new information consistently to
the data store... That SQL includes special syntax for manipulating
relations should not hide this fact; and in most DBMSs one can query the
relational information in the same way one would query non-relational
data anyway...

Anyway, I'm sorry for drifting the conversation off topic... Going back to the
main topic, I agree with the general view in this thread that relational
databases (information-bases? ;) and non-relational ones are there to
do different jobs. It is only by carefully examining a problem that
we can decide which one fits it better. Relational databases have
the clear advantage that their mathematically grounded basis makes their
fitness for most problems quite clear, while the preference for
non-relational systems is a more technical and empirical question of
trade-offs between consistency and scalability and so on.

JP
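JP's remark that relational information can be queried like any other data is easy to see in SQLite, whose schema is stored in the ordinary table sqlite_master and whose foreign keys are reported as plain rows by PRAGMA foreign_key_list. A minimal sketch using Python's standard sqlite3 module; the users/orders tables and the helper relationship_rows are hypothetical, made up for the illustration.

```python
import sqlite3

def relationship_rows(conn: sqlite3.Connection, table: str) -> list[tuple[str, str, str]]:
    """Return (child_column, parent_table, parent_column) for each foreign key on `table`.

    Hypothetical helper. The table name is interpolated into the PRAGMA,
    so it must come from trusted code, never from user input.
    """
    # Each row is (id, seq, table, from, to, on_update, on_delete, match).
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(r[3], r[2], r[4]) for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     user_id INTEGER REFERENCES users(id),
                     item TEXT);
""")

# The schema, relationships included, is read back with an ordinary SELECT.
schema = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(schema)                             # [('orders',), ('users',)]
print(relationship_rows(conn, "orders"))  # [('user_id', 'users', 'id')]
```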

From: Xah Lee on
many people mentioned scalability... though i think it would be fruitful to
talk about at what size NoSQL databases actually offer better
scalability than SQL databases.

For example, consider: if you are among the world's top 100 database
users in terms of database size, such as Google, then the
off-the-shelf tools may be limiting. But how many users
really have such a massive amount of data?

note that google's database needs today aren't just a search engine.
its db size for google search alone is probably larger than those of all
the other search engine companies combined. Plus, there's youtube (video
hosting), gmail, google code (source code hosting), google blog, orkut
(social networking), picasa (photo hosting), etc, each ranked within
the top 5 or so among its respective competitors in terms of number of
accounts... so, google's data size is probably number one among the
world's database users, probably double or triple that of the
second-largest user. At that point, it seems logical
that they need their own db, relational or not.

Xah
∑ http://xahlee.org/

☄

On Mar 4, 10:35 pm, John Nagle <na...(a)animats.com> wrote:
> Xah Lee wrote:
> > recently i wrote a blog article on The NoSQL Movement
> > at http://xahlee.org/comp/nosql.html
>
> > i'd like to post it somewhere public to solicit opinions, but in the
> > 20 min or so, i couldn't find a proper newsgroup, nor a private list
> > that my somewhat anti-NoSQL Movement article would fit.
>
>     Too much rant, not enough information.
>
>     There is an argument against using full relational databases for
> some large-scale applications, ones where the database is spread over
> many machines.  If the database can be organized so that each transaction
> only needs to talk to one database machine, the locking problems become
> much simpler.  That's what BigTable is really about.
>
>     For many web applications, each user has more or less their own data,
> and most database activity is related to a single user.  Such
> applications can easily be scaled up with a system that doesn't
> have inter-user links.  There can still be inter-user references,
> but without a consistency guarantee.  They may lead to dead data,
> like Unix/Linux symbolic links.  This is a mechanism adequate
> for most "social networking" sites.
>
>     There are also some "consistent-eventually" systems, where a query
> can see old data.  For non-critical applications, those can be
> very useful.  This isn't a SQL/NoSQL thing; MySQL asynchronous
> replication is a "consistent-eventually" system.  Wikipedia uses
> that for the "special" pages which require database lookups.
>
>     If you allow general joins across any tables, you have to have all
> the very elaborate interlocking mechanisms of a distributed database.
> The serious database systems (MySQL Cluster and Oracle, for example)
> do offer that, but there are usually substantial complexity penalties,
> and the databases have to be carefully organized to avoid excessive
> cross-machine locking.  If you don't need general joins, a system
> which doesn't support them is far simpler.
>
>                                         John Nagle
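Nagle's "consistent-eventually" behavior can be shown with a toy model. This is a hypothetical sketch, not how MySQL replication is actually implemented: the primary applies each write immediately and queues it, and a single replica applies queued writes only when apply() is called, so a read from the replica can return stale data in between. The class and method names are invented for the example.

```python
from __future__ import annotations

from collections import deque

class Primary:
    """Accepts writes and queues them for asynchronous shipping (toy model)."""
    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.pending: deque[tuple[str, int]] = deque()  # writes not yet on the replica

    def write(self, key: str, value: int) -> None:
        self.data[key] = value             # visible on the primary at once
        self.pending.append((key, value))  # shipped to the replica later

class Replica:
    """Serves reads; lags the primary until apply() catches it up."""
    def __init__(self) -> None:
        self.data: dict[str, int] = {}

    def apply(self, primary: Primary) -> None:
        # Drain queued writes in order. Assumes this is the only replica,
        # since entries are removed from the shared queue as they are applied.
        while primary.pending:
            key, value = primary.pending.popleft()
            self.data[key] = value

    def read(self, key: str) -> int | None:
        return self.data.get(key)

primary, replica = Primary(), Replica()
primary.write("hits", 1)
print(replica.read("hits"))  # None: the replica has not caught up yet
replica.apply(primary)
print(replica.read("hits"))  # 1
primary.write("hits", 2)
print(replica.read("hits"))  # 1: stale until the next apply()
replica.apply(primary)
print(replica.read("hits"))  # 2
```

With several replicas, each would need its own read position in the log rather than popping from one shared queue; the single queue is a simplification.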

From: Duncan Booth on
Xah Lee <xahlee(a)gmail.com> wrote:

> For example, consider: if you are among the world's top 100 database
> users in terms of database size, such as Google, then the
> off-the-shelf tools may be limiting. But how many users
> really have such a massive amount of data?

You've totally missed the point. It isn't the size of the data you have
today that matters, it's the size of data you could have in several years'
time.

Maybe today you've got 10 users each with 10 megabytes of data, but you're
aspiring to become the next twitter/facebook or whatever. It's a bit late
as you approach 100 million users (and a petabyte of data) to discover that
your system isn't scalable: scalability needs to be built in from day one.
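Nagle's point that per-user data lets each transaction stay on one machine, and Duncan's point that this has to be planned from the start, can be combined in a small sketch. (Duncan's figures are consistent: 10^8 users × 10 MB each is 10^15 bytes, one decimal petabyte.) One common way to plan for growth, assumed here rather than taken from the thread, is to hash each user to one of a fixed, generous number of logical shards on day one and keep a separate table mapping logical shards to machines. Growing later means editing that table and moving whole shards, while the user-to-shard hash never changes. LOGICAL_SHARDS, the machine names, and the helper functions are all hypothetical.

```python
import hashlib

LOGICAL_SHARDS = 1024  # hypothetical; chosen once on day one and never changed

def logical_shard(user_id: str) -> int:
    """Stable hash of a user id into the range [0, LOGICAL_SHARDS)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % LOGICAL_SHARDS

def build_placement(machines: list[str]) -> dict[int, str]:
    """Assign logical shards to machines round-robin."""
    return {shard: machines[shard % len(machines)] for shard in range(LOGICAL_SHARDS)}

def machine_for(user_id: str, placement: dict[int, str]) -> str:
    # Every record for this user lives on this one machine, so a transaction
    # that touches only this user's data never has to cross machines.
    return placement[logical_shard(user_id)]

# Day one: a single database server holds every logical shard.
day_one = build_placement(["db0"])
# Later: four servers; only the placement table changes, not the user hash.
grown = build_placement(["db0", "db1", "db2", "db3"])

for user in ["alice", "bob", "carol"]:
    print(user, logical_shard(user), machine_for(user, day_one), machine_for(user, grown))
```

Hashing users directly with `hash % number_of_machines` would reassign most users whenever the machine count changed; the fixed logical-shard layer confines growth to moving whole shards between machines.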