From: m on 8 Dec 2009 19:56

IMHO, do not send them as a delimited string if at all possible. If
performance is an issue, use a bulk-load technique to load the data into a
staging table and run a transform from there; if not, just use single
INSERT statements / stored procedures.

As far as the schema goes, the optimal design depends on the relative
distribution of items and categories. For many categories with few items
each (likely not your case), a recursive foreign key is usually efficient -
especially if a fixed depth is defined. But for a few categories with many
items, the most efficient and sane aggregates are usually formed by a
subordinate index on a table - or, if the only purpose of the system is to
compute and maintain these values, a stats table updated by trigger.

"SnapDive" <SnapDive(a)community.nospam> wrote in message
news:q8aqh5hr0c6i5jjjf95pr6kovb9ou719pe(a)4ax.com...
> Gurus, I have some book club categories in a text file. I need to fit
> them into an adjacency-model structured table. After playing around
> with a lot of procedural-style code, I would like to make this as
> set-based as possible. I don't know if it's better to send each line
> of the text file to a proc one at a time, or send them all as a string
> delimited by CRLF and then pipe, or what. I am stuck. All T-SQL help
> appreciated, thanks!
>
> CREATE TABLE "#Cats" ("RowId" INTEGER PRIMARY KEY NOT NULL, "Name"
> VARCHAR(768), "ContainerId" INT NULL DEFAULT 0)
>
> Using the first 4 lines from the text file, I would expect to have
> these rows:
>
> 0 Books NULL
> 1 Fiction 0
> 2 Action 1
> 3 Drama 1
> 4 Education 1
> 5 Magazines NULL
> 6 Science 5
>
> Another tricky item (for me) is that the level depth can vary by up to
> 2 levels.
>
> Help!
>
> Thanks.
>
> Books | Fiction | Action
> Books | Fiction | Drama
> Books | Fiction | Education
> Magazines | Science
> Magazines | Sports
> Magazines | Teen
> Online | IT | Computer Science | Information Systems | IT General
> Online | IT | Computer Science | Networking | Security
> Online | IT | Computer Science | Project Management
> Online | IT | Software | Database | Microsoft
> Online | IT | Software | Database | Microsoft | SQL Server
> Online | IT | Software | Database | Microsoft | SQL Server | DML Manip
> Online | IT | Software | Database | Microsoft | SQL Server | DDL Manip
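[Editor's note: the transform from the pipe-delimited lines to the adjacency rows the poster expects can be sketched in client code before anything touches the database. A minimal Python sketch, with the file contents inlined and the de-duplication-by-(parent, name) scheme as an assumption:]

```python
# Sketch: turn pipe-delimited category paths into adjacency-list rows
# (RowId, Name, ContainerId) ready to load into #Cats.
lines = [
    "Books | Fiction | Action",
    "Books | Fiction | Drama",
    "Books | Fiction | Education",
    "Magazines | Science",
]

seen = {}      # (parent_id, name) -> row_id, so "Books" is created only once
rows = []      # (row_id, name, container_id) tuples, ready to INSERT
next_id = 0

for line in lines:
    parent = None                       # top-level categories have no container
    for name in (part.strip() for part in line.split("|")):
        key = (parent, name)
        if key not in seen:
            seen[key] = next_id
            rows.append((next_id, name, parent))
            next_id += 1
        parent = seen[key]              # descend one level

for row in rows:
    print(row)
# -> (0, 'Books', None), (1, 'Fiction', 0), (2, 'Action', 1), ...
```

Running it over the first four sample lines reproduces exactly the seven rows the poster listed (0 Books NULL through 6 Science 5).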
From: Mark Hickin on 9 Dec 2009 11:13

Hi,

If this is a one-off operation, I would suggest loading your data into a
table using the import wizard (right-click on your database - Tasks -
Import Data). This only takes a few seconds and is the fastest way to load
larger data sets. There are so many other ways of getting the data in
(using VB, XML, BCP, linked servers, etc.) that it's hardly worth going
into them all.

Once the data is in, I'd suggest you take a look at recursive CTE queries.
If you don't have too much data, these are an effective way to query
hierarchies such as this one.

http://code.msdn.microsoft.com/SQLExamples/Wiki/View.aspx?title=RecursiveCTE

Mark

"SnapDive" wrote:
> [original question and sample data quoted above - snipped]
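[Editor's note: the recursive CTE Mark points to can be tried without a SQL Server instance. A sketch using Python's sqlite3 as a stand-in; SQLite's `WITH RECURSIVE` is close to T-SQL's `WITH` (T-SQL simply omits the RECURSIVE keyword), and the table/column names match the poster's #Cats:]

```python
import sqlite3

# Build the adjacency-list table in memory, then walk it with a
# recursive CTE that reconstructs the full " | "-separated path.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Cats (RowId INTEGER PRIMARY KEY, Name TEXT, ContainerId INT)"
)
con.executemany("INSERT INTO Cats VALUES (?,?,?)", [
    (0, "Books", None), (1, "Fiction", 0), (2, "Action", 1),
    (3, "Drama", 1), (4, "Education", 1),
    (5, "Magazines", None), (6, "Science", 5),
])

paths = con.execute("""
    WITH RECURSIVE tree(RowId, Path) AS (
        SELECT RowId, Name FROM Cats WHERE ContainerId IS NULL
        UNION ALL
        SELECT c.RowId, t.Path || ' | ' || c.Name
        FROM Cats AS c JOIN tree AS t ON c.ContainerId = t.RowId
    )
    SELECT Path FROM tree ORDER BY RowId
""").fetchall()

for (path,) in paths:
    print(path)
# -> Books, Books | Fiction, Books | Fiction | Action, ...
```

In T-SQL the anchor/recursive halves are identical in shape; string concatenation uses `+` instead of `||`.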
From: SnapDive on 11 Dec 2009 23:22

After reading General Celko's words on nested sets (and from other web
sites), it appears that nested sets will indeed work better for me.
However, loading them into the table in the first place is still a
problem. Any thoughts on the load operation would be appreciated.

Thanks.
From: --CELKO-- on 12 Dec 2009 09:53

>> After reading General Celko's words on Nested Sets (and from other web
sites), it appears that Nested Sets will indeed work better for me.
However, loading them into the table in the first place is still a
problem. Any thoughts on the load operation would be appreciated. <<

LOL! General Celko? We prefer "All Wise, All Knowing Guru" instead :)

Seriously, loading a nested sets table from an adjacency list or path
enumeration model is hard. The reason is that most of the time those
tables have errors in their data. Cowboy coders will not put the needed
constraints on those models. It is easy to check that an adjacency list
table has exactly one row with a NULL parent. Now make sure all paths lead
back to it. Now make sure you have no cycles -- most cowboys do not even
bother with a simple "CHECK (boss_emp_id <> emp_id)" on the table, much
less the looping cursor in a trigger needed to find a general cycle of any
length.

Right now I am working with a guy who has adjacency list data where the
nodes are email addresses. But there are gaps and cycles in it, so it is
not a tree at all. When he tries to use the classic stack-based recursive
pre-order traversal algorithm, he gets garbage back. I get an email about
this stuff every 1-2 years.

The solution I found was to start a nested sets tree from a subset of the
data small enough that you can verify it as correct manually. Create the
table with a high percentage of free space per page, then add the other
adjacency list edges one at a time, checking them for validity as you go.
Slow and painful as hell, but I believe that getting it right is more
important than getting it fast. When you stall out, make a decision about
the orphans.

Since constraints are easy to add to the nested sets model, once it is in
place the SQL engine will maintain data integrity for you. You get the
same problems with path enumerations, just with more loops and parsing.
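[Editor's note: the sanity checks Celko describes (exactly one NULL root, every parent chain reaches it, no cycles) can be run in client code before the load. A minimal sketch, assuming the adjacency list fits in memory as a node -> parent map; the function name is illustrative:]

```python
def find_bad_nodes(edges):
    """Return nodes whose parent chain cycles or dangles (orphans).

    edges: dict mapping node -> parent, with None marking the root.
    """
    bad = set()
    for start in edges:
        node, seen = start, set()
        while node is not None:
            if node in seen:            # walked into a cycle
                bad.add(start)
                break
            if node not in edges:       # parent points at a missing row
                bad.add(start)
                break
            seen.add(node)
            node = edges[node]          # climb one level
    return bad

# 4 is a self-cycle (the CHECK(boss_emp_id <> emp_id) case); 5 is orphaned.
edges = {0: None, 1: 0, 2: 1, 3: 1, 4: 4, 5: 9}

roots = [n for n, p in edges.items() if p is None]
assert len(roots) == 1, "a tree must have exactly one NULL root"

print(sorted(find_bad_nodes(edges)))  # -> [4, 5]
```

Only the rows that survive these checks are safe to feed to a stack-based pre-order traversal or a nested sets build.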
From: Plamen Ratchev on 12 Dec 2009 10:46

Since you already have two suggested solutions for importing the data into
an adjacency list, you can use one of those to import the data and then
convert to nested sets. This is fairly easy and will be a good exercise to
learn more about the nested sets model.

I would suggest spending some time researching and playing with nested
sets before designing a production solution. You may find that the
difficulty of maintaining a nested sets hierarchy is not worth it. An
interesting read is the following pair of articles by Vadim Tropashko (see
the nested intervals model):

http://www.dbazine.com/oracle/or-articles/tropashko4
http://www.dbazine.com/oracle/or-articles/tropashko5

--
Plamen Ratchev
http://www.SQLStudio.com
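[Editor's note: the adjacency-to-nested-sets conversion Plamen suggests is a single depth-first pass that hands out left/right numbers as it enters and leaves each node. A minimal sketch in client code, assuming the adjacency list has already been validated; the function name and row shape are illustrative:]

```python
def to_nested_sets(rows):
    """rows: list of (row_id, name, parent_id) adjacency-list tuples.
    Returns {row_id: (lft, rgt)} nested-sets bounds."""
    children, roots = {}, []
    for rid, _name, parent in rows:
        if parent is None:
            roots.append(rid)
        else:
            children.setdefault(parent, []).append(rid)

    bounds, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter                    # number assigned on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1                     # number assigned on the way back up
        bounds[node] = (lft, counter)

    for root in roots:
        visit(root)
    return bounds

rows = [(0, "Books", None), (1, "Fiction", 0),
        (2, "Action", 1), (3, "Drama", 1)]
print(to_nested_sets(rows))
# -> {2: (3, 4), 3: (5, 6), 1: (2, 7), 0: (1, 8)}
```

A descendant query then becomes a simple range test: node d is under node a iff a.lft < d.lft AND d.rgt < a.rgt.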