From: SnapDive on 7 Dec 2009 11:30 Gurus, I have some book club categories in a text file. I need to fit them into an adjacency-model structured table. After playing around with a lot of procedural-class code, I would like to make this as set-based as possible. I don't know if it's better to send each line of the text file to a proc at a time, or send them all as a string and delimit by CRLF and then pipe, or what. I am stuck. All TSQL help appreciated, thanks! CREATE TABLE "#Cats" ("RowId" INTEGER PRIMARY KEY NOT NULL ,"Name" VARCHAR(768) ,"ContainerId" INT NULL DEFAULT 0 ) Using the first 4 lines from the text file, I would expect to have these rows: 0 Books Null 1 Fiction 0 2 Action 1 3 Drama 1 4 Education 1 5 Magazines Null 6 Science 5 Another tricky item (for me) is the level depth can vary up to 2 levels. Help! Thanks. Books | Fiction | Action Books | Fiction | Drama Books | Fiction | Education Magazines | Science Magazines | Sports Magazines | Teen Online | IT | Computer Science | Information Systems | IT General Online | IT | Computer Science | Networking | Security Online | IT | Computer Science | Project Management Online | IT | Software | Database | Microsoft Online | IT | Software | Database | Microsoft | SQL Server Online | IT | Software | Database | Microsoft | SQL Server | DML Manip Online | IT | Software | Database | Microsoft | SQL Server | DDL Manip
From: Plamen Ratchev on 7 Dec 2009 14:11 You can simply import the flat text file to a staging table (using BCP, SSIS, etc.), each line represented as one row. Then the following code will demonstrate how to process and parse the data to adjacency list relationships (requires SQL Server 2005/2008): CREATE TABLE Categories ( category_id INTEGER PRIMARY KEY NOT NULL, category_name VARCHAR(30), parent_category_id INT NULL REFERENCES Categories(category_id)); CREATE TABLE Staging ( keycol INT PRIMARY KEY, data VARCHAR(1000)); INSERT INTO Staging VALUES(1, 'Books | Fiction | Action'); INSERT INTO Staging VALUES(2, 'Books | Fiction | Drama'); INSERT INTO Staging VALUES(3, 'Books | Fiction | Education'); INSERT INTO Staging VALUES(4, 'Magazines | Science'); INSERT INTO Staging VALUES(5, 'Magazines | Sports'); INSERT INTO Staging VALUES(6, 'Magazines | Teen'); INSERT INTO Staging VALUES(7, 'Online | IT | Computer Science | Information Systems | IT General'); INSERT INTO Staging VALUES(8, 'Online | IT | Computer Science | Networking | Security'); INSERT INTO Staging VALUES(9, 'Online | IT | Computer Science | Project Management'); INSERT INTO Staging VALUES(10, 'Online | IT | Software | Database | Microsoft'); INSERT INTO Staging VALUES(11, 'Online | IT | Software | Database | Microsoft | SQL Server'); INSERT INTO Staging VALUES(12, 'Online | IT | Software | Database | Microsoft | SQL Server | DML Manip'); INSERT INTO Staging VALUES(13, 'Online | IT | Software | Database | Microsoft | SQL Server | DDL Manip'); WITH N1 (n) AS (SELECT 1 UNION ALL SELECT 1), N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y), N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y), N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y), Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM N4), Split (keycol, category_name, idx) AS ( SELECT keycol, CAST( SUBSTRING(data, n, CHARINDEX(' | ', data + ' | ', n) - n) AS VARCHAR(30)), n + 1 - LEN(REPLACE(LEFT(data, n), ' | ', ' ')) FROM Nums JOIN Staging ON SUBSTRING(' | ' + data, n, 3) = ' | ' AND n < DATALENGTH(data + ' | ')), Ranked (keycol, category_name, idx, category_id) AS ( SELECT DISTINCT keycol, category_name, idx, CAST(DENSE_RANK() OVER(ORDER BY idx, category_name) AS INT) FROM Split), Hierarchy AS ( SELECT keycol, category_name, idx AS lvl, category_id, CAST(NULL AS INT) AS parent_category_id FROM Ranked WHERE idx = 1 UNION ALL SELECT S.keycol, S.category_name, S.idx, S.category_id, H.category_id FROM Ranked AS S JOIN Hierarchy AS H ON S.keycol = H.keycol AND S.idx = H.lvl + 1) INSERT INTO Categories (category_id, category_name, parent_category_id) SELECT DISTINCT MIN(category_id) OVER(PARTITION BY category_name), category_name, MIN(parent_category_id) OVER(PARTITION BY category_name) FROM Hierarchy; SELECT category_id, category_name, parent_category_id FROM Categories; /* category_id category_name parent_category_id ----------- ------------------------------ ------------------ 1 Books NULL 2 Magazines NULL 3 Online NULL 4 Fiction 1 5 IT 3 6 Science 2 7 Sports 2 8 Teen 2 9 Action 4 10 Computer Science 5 11 Drama 4 12 Education 4 13 Software 5 14 Database 13 15 Information Systems 10 16 Networking 10 17 Project Management 10 18 IT General 15 19 Microsoft 14 20 Security 16 21 SQL Server 19 22 DDL Manip 21 23 DML Manip 21 */ GO DROP TABLE Staging; DROP TABLE Categories; -- Plamen Ratchev http://www.SQLStudio.com
From: SnapDive on 7 Dec 2009 16:07 That was incredible and fast! I guess you had this in youg bag-of-tricks when someone else asked a similar question in the past? How does this compare to the "nested set" model for storing stuff? Thanks!
From: Plamen Ratchev on 7 Dec 2009 18:55 To compare different hierarchy models you have to define what types of manipulations you will perform with the data, what types of queries you will run, etc. Nested sets if a very good model but has very limited practical use. Vadim Tropashko has a very good comparison chart on the different models: http://vadimtropashko.wordpress.com/2008/08/09/one-more-nested-intervals-vs-adjacency-list-comparison/ -- Plamen Ratchev http://www.SQLStudio.com
From: Itzik Ben-Gan on 8 Dec 2009 15:07 Another one... WITH C1 AS ( SELECT DISTINCT SUBSTRING(string, n, CHARINDEX(' | ', string + ' | ', n) - n) AS cat, LEFT(string, CHARINDEX(' | ', string + ' | ', n) - 1) AS cat_full, LEFT(string, n - 1 - SIGN(n - 1) * 3) AS parent_cat_full FROM dbo.Import JOIN dbo.Nums ON n <= DATALENGTH(' | ' + string) - 2 AND SUBSTRING(' | ' + string, n, 3) = ' | ' ), C2 AS ( SELECT *, ROW_NUMBER() OVER(ORDER BY cat_full) - 1 AS id FROM C1 ) SELECT A.id, A.cat, B.id AS parent_id FROM C2 AS A LEFT OUTER JOIN C2 AS B ON A.parent_cat_full = B.cat_full ORDER BY A.cat_full; Assuming the file was loaded into a table called Import: SET NOCOUNT ON; USE tempdb; IF OBJECT_ID('dbo.Import', 'U') IS NOT NULL DROP TABLE dbo.Import; GO CREATE TABLE dbo.Import ( string VARCHAR(2000) NOT NULL ); INSERT INTO dbo.Import(string) VALUES ('Books | Fiction | Action'), ('Books | Fiction | Drama'), ('Books | Fiction | Education'), ('Magazines | Science'), ('Magazines | Sports'), ('Magazines | Teen'), ('Online | IT | Computer Science | Information Systems | IT General'), ('Online | IT | Computer Science | Networking | Security'), ('Online | IT | Computer Science | Project Management'), ('Online | IT | Software | Database | Microsoft'), ('Online | IT | Software | Database | Microsoft | SQL Server'), ('Online | IT | Software | Database | Microsoft | SQL Server | DML Manip'), ('Online | IT | Software | Database | Microsoft | SQL Server | DDL Manip'); And that you have an auxiliary table of numbers in the database: IF OBJECT_ID('dbo.Nums', 'U') IS NOT NULL DROP TABLE dbo.Nums; CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY); DECLARE @max AS INT, @rc AS INT; SET @max = 8000; SET @rc = 1; INSERT INTO Nums VALUES(1); WHILE @rc * 2 <= @max BEGIN INSERT INTO dbo.Nums SELECT n + @rc FROM dbo.Nums; SET @rc = @rc * 2; END INSERT INTO dbo.Nums SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max; Cheers, -- BG, SQL Server MVP www.SolidQ.com www.InsideTSQL.com "Plamen Ratchev" <Plamen(a)SQLStudio.com> wrote in message news:6YmdnchUadbnzoDWnZ2dnUVZ_oNi4p2d(a)speakeasy.net... > You can simply import the flat text file to a staging table (using BCP, > SSIS, etc.), each line represented as one row. > > Then the following code will demonstrate how to process and parse the data > to adjacency list relationships (requires SQL Server 2005/2008): > > CREATE TABLE Categories ( > category_id INTEGER PRIMARY KEY NOT NULL, > category_name VARCHAR(30), > parent_category_id INT NULL REFERENCES Categories(category_id)); > > CREATE TABLE Staging ( > keycol INT PRIMARY KEY, > data VARCHAR(1000)); > > INSERT INTO Staging VALUES(1, 'Books | Fiction | Action'); > INSERT INTO Staging VALUES(2, 'Books | Fiction | Drama'); > INSERT INTO Staging VALUES(3, 'Books | Fiction | Education'); > INSERT INTO Staging VALUES(4, 'Magazines | Science'); > INSERT INTO Staging VALUES(5, 'Magazines | Sports'); > INSERT INTO Staging VALUES(6, 'Magazines | Teen'); > INSERT INTO Staging VALUES(7, 'Online | IT | Computer Science | > Information Systems | IT General'); > INSERT INTO Staging VALUES(8, 'Online | IT | Computer Science | Networking > | Security'); > INSERT INTO Staging VALUES(9, 'Online | IT | Computer Science | Project > Management'); > INSERT INTO Staging VALUES(10, 'Online | IT | Software | Database | > Microsoft'); > INSERT INTO Staging VALUES(11, 'Online | IT | Software | Database | > Microsoft | SQL Server'); > INSERT INTO Staging VALUES(12, 'Online | IT | Software | Database | > Microsoft | SQL Server | DML Manip'); > INSERT INTO Staging VALUES(13, 'Online | IT | Software | Database | > Microsoft | SQL Server | DDL Manip'); > > WITH N1 (n) AS (SELECT 1 UNION ALL SELECT 1), > N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y), > N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y), > N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y), > Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM N4), > Split (keycol, category_name, idx) AS ( > SELECT keycol, > CAST( > SUBSTRING(data, n, > CHARINDEX(' | ', > data + ' | ', > n) - n) > AS VARCHAR(30)), > n + 1 - LEN(REPLACE(LEFT(data, n), ' | ', ' ')) > FROM Nums > JOIN Staging > ON SUBSTRING(' | ' + data, n, 3) = ' | ' > AND n < DATALENGTH(data + ' | ')), > Ranked (keycol, category_name, idx, category_id) AS ( > SELECT DISTINCT > keycol, > category_name, idx, > CAST(DENSE_RANK() OVER(ORDER BY idx, category_name) AS INT) > FROM Split), > Hierarchy AS ( > SELECT keycol, category_name, idx AS lvl, > category_id, > CAST(NULL AS INT) AS parent_category_id > FROM Ranked > WHERE idx = 1 > UNION ALL > SELECT S.keycol, S.category_name, S.idx, > S.category_id, > H.category_id > FROM Ranked AS S > JOIN Hierarchy AS H > ON S.keycol = H.keycol > AND S.idx = H.lvl + 1) > INSERT INTO Categories (category_id, category_name, parent_category_id) > SELECT DISTINCT > MIN(category_id) OVER(PARTITION BY category_name), > category_name, > MIN(parent_category_id) OVER(PARTITION BY category_name) > FROM Hierarchy; > > SELECT category_id, category_name, parent_category_id > FROM Categories; > > /* > > category_id category_name parent_category_id > ----------- ------------------------------ ------------------ > 1 Books NULL > 2 Magazines NULL > 3 Online NULL > 4 Fiction 1 > 5 IT 3 > 6 Science 2 > 7 Sports 2 > 8 Teen 2 > 9 Action 4 > 10 Computer Science 5 > 11 Drama 4 > 12 Education 4 > 13 Software 5 > 14 Database 13 > 15 Information Systems 10 > 16 Networking 10 > 17 Project Management 10 > 18 IT General 15 > 19 Microsoft 14 > 20 Security 16 > 21 SQL Server 19 > 22 DDL Manip 21 > 23 DML Manip 21 > > */ > GO > > DROP TABLE Staging; > DROP TABLE Categories; > > -- > Plamen Ratchev > http://www.SQLStudio.com
|
Next
|
Last
Pages: 1 2 3 Prev: re sql result set where value increments Next: Different content types |