From: Johnny Persson on 8 Mar 2010 04:37

Hi,

we are having some performance issues regarding XML shredding.

At this point we are extracting data from XML files from nearly 60 different companies - and therefore 60 different XML structures. The total amount of XML is about 350MB, and we are trying to extract the data as fast as possible. Our current system extracts, transforms and loads the data in about five minutes; to be pleased, we would like to get this down to about one minute.

We use the "nodes/cross apply" technique to shred the XML into our internal format.

This is how we shred the data.
------------------------------

1) Load the XML into a temporary table (#XmlTable)
2) Create an XML index
3) Query (like below - the paths and types are placeholders)

INSERT INTO #TransformedData
SELECT
    T0.T.value('asasd', 'asdadd'),
    T1.T.value('asasd', 'asdadd')
FROM
    #XmlTable
CROSS APPLY
    data.nodes('asd') AS T0(T)
CROSS APPLY
    T0.T.nodes('level1') AS T1(T)

DROP TABLE #XmlTable

4) Pass the temporary table #TransformedData into the common/shared transformation procedure

EXEC LookupData

-------------------------------

This is very I/O intensive and it makes the system slow. Are there any other good ways to parse the XML inside SQL Server? Or should we move the shredding outside the SQL environment into, for instance, a C# method which bulk loads the data?

Regards,
Johnny
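[Editor's note: the nested nodes()/CROSS APPLY flattening in step 3 can also be done in client code, which is what the question at the end is asking about. A minimal Python sketch of the same two-level shred follows; the element names (company, order) and the column layout are invented for illustration, since the real documents use 60 different structures.]

```python
# Rough client-side equivalent of step 3: flatten a two-level XML
# document into rows, the way the outer and inner .nodes() calls do.
import xml.etree.ElementTree as ET

SAMPLE = """
<root>
  <company name="Acme">
    <order id="1" total="10.50"/>
    <order id="2" total="20.00"/>
  </company>
  <company name="Beta">
    <order id="3" total="5.25"/>
  </company>
</root>
"""

def shred(xml_text):
    """Return one (company, order_id, total) row per nested node."""
    root = ET.fromstring(xml_text)
    rows = []
    for comp in root.iter("company"):       # outer .nodes('/root/company')
        for order in comp.iter("order"):    # inner .nodes('order')
            rows.append((comp.get("name"),
                         order.get("id"),
                         float(order.get("total"))))
    return rows

if __name__ == "__main__":
    for row in shred(SAMPLE):
        print(row)
```

The row tuples play the role of #TransformedData; they could then be bulk loaded and passed to the shared transformation procedure.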
From: Uri Dimant on 8 Mar 2010 05:06

Johnny,

http://sqlblog.com/blogs/adam_machanic/archive/2010/01/12/t-sql-tuesday-002-is-it-xml-or-not.aspx

http://sqlcat.com/msdnmirror/archive/2010/03/01/performance-tips-of-using-xml-data-in-sql-server.aspx

"Johnny Persson" <a(a)a.a> wrote in message news:%23Ht3vLqvKHA.1796(a)TK2MSFTNGP02.phx.gbl...
> [original post snipped]
From: Stefan Hoffmann on 8 Mar 2010 05:08

hi Johnny,

On 08.03.2010 10:37, Johnny Persson wrote:
> This is very I/O intensive and it makes the system slow. Are there any
> other good ways to parse the xmls in the sql server? Should we perhaps
> move the shredding outside the SQL environment into, for instance, a C#
> method which bulk loads the data?

You may take a look at 'Performing Bulk Load of XML Data (SQLXML 4.0)':
http://technet.microsoft.com/en-us/library/ms171993.aspx

Depending on your files, I would at least set up some simple C# samples to compare performance. You may consider implementing it as a CLR stored procedure if it performs better:
http://technet.microsoft.com/en-us/library/ms131094%28SQL.90%29.aspx

mfG
--> stefan <--
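[Editor's note: the "shred outside, then bulk load" approach suggested above can be sketched in a few lines. The idea is to flatten the XML in client code and write a delimited file that BULK INSERT or bcp can load into the staging table. This Python sketch uses an invented <item> structure and column layout; the real pipeline would emit one file per company format.]

```python
# Shred <item> elements into tab-separated rows suitable for a
# bulk load (BULK INSERT / bcp) into a staging table.
import csv
import io
import xml.etree.ElementTree as ET

def xml_to_tsv(xml_text, out):
    """Write one tab-separated row per <item> element to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for item in ET.fromstring(xml_text).iter("item"):
        writer.writerow([item.get("id"), item.findtext("name", "")])

if __name__ == "__main__":
    sample = ("<r><item id='1'><name>a</name></item>"
              "<item id='2'><name>b</name></item></r>")
    buf = io.StringIO()
    xml_to_tsv(sample, buf)
    print(buf.getvalue(), end="")
```

Writing to a file instead of a StringIO and then running BULK INSERT on it avoids the per-row round trips that make the in-database shredding so I/O intensive.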
From: Erland Sommarskog on 8 Mar 2010 06:34

Johnny Persson (a(a)a.a) writes:
> 1) Load xml into a temporary table (#XmlTable)
> 2) Set an xml index
> 3) Query (like below)
>
> INSERT INTO #TransformedData
> SELECT
>     T0.T.value('asasd', 'asdadd')
>     T1.T.value('asasd', 'asdadd')
> FROM
>     #XmlTable
> CROSS APPLY
>     data.nodes('asd') AS T0(T)
>     T0.T.nodes('level1') AS T1(T)

Assuming these are top-level nodes, change this to

    data.nodes('/asd') AS T0(T)
    T0.T.nodes('/level1') AS T1(T)

There can be a huge performance gain from using correct node addressing. (Unfortunately, the syntax for the best addressing form can be quite convoluted in some cases.)

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
From: Johnny Persson on 8 Mar 2010 09:03
Thank you for your answer. We use XML as the column data type, so the first article does not seem to apply to us. It is however interesting how the query optimizer works - or doesn't :)

The second article had a really interesting part about typed/untyped XML. We have had no XML schema for any of the XML, so I tried that: I created an XML schema for one XML file (~5MB) and compared the performance of a stored procedure which uses the schema against one which does not.

The result is a bit strange to me. With a typed XML column, the execution time is ~15s and the subtree cost is 428. With the normal, untyped XML column, the execution time is the same, BUT the subtree cost is 866! Do you have any thoughts about this result, and why the execution time isn't affected?

Regards,
Johnny

On 2010-03-08 11:06, Uri Dimant wrote:
> [quoted message snipped]