Prev: thanks!!
Next: WHERE IN (SELECT) with multiple columns
From: obiron on 12 Nov 2009 13:09 Hi guys, first post here. I have the following problem I have records in a source table which I have pulled out to a temp table and built some indexes. As part of this extract I have ranked the records using the rank() over funcitonality to force a sequence key resetting on PATHWAY_ID. I am now trying to validate that the records have been entered in the correct order. For example. It is incorrect to have a main_code = 90 if you have not had a main_code = 30. The following code is correctly identifying every PATHWAY_ID where this is occuring select '90 with no 30' as 'error',* from #PATHWAY where Patnt_pathwauy_id in ( SELECT pathway_id from #PATHWAY X where main_code = 90 and Patnt_pathway_id not in ( SELECT pathway_id from #PATHWAY Y where Y.main_code = 30 and Y.rank < X. rank ) ) in English, get all the data where the main_code is 90 and there is not a record with a 30 with a higher ranked number. The problem is that there may be any number of 90s before the 30 and I only really need the inner query to continue processing if there is one and then it can bomb out. I know that once I have the results set I could SELECT DISTINCT but I am looking to save time by not processing every record. E.g. if the pathway is 10,20,90,90,90,30 then currently the middle select statement fires three times, but I only need it to fire once because I will get an error on the first 90 code. In the above example, I suppose I could use MIN() to find the first 90 and first 30 and only compare them. On some of the more complex validations though, a 10 woudl restart the sequence and so I would be looking for a 90 after a 10 but before a 30 (with any codes in between the 10 and 90 and any codes between the 90 and the 30) which would make life more difficult as I would have to find the smallest 30 that is larger than the largest 10 which is smaller than the 90.... The data set is 385000 records on 82365 pathways and it is therefore doing lots of nested loops which are unneccessary. This query runs in 46 minutes, returning 60200 rows and I have about 20 of these sequence checks to carry out, some of which may require 4 or 5 passes of the pathway and the current performance is not acceptable. I have already built indexes against the temp table based on pathway,rank, pathway,main_code and patway,main_code,rank. Can anyone suggest a way to speed this up, either with better indexes or a way of forcing the query to drop out as soon as the error is discovered on the pathway Thanks in advance Obiron Development environment is SQL2005
|
Pages: 1 Prev: thanks!! Next: WHERE IN (SELECT) with multiple columns |