From: Garapata on 1 Jul 2010 08:28 I have a large nested list, "myList" It has 3 sublists with the following dimensions: Dimensions/@ myList {{19808, 5}, {7952, 5}, {7952, 5}} The 5th position (i.e., column) in each of the sublists has SQLDateTime[]s (This may or may not affect what I need, but I thought everyone should know). myIntersection = Intersection @@ (myList[[All, All, 5]]); gives me the SQLDateTimes[]s common to all sublists. I get 3954 common elements. Length[myIntersection] 3954 All of the above works great and runs very fast. I then find the positions in myList where all the common SQLDateTimes[]s occur and then use Extract pull them out into a new list myPositions = Drop[(Position[data, #] & /@ myIntersection), None, None, -1]; myOutput = Extract[myList, #] & /@ myPositions; I end up with just what I need, which in this case gives me 3954 rows of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences of each common date element and sublists myList[[2]] and myList[[3]] each have 2 occurrences of each common date element. The Extract[] runs very fast. My problem =85. the Position[] runs very very slow (over 90 seconds on a dual core iMac). All the code together: myIntersection = Intersection @@ (myList[[All, All, 5]]); myPositions = Drop[(Position[data, #] & /@ myIntersection), None, None, -1]; myOutput = Extract[myList, #] & /@ myPositions; So, does anyone know a way to speed up: myPositions = Drop[(Position[data, #] & /@ myIntersection), None, None, -1]; ? Or can anyone suggest another approach to doing this that could run faster. Patterns? ParallelMap? Parallelize? Sorting? Changing SQLDateTimes to DateList[]s before calculating myPositions? Not clear what to try. Please advise. Thanks.
From: Zach Bjornson on 2 Jul 2010 02:54 Hello, Without taking the time to fully understand your list structures, there are two easy ways to generally speed up Position. The first is to limit the search space, e.g. data[[All,1]] if all the things you are searching for are only in the first column (also limit the levelspec if you can). The second is to limit the number of hits (which it sounds like isn't an option for you), e.g. Position[data, #, Infinity, 1] to only find the first occurrence. On its own, Position is not parallelized (so your two cores don't really help). You can make it parallelized by putting Position into ParallelTable, and having the iterator for ParallelTable refer to the query that Position is searching for: myPositions = ParallelTable[Position[data, i],{i,myIntersection}] Hope that helps, Zach On 7/1/2010 8:28 AM, Garapata wrote: > I have a large nested list, "myList" > > It has 3 sublists with the following dimensions: > > Dimensions/@ myList > > {{19808, 5}, {7952, 5}, {7952, 5}} > > The 5th position (i.e., column) in each of the sublists has > SQLDateTime[]s > (This may or may not affect what I need, but I thought everyone should > know). > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > > gives me the SQLDateTimes[]s common to all sublists. I get 3954 > common elements. > > Length[myIntersection] > > 3954 > > All of the above works great and runs very fast. > > I then find the positions in myList where all the common > SQLDateTimes[]s occur and then use Extract pull them out into a new > list > > myPositions = Drop[(Position[data, #]& /@ myIntersection), None, > None, -1]; > > myOutput = Extract[myList, #]& /@ myPositions; > > I end up with just what I need, which in this case gives me 3954 rows > of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences > of each common date element and sublists myList[[2]] and myList[[3]] > each have 2 occurrences of each common date element. > > The Extract[] runs very fast. > > My problem =85. the Position[] runs very very slow (over 90 seconds on a > dual core iMac). > > All the code together: > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > myPositions = Drop[(Position[data, #]& /@ myIntersection), None, > None, -1]; > myOutput = Extract[myList, #]& /@ myPositions; > > So, does anyone know a way to speed up: > > myPositions = Drop[(Position[data, #]& /@ myIntersection), None, > None, -1]; ? > > Or can anyone suggest another approach to doing this that could run > faster. > > Patterns? > ParallelMap? > Parallelize? > Sorting? > Changing SQLDateTimes to DateList[]s before calculating myPositions? > > Not clear what to try. > Please advise. > > Thanks. > >
From: Peter Pein on 2 Jul 2010 02:56 Am Thu, 1 Jul 2010 12:28:11 +0000 (UTC) schrieb Garapata <warsaw95826(a)mypacks.net>: > I have a large nested list, "myList" > > It has 3 sublists with the following dimensions: > > Dimensions/@ myList > > {{19808, 5}, {7952, 5}, {7952, 5}} > > The 5th position (i.e., column) in each of the sublists has > SQLDateTime[]s > (This may or may not affect what I need, but I thought everyone should > know). > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > > gives me the SQLDateTimes[]s common to all sublists. I get 3954 > common elements. > > Length[myIntersection] > > 3954 > > All of the above works great and runs very fast. > > I then find the positions in myList where all the common > SQLDateTimes[]s occur and then use Extract pull them out into a new > list > > myPositions = Drop[(Position[data, #] & /@ myIntersection), > None, None, -1]; > > myOutput = Extract[myList, #] & /@ myPositions; > > I end up with just what I need, which in this case gives me 3954 rows > of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences > of each common date element and sublists myList[[2]] and myList[[3]] > each have 2 occurrences of each common date element. > > The Extract[] runs very fast. > > My problem =85. the Position[] runs very very slow (over 90 seconds > on a dual core iMac). > > All the code together: > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; > myOutput = Extract[myList, #] & /@ myPositions; > > So, does anyone know a way to speed up: > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; ? > > Or can anyone suggest another approach to doing this that could run > faster. > > Patterns? > ParallelMap? > Parallelize? > Sorting? > Changing SQLDateTimes to DateList[]s before calculating myPositions? > > Not clear what to try. > Please advise. > > Thanks. > Hi, if you are interested in myOutput only and do not need to keep myPositions for later use, you can try something like: In[1]:= myList=RandomInteger[{1,5555},#]&/@{{19808,5},{7952,5},{7952,5}}; In[2]:= Length[myIntersection=Intersection@@myList[[All,All,5]]] Out[2]= 3126 In[3]:= Timing[Dimensions[ myOutput=Split[Sort[Cases[myList,{___,Alternatives@@myIntersection},{2}],Last[#1]<Last[#2]&],Last[#1]===Last[#2]&] ]] Out[3]= {11.28,{3126}} In[4]:= myOutput[[1]] Out[4]= {{3830,4047,4200,3520,1},{4788,4153,2710,2938,1},{886,2560,5266,128,1},{143,218,3189,3672,1},{190,510,4701,212,1}} Peter
From: Leonid Shifrin on 2 Jul 2010 07:26 Hi, I suggest you to use some of the functionality of the package UnsortedOperations which I developed some time ago, and which is available at: http://www.mathprogramming-intro.or/additional_resources.html 1. Download the package by right-clicking on link "The package" in the section "UnsortedOperations" and "Save as". 2. Evaluate $UserBaseDirectory in Mathematica to locate your application directory 3. Find the folder Applications in it, and place there the package. Then: This loads the package as a sub-context of a Global` context (ignore the error message) Needs["UnsortedOperations`"] Needs::nocont: Context UnsortedOperations` was not created when Needs was evaluated. >> You can call the Information (?) to see the name of the functions: ?`UnsortedOperations`* Global`UnsortedOperations` MapAtComplement MemberPositions PositionsOfDifferent UnsortedComplement MapAtIntersection MemberPositionsSequences PositionsOfSame UnsortedIntersection MapAtMembers MemberSequences Subsequences UnsortedUnion You will need the <PositionOfSame> function. You can type ?PositionsOfSame to get the usage message explaining what it does. Now: In[3]:= myList=RandomInteger[{1,5555},#]&/@{{19808,5},{7952,5},{7952,5}}; In[5]:= (pos = PositionsOfSame@@myList[[All,All,5]])//Short//Timing Out[5]= {0.991,{{{3940,5213,5845},{290,5492},{145,3142,5936,6987}},{{7151},{443},{7392}},<<3108>>,{{5697},{3910},{4057}}}} Each sublist contains lists of positions of some element in the first, second and third lists. There can be more than one identical element in each list, therefore position lists referring to each of the sublists in <myList> may have different lengths. Here we check for an arbitrary element (date) present in all lists that these positions are consistent: In[6]:= someElemPositions = RandomChoice[pos] Out[6]= {{224,1002,7634,18556},{4167,6157},{3662}} In[7]:= myList[[1,someElemPositions[[1]]]] Out[7]= {{4762,3324,804,1066,4935}, {4159,2940,4132,3993,4935},{2997,5423,228,2728,4935},{4389,1814,2097,890,4935}} In[8]:= myList[[2,someElemPositions[[2]]]] Out[8]= {{3717,635,5271,4565,4935},{1181,2864,3431,5117,4935}} In[9]:= myList[[3,someElemPositions[[3]]]] Out[9]= {{2026,4835,3932,3738,4935}} This will regroup positions according to the original order of elements in the list: In[12]:= (positionsAccordingToOrder = (Sort[Flatten[#]]&/@Transpose[pos]))//Shallow Out[12]//Shallow= {{1,2,4,5,6,9,10,11,12,13,<<11388>>},{1,3,6,8,9,10,13,14,15,16,<<5774>>},{1,2,3,4,5,6,8,9,10,11,<<5872>>}} This will extract from your list only parts which are present in all 3 lists. In[14]:= (listWithCOmmonElementsOnly = MapThread[ myList[[##]] &, {Range[Length[myList]], positionsAccordingToOrder }]) // Shallow Out[14]//Shallow= {{{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \ {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, <<11388>>}, {{<<5>>}, \ {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \ {<<5>>}, {<<5>>}, <<5774>>}, {{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \ {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, <<5872>>}} This list has the same structure as your original list, with all elements not having counterparts with the same date in other lists removed. The number of elements in each of the 3 sublists can be large because of allowed repetitions of the last element (date). This list however loses the more precise information available in the original list of positions <pos>, where positions were grouped according to the same elements. In any case, you get a couple of orders of magnitude speed-up. It is in fact possible to rewrite the engine of my package so that another factor of 2-3 times speed-up is achieved, but I did not find the time to do that yet. Hope this helps. Regards, Leonid On Thu, Jul 1, 2010 at 5:28 AM, Garapata <warsaw95826(a)mypacks.net> wrote: > I have a large nested list, "myList" > > It has 3 sublists with the following dimensions: > > Dimensions/@ myList > > {{19808, 5}, {7952, 5}, {7952, 5}} > > The 5th position (i.e., column) in each of the sublists has > SQLDateTime[]s > (This may or may not affect what I need, but I thought everyone should > know). > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > > gives me the SQLDateTimes[]s common to all sublists. I get 3954 > common elements. > > Length[myIntersection] > > 3954 > > All of the above works great and runs very fast. > > I then find the positions in myList where all the common > SQLDateTimes[]s occur and then use Extract pull them out into a new > list > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; > > myOutput = Extract[myList, #] & /@ myPositions; > > I end up with just what I need, which in this case gives me 3954 rows > of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences > of each common date element and sublists myList[[2]] and myList[[3]] > each have 2 occurrences of each common date element. > > The Extract[] runs very fast. > > My problem =85. the Position[] runs very very slow (over 90 seconds on a > dual core iMac). > > All the code together: > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; > myOutput = Extract[myList, #] & /@ myPositions; > > So, does anyone know a way to speed up: > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; ? > > Or can anyone suggest another approach to doing this that could run > faster. > > Patterns? > ParallelMap? > Parallelize? > Sorting? > Changing SQLDateTimes to DateList[]s before calculating myPositions? > > Not clear what to try. > Please advise. > > Thanks. > >
From: Peter Pein on 2 Jul 2010 07:28 Am Fri, 2 Jul 2010 06:56:09 +0000 (UTC) schrieb Peter Pein <petsie(a)dordos.net>: > Am Thu, 1 Jul 2010 12:28:11 +0000 (UTC) > schrieb Garapata <warsaw95826(a)mypacks.net>: > .... > > > > My problem =85. the Position[] runs very very slow (over 90 seconds > > on a dual core iMac). > > > > All the code together: > > > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > > None, -1]; > > myOutput = Extract[myList, #] & /@ myPositions; > > > > So, does anyone know a way to speed up: > > > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > > None, -1]; ? > > .... > > > > Not clear what to try. > > Please advise. > > > > Thanks. > > > > Hi, > .... > In[3]:= > Timing[Dimensions[ > myOutput=Split[Sort[Cases[myList,{___,Alternatives@@myIntersection},{2}],Last[#1]<Last[#2]&],Last[#1]===Last[#2]&] > ]] > Out[3]= {11.28,{3126}} > In[4]:= myOutput[[1]] > Out[4]= > {{3830,4047,4200,3520,1},{4788,4153,2710,2938,1},{886,2560,5266,128,1},{143,218,3189,3672,1},{190,510,4701,212,1}} .... I've got an even faster one: In[28]:= Timing[Dimensions[myOutput3=Flatten[Reap[Map[Sow[#,Last[#]]&,myList,{2}],myIntersection][[2]],1]]] Out[28]= {0.67,{3141}} In[29]:= Take[myOutput3,2] Out[29]= {{{2761,725,4865,2720,1},{4797,142,1312,1205,1},{1599,1513,498,2462,1},{4839,3373,3734,2125,1},{5277,1388,5042,1560,1},{3108,4445,2094,834,1}}, {{3852,3620,4120,3374,2},{1528,4708,573,2008,2},{877,2578,4421,5013,2},{143,3997,762,3376,2},{4783,2438,1249,934,2},{3639,3493,1495,2255,2}, {2423,4824,2705,80,2},{5079,2757,5297,2897,2},{1265,2193,2395,1409,2}} } hth, Peter
|
Next
|
Last
Pages: 1 2 Prev: Importing a file from computer's hard drive !! Next: Expression for manipulate controls |