From: Leonid Shifrin on 3 Jul 2010 08:16 Hi, This is a follow-up to my previous post. I did not quite get your original goal before and my previous post was somewhat messy. Here is the code which will probably do what you need: The test list myList = RandomInteger[{1, 5555}, #] & /@ {{19808, 5}, {7952, 5}, {7952, 5}}; The code: Clear[mergeSameDates]; mergeSameDates[lst_List] := With[{pos = PositionsOfSame @@ lst[[All, All, 5]]}, With[{trpos = Transpose[pos]}, Flatten[ Transpose@ Table[lst[[i, #]] & /@ trpos[[i]], {i, 1, Length[lst]}], {{1}, {2, 3}}]]]; It uses the function PositionsOfSame, described in my previous post. You use it as follows: In[4]:= (res = mergeSameDates[myList]); // Timing Out[4]= {0.831, Null} Regards, Leonid On Thu, Jul 1, 2010 at 5:28 AM, Garapata <warsaw95826(a)mypacks.net> wrote: > I have a large nested list, "myList" > > It has 3 sublists with the following dimensions: > > Dimensions/@ myList > > {{19808, 5}, {7952, 5}, {7952, 5}} > > The 5th position (i.e., column) in each of the sublists has > SQLDateTime[]s > (This may or may not affect what I need, but I thought everyone should > know). > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > > gives me the SQLDateTimes[]s common to all sublists. I get 3954 > common elements. > > Length[myIntersection] > > 3954 > > All of the above works great and runs very fast. > > I then find the positions in myList where all the common > SQLDateTimes[]s occur and then use Extract pull them out into a new > list > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; > > myOutput = Extract[myList, #] & /@ myPositions; > > I end up with just what I need, which in this case gives me 3954 rows > of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences > of each common date element and sublists myList[[2]] and myList[[3]] > each have 2 occurrences of each common date element. > > The Extract[] runs very fast. > > My problem =85. the Position[] runs very very slow (over 90 seconds on a > dual core iMac). > > All the code together: > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; > myOutput = Extract[myList, #] & /@ myPositions; > > So, does anyone know a way to speed up: > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; ? > > Or can anyone suggest another approach to doing this that could run > faster. > > Patterns? > ParallelMap? > Parallelize? > Sorting? > Changing SQLDateTimes to DateList[]s before calculating myPositions? > > Not clear what to try. > Please advise. > > Thanks. > >
From: Leonid Shifrin on 3 Jul 2010 08:19 Hi, I used some problem-specific information (namely that your data - at least dates - are integers) to get another 2-3 times speed-up on your problem, with respect to my previous suggestion: Clear[extractPositionFromSparseArray]; extractPositionFromSparseArray[ HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]] Clear[listSplit]; listSplit[x_List, lengths_List] := MapThread[Take[x, {##}] &, {Most[#], Rest[#] - 1}] &@ Accumulate[Prepend[lengths, 1]]; Clear[positionsOfSameInteger]; positionsOfSameInteger[x__List] := Module[{inter = Intersection[x], alloc = Table[0, {Max[Join[x]]}], pFun, plists, extracted, orders, frequencies, positions, allocAll}, allocAll = Array[0 &, {Length[{x}], Max[inter]}]; alloc[[inter]] = 1; plists = Flatten[extractPositionFromSparseArray[ SparseArray[alloc[[#]]]]] & /@ {x} ; extracted = MapThread[Part, {{x}, plists}]; orders = Ordering /@ extracted; frequencies = Tally /@ MapThread[Part, {extracted, orders}]; positions = MapThread[listSplit[#1[[#2]], #3[[All, 2]]] &, {plists, orders, frequencies}]; MapThread[(allocAll[[#1, #2[[All, 1]]]] = #3) &, {Range[Length[{x}]], frequencies, positions}]; Transpose[#[[inter]] & /@ allocAll]]; Clear[mergeSameDates]; mergeSameDates[lst_List] := With[{pos = positionsOfSameInteger @@ lst[[All, All, 5]]}, With[{trpos = Transpose[pos]}, Flatten[Transpose@ Table[lst[[i, #]] & /@ trpos[[i]], {i, 1, Length[lst]}], {{1}, {2, 3}}]]]; My benchmarks now show that it performs about twice faster than the fastest previous solution (provided by Peter Pein), at least on the test example below: In[39]= myList = RandomInteger[{1, 5555}, #] & /@ {{19808, 5}, {7952, 5}, {7952, 5}}; In[40]:= (res = mergeSameDates[myList]); // Timing Out[40]= {0.381, Null} In[42]:= Timing[ myIntersection = Intersection @@ (myList[[All, All, 5]]); Dimensions[ myOutput3 = Flatten[Reap[Map[Sow[#, Last[#]] &, myList, {2}], myIntersection][[2]], 1]]] Out[42]= {0.821, {3159}} In[43]:= res == myOutput3 Out[43]= True Peter's solution is however certainly more elegant and much more concise. OTOH, it does not give you the detailed information about positions while mine does, so they are complementary. A function positionsOfSameInteger is a byproduct of this problem, and can be for integer lists up to an order of magnitude faster than the more generic implementation I used in my package - it may be useful in many contexts. Anyway, hope this helps. Regards, Leonid On Thu, Jul 1, 2010 at 5:28 AM, Garapata <warsaw95826(a)mypacks.net> wrote: > I have a large nested list, "myList" > > It has 3 sublists with the following dimensions: > > Dimensions/@ myList > > {{19808, 5}, {7952, 5}, {7952, 5}} > > The 5th position (i.e., column) in each of the sublists has > SQLDateTime[]s > (This may or may not affect what I need, but I thought everyone should > know). > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > > gives me the SQLDateTimes[]s common to all sublists. I get 3954 > common elements. > > Length[myIntersection] > > 3954 > > All of the above works great and runs very fast. > > I then find the positions in myList where all the common > SQLDateTimes[]s occur and then use Extract pull them out into a new > list > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; > > myOutput = Extract[myList, #] & /@ myPositions; > > I end up with just what I need, which in this case gives me 3954 rows > of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences > of each common date element and sublists myList[[2]] and myList[[3]] > each have 2 occurrences of each common date element. > > The Extract[] runs very fast. > > My problem =85. the Position[] runs very very slow (over 90 seconds on a > dual core iMac). > > All the code together: > > myIntersection = Intersection @@ (myList[[All, All, 5]]); > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; > myOutput = Extract[myList, #] & /@ myPositions; > > So, does anyone know a way to speed up: > > myPositions = Drop[(Position[data, #] & /@ myIntersection), None, > None, -1]; ? > > Or can anyone suggest another approach to doing this that could run > faster. > > Patterns? > ParallelMap? > Parallelize? > Sorting? > Changing SQLDateTimes to DateList[]s before calculating myPositions? > > Not clear what to try. > Please advise. > > Thanks. > >
First
|
Prev
|
Pages: 1 2 Prev: Importing a file from computer's hard drive !! Next: Expression for manipulate controls |