From: Garapata on
I have a large nested list, "myList"

It has 3 sublists with the following dimensions:

Dimensions/@ myList

{{19808, 5}, {7952, 5}, {7952, 5}}

The 5th position (i.e., column) in each of the sublists has
SQLDateTime[]s
(This may or may not affect what I need, but I thought everyone should
know).

myIntersection = Intersection @@ (myList[[All, All, 5]]);

gives me the SQLDateTimes[]s common to all sublists. I get 3954
common elements.

Length[myIntersection]

3954

All of the above works great and runs very fast.

I then find the positions in myList where all the common
SQLDateTimes[]s occur and then use Extract pull them out into a new
list

myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
None, -1];

myOutput = Extract[myList, #] & /@ myPositions;

I end up with just what I need, which in this case gives me 3954 rows
of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences
of each common date element and sublists myList[[2]] and myList[[3]]
each have 2 occurrences of each common date element.

The Extract[] runs very fast.

My problem =85. the Position[] runs very very slow (over 90 seconds on a
dual core iMac).

All the code together:

myIntersection = Intersection @@ (myList[[All, All, 5]]);
myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
None, -1];
myOutput = Extract[myList, #] & /@ myPositions;

So, does anyone know a way to speed up:

myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
None, -1]; ?

Or can anyone suggest another approach to doing this that could run
faster.

Patterns?
ParallelMap?
Parallelize?
Sorting?
Changing SQLDateTimes to DateList[]s before calculating myPositions?

Not clear what to try.
Please advise.

Thanks.

From: Zach Bjornson on
Hello,

Without taking the time to fully understand your list structures, there
are two easy ways to generally speed up Position. The first is to limit
the search space, e.g. data[[All,1]] if all the things you are searching
for are only in the first column (also limit the levelspec if you can).
The second is to limit the number of hits (which it sounds like isn't an
option for you), e.g. Position[data, #, Infinity, 1] to only find the
first occurrence.

On its own, Position is not parallelized (so your two cores don't really
help). You can make it parallelized by putting Position into
ParallelTable, and having the iterator for ParallelTable refer to the
query that Position is searching for:
myPositions = ParallelTable[Position[data, i],{i,myIntersection}]

Hope that helps,
Zach

On 7/1/2010 8:28 AM, Garapata wrote:
> I have a large nested list, "myList"
>
> It has 3 sublists with the following dimensions:
>
> Dimensions/@ myList
>
> {{19808, 5}, {7952, 5}, {7952, 5}}
>
> The 5th position (i.e., column) in each of the sublists has
> SQLDateTime[]s
> (This may or may not affect what I need, but I thought everyone should
> know).
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
>
> gives me the SQLDateTimes[]s common to all sublists. I get 3954
> common elements.
>
> Length[myIntersection]
>
> 3954
>
> All of the above works great and runs very fast.
>
> I then find the positions in myList where all the common
> SQLDateTimes[]s occur and then use Extract pull them out into a new
> list
>
> myPositions = Drop[(Position[data, #]& /@ myIntersection), None,
> None, -1];
>
> myOutput = Extract[myList, #]& /@ myPositions;
>
> I end up with just what I need, which in this case gives me 3954 rows
> of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences
> of each common date element and sublists myList[[2]] and myList[[3]]
> each have 2 occurrences of each common date element.
>
> The Extract[] runs very fast.
>
> My problem =85. the Position[] runs very very slow (over 90 seconds on a
> dual core iMac).
>
> All the code together:
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
> myPositions = Drop[(Position[data, #]& /@ myIntersection), None,
> None, -1];
> myOutput = Extract[myList, #]& /@ myPositions;
>
> So, does anyone know a way to speed up:
>
> myPositions = Drop[(Position[data, #]& /@ myIntersection), None,
> None, -1]; ?
>
> Or can anyone suggest another approach to doing this that could run
> faster.
>
> Patterns?
> ParallelMap?
> Parallelize?
> Sorting?
> Changing SQLDateTimes to DateList[]s before calculating myPositions?
>
> Not clear what to try.
> Please advise.
>
> Thanks.
>
>

From: Peter Pein on
Am Thu, 1 Jul 2010 12:28:11 +0000 (UTC)
schrieb Garapata <warsaw95826(a)mypacks.net>:

> I have a large nested list, "myList"
>
> It has 3 sublists with the following dimensions:
>
> Dimensions/@ myList
>
> {{19808, 5}, {7952, 5}, {7952, 5}}
>
> The 5th position (i.e., column) in each of the sublists has
> SQLDateTime[]s
> (This may or may not affect what I need, but I thought everyone should
> know).
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
>
> gives me the SQLDateTimes[]s common to all sublists. I get 3954
> common elements.
>
> Length[myIntersection]
>
> 3954
>
> All of the above works great and runs very fast.
>
> I then find the positions in myList where all the common
> SQLDateTimes[]s occur and then use Extract pull them out into a new
> list
>
> myPositions = Drop[(Position[data, #] & /@ myIntersection),
> None, None, -1];
>
> myOutput = Extract[myList, #] & /@ myPositions;
>
> I end up with just what I need, which in this case gives me 3954 rows
> of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences
> of each common date element and sublists myList[[2]] and myList[[3]]
> each have 2 occurrences of each common date element.
>
> The Extract[] runs very fast.
>
> My problem =85. the Position[] runs very very slow (over 90 seconds
> on a dual core iMac).
>
> All the code together:
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1];
> myOutput = Extract[myList, #] & /@ myPositions;
>
> So, does anyone know a way to speed up:
>
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1]; ?
>
> Or can anyone suggest another approach to doing this that could run
> faster.
>
> Patterns?
> ParallelMap?
> Parallelize?
> Sorting?
> Changing SQLDateTimes to DateList[]s before calculating myPositions?
>
> Not clear what to try.
> Please advise.
>
> Thanks.
>

Hi,

if you are interested in myOutput only and do not need to keep
myPositions for later use, you can try something like:

In[1]:=
myList=RandomInteger[{1,5555},#]&/@{{19808,5},{7952,5},{7952,5}};

In[2]:= Length[myIntersection=Intersection@@myList[[All,All,5]]]
Out[2]= 3126
In[3]:=
Timing[Dimensions[
myOutput=Split[Sort[Cases[myList,{___,Alternatives@@myIntersection},{2}],Last[#1]<Last[#2]&],Last[#1]===Last[#2]&]
]]
Out[3]= {11.28,{3126}}
In[4]:= myOutput[[1]]
Out[4]=
{{3830,4047,4200,3520,1},{4788,4153,2710,2938,1},{886,2560,5266,128,1},{143,218,3189,3672,1},{190,510,4701,212,1}}


Peter


From: Leonid Shifrin on
Hi,

I suggest you to use some of the functionality of the package
UnsortedOperations which I developed some time ago, and which is available
at:

http://www.mathprogramming-intro.or/additional_resources.html

1. Download the package by right-clicking on link "The package" in the
section "UnsortedOperations" and "Save as".

2. Evaluate $UserBaseDirectory in Mathematica to locate your application
directory

3. Find the folder Applications in it, and place there the package.

Then:

This loads the package as a sub-context of a Global` context (ignore the
error message)

Needs["UnsortedOperations`"]

Needs::nocont: Context UnsortedOperations` was not created when Needs was
evaluated. >>

You can call the Information (?) to see the name of the functions:

?`UnsortedOperations`*
Global`UnsortedOperations`
MapAtComplement MemberPositions PositionsOfDifferent
UnsortedComplement
MapAtIntersection MemberPositionsSequences PositionsOfSame
UnsortedIntersection
MapAtMembers MemberSequences Subsequences UnsortedUnion

You will need the <PositionOfSame> function. You can type ?PositionsOfSame
to get the usage message explaining what it does. Now:

In[3]:=
myList=RandomInteger[{1,5555},#]&/@{{19808,5},{7952,5},{7952,5}};

In[5]:= (pos = PositionsOfSame@@myList[[All,All,5]])//Short//Timing

Out[5]=
{0.991,{{{3940,5213,5845},{290,5492},{145,3142,5936,6987}},{{7151},{443},{7392}},<<3108>>,{{5697},{3910},{4057}}}}

Each sublist contains lists of positions of some element in the first,
second and third lists. There can be more than one identical element in each
list, therefore position lists referring to each of the sublists in <myList>
may have different lengths. Here we check for an arbitrary element (date)
present in all lists that these positions are consistent:

In[6]:=
someElemPositions = RandomChoice[pos]

Out[6]= {{224,1002,7634,18556},{4167,6157},{3662}}

In[7]:= myList[[1,someElemPositions[[1]]]]

Out[7]= {{4762,3324,804,1066,4935},
{4159,2940,4132,3993,4935},{2997,5423,228,2728,4935},{4389,1814,2097,890,4935}}

In[8]:= myList[[2,someElemPositions[[2]]]]

Out[8]= {{3717,635,5271,4565,4935},{1181,2864,3431,5117,4935}}

In[9]:= myList[[3,someElemPositions[[3]]]]

Out[9]= {{2026,4835,3932,3738,4935}}

This will regroup positions according to the original order of elements in
the list:

In[12]:= (positionsAccordingToOrder =
(Sort[Flatten[#]]&/@Transpose[pos]))//Shallow

Out[12]//Shallow=
{{1,2,4,5,6,9,10,11,12,13,<<11388>>},{1,3,6,8,9,10,13,14,15,16,<<5774>>},{1,2,3,4,5,6,8,9,10,11,<<5872>>}}

This will extract from your list only parts which are present in all 3
lists.

In[14]:= (listWithCOmmonElementsOnly =
MapThread[
myList[[##]] &, {Range[Length[myList]],
positionsAccordingToOrder }]) // Shallow

Out[14]//Shallow= {{{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \
{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, <<11388>>}, {{<<5>>}, \
{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \
{<<5>>}, {<<5>>}, <<5774>>}, {{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, \
{<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, {<<5>>}, <<5872>>}}

This list has the same structure as your original list, with all elements
not having counterparts with the same date in other lists removed. The
number of elements in each of the 3 sublists can be large because of allowed
repetitions of the last element (date). This list however loses the more
precise information available in the original list of positions <pos>, where
positions were grouped according to the same elements.

In any case, you get a couple of orders of magnitude speed-up. It is in fact
possible to rewrite the engine of my package so that another factor of 2-3
times speed-up is achieved, but I did not find the time to do that yet.

Hope this helps.

Regards,
Leonid





On Thu, Jul 1, 2010 at 5:28 AM, Garapata <warsaw95826(a)mypacks.net> wrote:

> I have a large nested list, "myList"
>
> It has 3 sublists with the following dimensions:
>
> Dimensions/@ myList
>
> {{19808, 5}, {7952, 5}, {7952, 5}}
>
> The 5th position (i.e., column) in each of the sublists has
> SQLDateTime[]s
> (This may or may not affect what I need, but I thought everyone should
> know).
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
>
> gives me the SQLDateTimes[]s common to all sublists. I get 3954
> common elements.
>
> Length[myIntersection]
>
> 3954
>
> All of the above works great and runs very fast.
>
> I then find the positions in myList where all the common
> SQLDateTimes[]s occur and then use Extract pull them out into a new
> list
>
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1];
>
> myOutput = Extract[myList, #] & /@ myPositions;
>
> I end up with just what I need, which in this case gives me 3954 rows
> of {9, 5} sublists. This occurs because myList[[1]] has 5 occurrences
> of each common date element and sublists myList[[2]] and myList[[3]]
> each have 2 occurrences of each common date element.
>
> The Extract[] runs very fast.
>
> My problem =85. the Position[] runs very very slow (over 90 seconds on a
> dual core iMac).
>
> All the code together:
>
> myIntersection = Intersection @@ (myList[[All, All, 5]]);
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1];
> myOutput = Extract[myList, #] & /@ myPositions;
>
> So, does anyone know a way to speed up:
>
> myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> None, -1]; ?
>
> Or can anyone suggest another approach to doing this that could run
> faster.
>
> Patterns?
> ParallelMap?
> Parallelize?
> Sorting?
> Changing SQLDateTimes to DateList[]s before calculating myPositions?
>
> Not clear what to try.
> Please advise.
>
> Thanks.
>
>
From: Peter Pein on
Am Fri, 2 Jul 2010 06:56:09 +0000 (UTC)
schrieb Peter Pein <petsie(a)dordos.net>:

> Am Thu, 1 Jul 2010 12:28:11 +0000 (UTC)
> schrieb Garapata <warsaw95826(a)mypacks.net>:
>
....
> >
> > My problem =85. the Position[] runs very very slow (over 90 seconds
> > on a dual core iMac).
> >
> > All the code together:
> >
> > myIntersection = Intersection @@ (myList[[All, All, 5]]);
> > myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> > None, -1];
> > myOutput = Extract[myList, #] & /@ myPositions;
> >
> > So, does anyone know a way to speed up:
> >
> > myPositions = Drop[(Position[data, #] & /@ myIntersection), None,
> > None, -1]; ?
> >
....
> >
> > Not clear what to try.
> > Please advise.
> >
> > Thanks.
> >
>
> Hi,
>
....
> In[3]:=
> Timing[Dimensions[
> myOutput=Split[Sort[Cases[myList,{___,Alternatives@@myIntersection},{2}],Last[#1]<Last[#2]&],Last[#1]===Last[#2]&]
> ]]
> Out[3]= {11.28,{3126}}
> In[4]:= myOutput[[1]]
> Out[4]=
> {{3830,4047,4200,3520,1},{4788,4153,2710,2938,1},{886,2560,5266,128,1},{143,218,3189,3672,1},{190,510,4701,212,1}}
....

I've got an even faster one:

In[28]:=
Timing[Dimensions[myOutput3=Flatten[Reap[Map[Sow[#,Last[#]]&,myList,{2}],myIntersection][[2]],1]]]
Out[28]= {0.67,{3141}}
In[29]:= Take[myOutput3,2]
Out[29]=
{{{2761,725,4865,2720,1},{4797,142,1312,1205,1},{1599,1513,498,2462,1},{4839,3373,3734,2125,1},{5277,1388,5042,1560,1},{3108,4445,2094,834,1}},
{{3852,3620,4120,3374,2},{1528,4708,573,2008,2},{877,2578,4421,5013,2},{143,3997,762,3376,2},{4783,2438,1249,934,2},{3639,3493,1495,2255,2},
{2423,4824,2705,80,2},{5079,2757,5297,2897,2},{1265,2193,2395,1409,2}}
}

hth,
Peter