From: Bill Rowe on
On 2/4/10 at 6:26 AM, Clint.Zeringue(a)kirtland.af.mil (Zeringue, Clint
M Civ USAF AFMC AFRL/RDLAF) wrote:

>Suppose you have the following.

>Data = RandomReal[1,{N,2}];

>sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;

>Timing[DeleteDuplicates[data,sameQ]][[1]];

>If N is a large number this takes an ungodly amount of time?

>Is there a more efficient way to delete the duplicate entries of
>Data ?

The slowness of DeleteDuplicates comes about when a custom
compare function is used as the following demonstrates

In[27]:= sameQ[_, _] = False;
sameQ[x_, x_] = True;

In[29]:= data = RandomInteger[100, 2000];

In[30]:= Timing[Length[a = DeleteDuplicates[data]]]

Out[30]= {0.000025,101}

In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out[31]= {0.215696,101}

In[32]:= a == b

Out[32]= True

The above is simply to illustrate the issue, not to solve your
specific problem. For your case, I can get the same result in
much less time using GatherBy as illustrated below

In[33]:= Clear[sameQ]

In[34]:= sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

In[36]:= data = RandomInteger[100, {2000, 2}];

In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]

Out[37]= {0.246448,101}

In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]

Out[38]= {0.000957,101}

In[39]:= c == b

Out[39]= True


From: Tomas Garza on
Use Tally or, even better, GatherBy, to obtain very substantial reduc=
tions in time:

In[1]:= data=RandomInteger[{1,99},{100000,2}];

In[2]:=
sameQ[_,_]=False;
sameQ[{x_,y_},{x_,z_}]=True;

In[4]:= Timing[t0=DeleteDuplicates[data,sameQ];]
Out[4]= {7.987,Null}

In[5]:= Timing[t1=#[[1]]&/@Tally[data,#1[[1]]==#2[[1]]&];][[1]]
Out[5]= 0.063

In[6]:= Timing[t2=#[[1]]&/@GatherBy[data,First];][[1]]
Out[6]= 0.016

In[7]:= t0===t1===t2
Out[7]= True

Tomas


> Date: Thu, 4 Feb 2010 06:26:02 -0500
> From: Clint.Zeringue(a)kirtland.af.mil
> Subject: DeleteDuplicates is too slow?
> To: mathgroup(a)smc.vnet.net
>
> Hello,
>
> Suppose you have the following.
>
> Data = RandomReal[1,{N,2}];
>
> sameQ[_,_]=False;
> sameQ[{x_,y_},{x_,z_}]=True;
>
> Timing[DeleteDuplicates[data,sameQ]][[1]];
>
> If N is a large number this takes an ungodly amount of time?
>
> Is there a more efficient way to delete the duplicate entries of Data ?
>
> ie.
>
> Data = {{1.,2.},{1.,3.},{2.,3.}};
>
> Would become:
> {{1.,2.},{ 2.,3.}};
>
>
> Thanks,
>
>
> Clint Zeringue
>

From: DrMajorBob on
Notice that

sameQ@## &

is the same as (and less verbose than)

sameQ

Bobby

On Fri, 05 Feb 2010 02:25:03 -0600, Bill Rowe <readnews(a)sbcglobal.net>
wrote:

> On 2/4/10 at 6:26 AM, Clint.Zeringue(a)kirtland.af.mil (Zeringue, Clint
> M Civ USAF AFMC AFRL/RDLAF) wrote:
>
>> Suppose you have the following.
>
>> Data = RandomReal[1,{N,2}];
>
>> sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>
>> Timing[DeleteDuplicates[data,sameQ]][[1]];
>
>> If N is a large number this takes an ungodly amount of time?
>
>> Is there a more efficient way to delete the duplicate entries of
>> Data ?
>
> The slowness of DeleteDuplicates comes about when a custom
> compare function is used as the following demonstrates
>
> In[27]:= sameQ[_, _] = False;
> sameQ[x_, x_] = True;
>
> In[29]:= data = RandomInteger[100, 2000];
>
> In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
>
> Out[30]= {0.000025,101}
>
> In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>
> Out[31]= {0.215696,101}
>
> In[32]:= a == b
>
> Out[32]= True
>
> The above is simply to illustrate the issue, not to solve your
> specific problem. For your case, I can get the same result in
> much less time using GatherBy as illustrated below
>
> In[33]:= Clear[sameQ]
>
> In[34]:= sameQ[_, _] = False;
> sameQ[{x_, y_}, {x_, z_}] = True;
>
> In[36]:= data = RandomInteger[100, {2000, 2}];
>
> In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>
> Out[37]= {0.246448,101}
>
> In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
>
> Out[38]= {0.000957,101}
>
> In[39]:= c == b
>
> Out[39]= True
>
>


--
DrMajorBob(a)yahoo.com

From: DrMajorBob on
Sorry... the second version is less verbose. (Obviously.)

Bobby

On Sat, 06 Feb 2010 02:24:16 -0600, DrMajorBob <btreat1(a)austin.rr.com>
wrote:

> Notice that
>
> sameQ@## &
>
> is the same as (and less verbose than)
>
> sameQ
>
> Bobby
>
> On Fri, 05 Feb 2010 02:25:03 -0600, Bill Rowe <readnews(a)sbcglobal.net>
> wrote:
>
>> On 2/4/10 at 6:26 AM, Clint.Zeringue(a)kirtland.af.mil (Zeringue, Clint
>> M Civ USAF AFMC AFRL/RDLAF) wrote:
>>
>>> Suppose you have the following.
>>
>>> Data = RandomReal[1,{N,2}];
>>
>>> sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>>
>>> Timing[DeleteDuplicates[data,sameQ]][[1]];
>>
>>> If N is a large number this takes an ungodly amount of time?
>>
>>> Is there a more efficient way to delete the duplicate entries of
>>> Data ?
>>
>> The slowness of DeleteDuplicates comes about when a custom
>> compare function is used as the following demonstrates
>>
>> In[27]:= sameQ[_, _] = False;
>> sameQ[x_, x_] = True;
>>
>> In[29]:= data = RandomInteger[100, 2000];
>>
>> In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
>>
>> Out[30]= {0.000025,101}
>>
>> In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>>
>> Out[31]= {0.215696,101}
>>
>> In[32]:= a == b
>>
>> Out[32]= True
>>
>> The above is simply to illustrate the issue, not to solve your
>> specific problem. For your case, I can get the same result in
>> much less time using GatherBy as illustrated below
>>
>> In[33]:= Clear[sameQ]
>>
>> In[34]:= sameQ[_, _] = False;
>> sameQ[{x_, y_}, {x_, z_}] = True;
>>
>> In[36]:= data = RandomInteger[100, {2000, 2}];
>>
>> In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
>>
>> Out[37]= {0.246448,101}
>>
>> In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
>>
>> Out[38]= {0.000957,101}
>>
>> In[39]:= c == b
>>
>> Out[39]= True
>>
>>
>
>


--
DrMajorBob(a)yahoo.com