From: Bill Rowe on 5 Feb 2010 03:34

On 2/4/10 at 6:26 AM, Clint.Zeringue(a)kirtland.af.mil (Zeringue, Clint M Civ USAF AFMC AFRL/RDLAF) wrote:

> Suppose you have the following.
>
> Data = RandomReal[1,{N,2}];
>
> sameQ[_,_]=False; sameQ[{x_,y_},{x_,z_}]=True;
>
> Timing[DeleteDuplicates[data,sameQ]][[1]];
>
> If N is a large number this takes an ungodly amount of time?
>
> Is there a more efficient way to delete the duplicate entries of
> Data ?

The slowness of DeleteDuplicates comes about when a custom compare function is used, as the following demonstrates:

In[27]:= sameQ[_, _] = False;
sameQ[x_, x_] = True;

In[29]:= data = RandomInteger[100, 2000];

In[30]:= Timing[Length[a = DeleteDuplicates[data]]]
Out[30]= {0.000025,101}

In[31]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
Out[31]= {0.215696,101}

In[32]:= a == b
Out[32]= True

The above is simply to illustrate the issue, not to solve your specific problem. For your case, I can get the same result in much less time using GatherBy, as illustrated below:

In[33]:= Clear[sameQ]

In[34]:= sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

In[36]:= data = RandomInteger[100, {2000, 2}];

In[37]:= Timing[Length[b = DeleteDuplicates[data, sameQ@## &]]]
Out[37]= {0.246448,101}

In[38]:= Timing[Length[c = First /@ GatherBy[data, First]]]
Out[38]= {0.000957,101}

In[39]:= c == b
Out[39]= True
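[A note for readers on later versions: since Mathematica 10, DeleteDuplicatesBy expresses the `First /@ GatherBy[...]` idiom directly — it keys each element on a function value rather than calling a pairwise test, so it avoids the slowdown discussed in this thread. A minimal sketch, assuming Version 10 or later:]

```wolfram
(* DeleteDuplicatesBy[data, First] keeps the first row for each
   distinct first element, without a pairwise comparison function. *)
data = RandomInteger[100, {2000, 2}];

c = First /@ GatherBy[data, First];
d = DeleteDuplicatesBy[data, First];

c === d  (* True: both keep the first row per distinct key,
            in order of first occurrence *)
```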
From: Tomas Garza on 5 Feb 2010 03:35

Use Tally or, even better, GatherBy, to obtain very substantial reductions in time:

In[1]:= data=RandomInteger[{1,99},{100000,2}];

In[2]:= sameQ[_,_]=False;
sameQ[{x_,y_},{x_,z_}]=True;

In[4]:= Timing[t0=DeleteDuplicates[data,sameQ];]
Out[4]= {7.987,Null}

In[5]:= Timing[t1=#[[1]]&/@Tally[data,#1[[1]]==#2[[1]]&];][[1]]
Out[5]= 0.063

In[6]:= Timing[t2=#[[1]]&/@GatherBy[data,First];][[1]]
Out[6]= 0.016

In[7]:= t0===t1===t2
Out[7]= True

Tomas

> Date: Thu, 4 Feb 2010 06:26:02 -0500
> From: Clint.Zeringue(a)kirtland.af.mil
> Subject: DeleteDuplicates is too slow?
> To: mathgroup(a)smc.vnet.net
>
> Hello,
>
> Suppose you have the following.
>
> Data = RandomReal[1,{N,2}];
>
> sameQ[_,_]=False;
> sameQ[{x_,y_},{x_,z_}]=True;
>
> Timing[DeleteDuplicates[data,sameQ]][[1]];
>
> If N is a large number this takes an ungodly amount of time?
>
> Is there a more efficient way to delete the duplicate entries of Data ?
>
> ie.
>
> Data = {{1.,2.},{1.,3.},{2.,3.}};
>
> Would become:
> {{1.,2.},{ 2.,3.}};
>
> Thanks,
>
> Clint Zeringue
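[The gap measured above comes from how DeleteDuplicates uses a custom test: each new element is compared against every distinct element kept so far, so with d distinct keys it runs the pattern-matched sameQ on the order of n*d times, while GatherBy keys each element once. A sketch to see the ratio at one size — timings are machine-dependent, only the ratio matters:]

```wolfram
sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

(* ~99 distinct first elements, so the custom test fires roughly
   n*99 times; GatherBy touches each row once. *)
n = 20000;
data = RandomInteger[{1, 99}, {n, 2}];

t0 = First @ Timing[r0 = DeleteDuplicates[data, sameQ];];
t1 = First @ Timing[r1 = First /@ GatherBy[data, First];];

{t0, t1, r0 === r1}  (* the third entry should be True *)
```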
From: DrMajorBob on 6 Feb 2010 03:23

Notice that

sameQ@## &

is the same as (and less verbose than)

sameQ

Bobby

On Fri, 05 Feb 2010 02:25:03 -0600, Bill Rowe <readnews(a)sbcglobal.net> wrote:

> The slowness of DeleteDuplicates comes about when a custom
> compare function is used as the following demonstrates
From: DrMajorBob on 7 Feb 2010 06:16
Sorry... the second version is less verbose. (Obviously.)

Bobby

On Sat, 06 Feb 2010 02:24:16 -0600, DrMajorBob <btreat1(a)austin.rr.com> wrote:

> Notice that
>
> sameQ@## &
>
> is the same as (and less verbose than)
>
> sameQ
>
> Bobby
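[The point being corrected above can be checked directly: for two arguments, sameQ@## & evaluates to sameQ[#1, #2], so it is just an anonymous-function wrapper around sameQ, and passing sameQ itself gives identical results with one less layer of indirection. A small check:]

```wolfram
sameQ[_, _] = False;
sameQ[{x_, y_}, {x_, z_}] = True;

data = RandomInteger[10, {200, 2}];

(* The two comparison functions agree on every pair of arguments,
   so the deduplicated results are identical. *)
DeleteDuplicates[data, sameQ@## &] === DeleteDuplicates[data, sameQ]
(* True *)
```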