Prev: Using a TEA package
Next: Strange embedded proc error
From: Georgios Petasis on 21 Sep 2009 16:28 Hi all, I am using a dict to keep the occurrence frequencies of some n-grams. But how can I get the n most frequent n-grams? How can I sort a dict based on values? (and not keys?) George
From: Alexandre Ferrieux on 21 Sep 2009 16:50 On Sep 21, 10:28 pm, Georgios Petasis <peta...(a)iit.demokritos.gr> wrote: > > I am using a dict to keep the occurrence frequencies of some n-grams. > But how can I get the n most frequent n-grams? The simplest would be to sort the list of frequencies, which is O (MlogM) where M is the number of different frequencies. If you want the N largest and N<<M, it will be more efficient to keep and update a sorted list of the N best (using the new lsearch -bisect), because you'll get O(MlogN). And of course, once you hve the N best frequencies, you crawl back to the n-grams by the reverse dict/array, built with [lappend] (example provided with a reverse array): dict for {k v} $d {lappend rev($v) $k} > How can I sort a dict based on values? (and not keys?) Just project it onto a list. You'll get it for free, with duplicates removed, by using the key list of the above reverse array: set sorted_freqs [lsort -integer -decreasing [array get rev]] -Alex
From: Georgios Petasis on 21 Sep 2009 16:51 O/H Georgios Petasis ������: > Hi all, > > I am using a dict to keep the occurrence frequencies of some n-grams. > But how can I get the n most frequent n-grams? > > How can I sort a dict based on values? (and not keys?) > > George lsort -stride will be useful, but not available in 8.5. Any alternatives into using an intermediate array (with frequencies as keys, and words as values)? (I have implemented an alternative, I am looking for something faster :D ) George
From: Georgios Petasis on 21 Sep 2009 18:28 O/H Alexandre Ferrieux έγραψε: > On Sep 21, 10:28 pm, Georgios Petasis <peta...(a)iit.demokritos.gr> > wrote: >> I am using a dict to keep the occurrence frequencies of some n-grams. >> But how can I get the n most frequent n-grams? > > The simplest would be to sort the list of frequencies, which is O > (MlogM) where M is the number of different frequencies. If you want > the N largest and N<<M, it will be more efficient to keep and update a > sorted list of the N best (using the new lsearch -bisect), because > you'll get O(MlogN). > > And of course, once you hve the N best frequencies, you crawl back to > the n-grams by the reverse dict/array, built with [lappend] (example > provided with a reverse array): > > dict for {k v} $d {lappend rev($v) $k} > >> How can I sort a dict based on values? (and not keys?) > > Just project it onto a list. You'll get it for free, with duplicates > removed, by using the key list of the above reverse array: > > set sorted_freqs [lsort -integer -decreasing [array get rev]] > > -Alex Dear Alex, Thank you very much. This is what I also did. But it would be much more simple if dict had an lsort command of its own, with the ability to sort with keys or values :-) George ## Sort the n-grams... dict for {key val} $ngram { lappend NGRAM($val) $key } set ngram [dict create] set count 0 foreach freq [lsort -integer -decreasing [array names NGRAM]] { foreach one [lsort -dictionary $NGRAM($freq)] { dict set ngram $one $freq if {[incr count] >= $topmost_ngrams} {return $ngram} } } return $ngram
From: Alexandre Ferrieux on 21 Sep 2009 18:39
On Sep 22, 12:28 am, Georgios Petasis <peta...(a)iit.demokritos.gr> wrote: > > But it would be much more simple if dict had an lsort command of its > own, with the ability to sort with keys or values :-) Sorting on keys is what you get with [-stride 2], though in the process you lose dict-ness (you shimmer to a list -- which may be OK in most circumstances). Sorting on values implies reverting the dict; so maybe a better primitive would be [dict revert], but I'm not sure how much a C implementation would gain over the Tcl version... -Alex |