From: Martin Gregorie on 3 May 2010 10:26
On Mon, 03 May 2010 12:27:08 +0100, chris wrote:

I'm using Fedora 12 which has sort 7.6. LC_ALL is unset.

This version of sort sorted your data on column 1 only by default.
However, there's something odd about it when the --key option is used:
I can't get it to sort on your second (+/-) column and, if any of the
sorting types (-g -i -n) are used, the --key option seems to be ignored.

In summary, I have a few problems with this version of sort:

- the manpage is so terse as to be meaningless and says nothing about
  apparently valid combinations of sort parameters, e.g. what if I want
  to do a caseless sort on column 3

- there are severe limitations on the keys that can be selected, such
  as a total inability to specify sorts like:
  - first key: a caseless alpha sort on column 1
  - second key: a descending numeric sort on column 2

I think, in summary, that it's fair to say that it's OK if you want its
defaults, but for anything else use awk, Perl to read in and sort an
array, or write a better, more configurable replacement yourself.

--
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
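For what it's worth, the two-key sort described in the post above can usually be written with per-key modifiers, which POSIX specifies and GNU sort accepts. The following is only a minimal sketch: the sample rows and the function names `sample` and `sort_two_keys` are made up for illustration, and `LC_ALL=C` is an assumption used to make the ordering repeatable.

```shell
#!/bin/sh
# Hypothetical sample data: a name in column 1, a count in column 2.
sample() {
    printf '%s\n' 'banana 2' 'Apple 10' 'apple 3' 'Banana 7'
}

# -k1,1f  : key is field 1 only, compared with lower case folded to upper
# -k2,2nr : key is field 2 only, compared numerically, in reverse order
# LC_ALL=C pins byte-order collation so the result does not depend on
# the caller's locale (an assumption made for this sketch).
sort_two_keys() {
    LC_ALL=C sort -k1,1f -k2,2nr
}

sample | sort_two_keys
# Apple 10
# apple 3
# Banana 7
# banana 2
```

Attaching the modifiers to each key, rather than passing `-f`, `-n` or `-r` as separate options, keeps each ordering rule local to its own column.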
From: unruh on 3 May 2010 14:40
On 2010-05-03, chris <ithinkiam(a)gmail.com> wrote:
> Hi all,
>
> Given this data:
>
> 0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
> 0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
> 0 - TCGTTGGTAACAATATCTAC-TTT-CT
> 3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
> 0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
> 1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
> 1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
> 0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
> 2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
> 0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
> 0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
> 0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
> 0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
> 0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
> 1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
> 2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
> 0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
> 0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
> 0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T
> 0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
>
>
> sort v5.97 (as per Centos5.4) gives this:
>> $ sort -k2 file
> 0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
> 0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
> 0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
> 1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
> 0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
> 2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
> 0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
> 0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
> 1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
> 2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
> 0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
> 0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
> 0 - TCGTTGGTAACAATATCTAC-TTT-CT
> 0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
> 0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
> 0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
> 0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
> 3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
> 1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
> 0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T
>
> i.e. it's sorting on column 3 not 2.
>
> sort v5.93 (as per Mac OS 10.5.8) gives:
> 0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
> 0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
> 0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
> 1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
> 0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
> 2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
> 2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
> 0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
> 0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
> 0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
> 1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
> 0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
> 0 - TCGTTGGTAACAATATCTAC-TTT-CT
> 0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
> 0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
> 0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
> 0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
> 3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
> 1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
> 0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T
>
> Which looks like it's sorting column 2 then 3. Anyone else seen this and
> is it a bug?

man sort.
Sort uses C numbering convention which starts with 0, not with 1. Ie the
first column is column 0, not column 1.
Mac has obviously decided that this is confusing and has altered the
normal gnu/unix sort function to use Fortran numbering convention.
From: Bruce Stephens on 3 May 2010 14:46
unruh <unruh(a)wormhole.physics.ubc.ca> writes:

[...]

> man sort.
> Sort uses C numbering convention which starts with 0, not with 1. Ie the
> first column is column 0, not column 1

       -k, --key=POS1[,POS2]
              start a key at POS1 (origin 1), end it at POS2 (default
              end of line)
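The "origin 1" wording in that manpage excerpt is easy to check by hand. Here is a small sketch, assuming a POSIX-style sort; the three-row sample and the helper names `data` and `sort_by_field` are invented for the demonstration, and `LC_ALL=C` is only there to keep the ordering predictable.

```shell
#!/bin/sh
# Hypothetical three-column sample; each column has a different order.
data() {
    printf '%s\n' 'b 2 x' 'a 1 z' 'c 3 y'
}

# Sort on exactly one field: POS1 and POS2 are both the given field number.
sort_by_field() {
    LC_ALL=C sort -k "$1,$1"
}

echo 'field 1:'
data | sort_by_field 1
# a 1 z
# b 2 x
# c 3 y

echo 'field 3:'
data | sort_by_field 3
# b 2 x
# c 3 y
# a 1 z
```

Field 1 selects the leftmost column and field 3 the rightmost, so key positions count from 1 rather than 0.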
From: Dave Gibson on 3 May 2010 15:29
chris <ithinkiam(a)gmail.com> wrote:
> On Mon, 03 May 2010 13:32:09 +0100, chris <ithinkiam(a)gmail.com> wrote:
>
>> On Mon, 03 May 2010 13:25:22 +0100, Gordon Henderson
>> <gordon+usenet(a)drogon.net> wrote:
>>> It's not a bug, but a feature.
>>
>> LOCALES strikes again! Thanks Gordon.
>
> Wait a sec! This issue initially cropped up with a multi-column sort and I
> thought I'd whittled it down to a 'simple' example. However, the original
> problem is still not solved.
>
> Given this file:
> 2 20140192 +
> 0 25394313 +
> 0 17128576 -
> 1 19332581 -
> 2 5214084 -

[...]

>> $ sort -k1n -k3 -nk2 file

[...]

> It's sorting cols 1 and 2, but not 3. What's wrong here?

If you only specify a beginning position for the key, sort will use
from that position to the end of the line. Specify end points for
the keys:

  LC_COLLATE=POSIX sort -k1n,1n -k3,3 -k2n,2n your_file
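The difference between an open-ended key and a bounded one can be seen on a tiny input. This is a sketch under assumptions, not a transcript from the thread: the three-row sample and the function names are hypothetical, and `LC_ALL=C` is used here in place of the `LC_COLLATE=POSIX` from the post so that an inherited `LC_ALL` cannot override the collation.

```shell
#!/bin/sh
# Hypothetical rows chosen so the two key styles give different orders.
rows() {
    printf '%s\n' 'x b 1' 'y a 1' 'z a 2'
}

# Open-ended key: -k2 runs from field 2 to end of line, so the rows
# never tie on it and the second key (-k1,1r) is never consulted.
open_ended() {
    LC_ALL=C sort -k2 -k1,1r
}

# Bounded key: -k2,2 stops at the end of field 2, so rows with the same
# field 2 tie and are then ordered by field 1 in reverse.
bounded() {
    LC_ALL=C sort -k2,2 -k1,1r
}

# The five sample rows quoted in the post, with the bounded key
# specifications suggested there.
chris_rows() {
    printf '%s\n' '2 20140192 +' '0 25394313 +' '0 17128576 -' \
                  '1 19332581 -' '2 5214084 -'
}
bounded_three_keys() {
    LC_ALL=C sort -k1n,1n -k3,3 -k2n,2n
}

rows | open_ended          # y a 1 / z a 2 / x b 1
rows | bounded             # z a 2 / y a 1 / x b 1
chris_rows | bounded_three_keys
# 0 25394313 +
# 0 17128576 -
# 1 19332581 -
# 2 20140192 +
# 2 5214084 -
```

One further factor may have been at play in the original `-k1n -k3 -nk2` command: POSIX applies a standalone ordering option such as `-n` to every key that carries no modifiers of its own, so `-k3` would probably have been compared numerically as well. Putting the modifier inside each key, as in the suggested command, avoids that.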
From: chris on 4 May 2010 03:59
On Mon, 03 May 2010 20:29:16 +0100, Dave Gibson
<dave.gma+news002(a)googlemail.com.invalid> wrote:

> chris <ithinkiam(a)gmail.com> wrote:
>> On Mon, 03 May 2010 13:32:09 +0100, chris <ithinkiam(a)gmail.com> wrote:
>>
>>> On Mon, 03 May 2010 13:25:22 +0100, Gordon Henderson
>>> <gordon+usenet(a)drogon.net> wrote:
>
>>>> It's not a bug, but a feature.
>>>
>>> LOCALES strikes again! Thanks Gordon.
>>
>> Wait a sec! This issue initially cropped up with a multi-column sort
>> and I thought I'd whittled it down to a 'simple' example. However, the
>> original problem is still not solved.
>>
>> Given this file:
>> 2 20140192 +
>> 0 25394313 +
>> 0 17128576 -
>> 1 19332581 -
>> 2 5214084 -
> [...]
>>> $ sort -k1n -k3 -nk2 file
> [...]
>> It's sorting cols 1 and 2, but not 3. What's wrong here?
>
> If you only specify a beginning position for the key sort will use
> from that position to the end of the line. Specify end points for
> the keys:
>
> LC_COLLATE=POSIX sort -k1n,1n -k3,3 -k2n,2n your_file

Ah, I don't normally do such complex sorts and often do them 'in order',
so I haven't come across the 'end of line' issue.

Thanks.