From: Martin Gregorie on
On Mon, 03 May 2010 12:27:08 +0100, chris wrote:

I'm using Fedora 12, which has sort 7.6. LC_ALL is unset.

This version of sort sorted your data on column 1 only by default.

However, there's something odd about it when the --key option is used: I
can't get it to sort on your second (+/-) column and, if any of the
sorting types (-g -i -n) are used, the --key option seems to be ignored.

In summary, I have a few problems with this version of sort:

- the manpage is so terse as to be meaningless and says nothing
  about apparently valid combinations of sort parameters,
  e.g. what if I want to do a caseless sort on column 3?

- there are severe limitations on the keys that can be selected,
  such as a total inability to specify sorts like:
  - first key: a caseless alpha sort on column 1
  - second key: a descending numeric sort on column 2

In summary, I think it's fair to say that it's OK if you want its
defaults, but for anything else use awk, use Perl to read in and sort an
array, or write a better, more configurable replacement yourself.


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |

From: unruh on
On 2010-05-03, chris <ithinkiam(a)gmail.com> wrote:
> Hi all,
>
> Given this data:
>
> 0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
> 0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
> 0 - TCGTTGGTAACAATATCTAC-TTT-CT
> 3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
> 0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
> 1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
> 1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
> 0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
> 2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
> 0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
> 0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
> 0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
> 0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
> 0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
> 1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
> 2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
> 0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
> 0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
> 0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T
> 0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
>
>
> sort v5.97 (as per Centos5.4) gives this:
>> $ sort -k2 file
> 0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
> 0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
> 0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
> 1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
> 0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
> 2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
> 0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
> 0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
> 1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
> 2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
> 0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
> 0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
> 0 - TCGTTGGTAACAATATCTAC-TTT-CT
> 0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
> 0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
> 0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
> 0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
> 3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
> 1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
> 0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T
>
> i.e. it's sorting on column 3 not 2.
>
> sort v5.93 (as per Mac OS 10.5.8) gives:
> 0 + AAACAAACCAGAAACTTTCATATCAATAATACATAGAA
> 0 + AAGAGAAACGATATTAGTCCAAAAATGTAAACATA
> 0 + AATAATAAGAAAA-AAAA-AAAAA-AAAAAAA
> 1 + ACTATCGGAAAAAATCAAGACGCACGGATATATAAA
> 0 + AGAAATCTAACACAAAATCATTAACTTAT-TAGTTTCCAA
> 2 + ATTATTGGCTTATTATTGCCAAAACAGAAAA-AAA
> 2 + GACATCAAAGATACTTT-CTTGAACAAGACCAGGAATA
> 0 + GTCGACCATAAAAGTTTACATAAAGAATCAAGGTT
> 0 - ATTTTTTTGTTTTTTTATCA-C--AAATTA-T-AT
> 0 - C-TACGTGTCTGATGCAATAATGGAAATGGAGTTGTGTGT
> 1 - CTATATAGTTTGTGGACATTATATTATGTTCTCTCTTGACTAA-ATGT
> 0 - GACGATAAAGAAATAAAATCT-ATT-GCTTCTT-GT
> 0 - TCGTTGGTAACAATATCTAC-TTT-CT
> 0 - TGTTGAAAAGCATCTAACTTGA--AGGACGGTCTGAGGCTT
> 0 - TGTTGTATGACATCATAATTATGGAATTTTTTTT-GTT
> 0 - TTCTATGTGATATTTTGGTTCGCTGTGTG
> 0 - TTTGTCCAAGTCAACTAAGTGCACTA-AAAAGGATCTTCTAT
> 3 - TTTTTGTCTTTTTTTTTTTTTTTGTTTAGTTA-GT
> 1 - TTTTTTTTTTTAAAAATA-ATTTC-TTAATATCTT
> 0 - TTTTTTTTTTTTT-TTTTTTTCTTTTTACT--T
>
> Which looks like it's sorting column 2 then 3. Anyone else seen this and
> is it a bug?

man sort.
Sort uses C numbering convention which starts with 0, not with 1. Ie the
first column is column 0, not column 1

Mac has obviously decided that this is confusing and has altered the
normal gnu/unix sort function to use Fortran numbering convention.

From: Bruce Stephens on
unruh <unruh(a)wormhole.physics.ubc.ca> writes:

[...]

> man sort.
> Sort uses C numbering convention which starts with 0, not with 1. Ie the
> first column is column 0, not column 1

-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line)
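
A quick way to check the "origin 1" wording is to run sort on a tiny input. The sketch below uses three made-up rows (they are not from the original data) and forces LC_ALL=C so that the comparison is plain byte order. With 1-based numbering and the key running to the end of the line, the two "+" rows should come out ahead of the "-" row.

```shell
# Three hypothetical rows: count, strand, sequence (illustration only).
data='1 - CCC
0 + TTT
2 + AAA'

# -k2 starts the key at field 2 (fields are numbered from 1) and, with no
# end position given, the key runs to the end of the line.
# LC_ALL=C forces byte-order comparison so locale rules do not interfere.
by_field2=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

printf '%s\n' "$by_field2"
```

If -k2 were zero-based, the key would start at the sequence column instead and the rows would come out in the order AAA, CCC, TTT.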

From: Dave Gibson on
chris <ithinkiam(a)gmail.com> wrote:
> On Mon, 03 May 2010 13:32:09 +0100, chris <ithinkiam(a)gmail.com> wrote:
>
>> On Mon, 03 May 2010 13:25:22 +0100, Gordon Henderson
>> <gordon+usenet(a)drogon.net> wrote:

>>> It's not a bug, but a feature.
>>
>> LOCALES strikes again! Thanks Gordon.
>
> Wait a sec! This issue initially cropped up with a multi-column sort and I
> thought I'd whittled it down to a 'simple' example. However, the original
> problem is still not solved.
>
> Given this file:
> 2 20140192 +
> 0 25394313 +
> 0 17128576 -
> 1 19332581 -
> 2 5214084 -
[...]
>> $ sort -k1n -k3 -nk2 file
[...]
> It's sorting cols 1 and 2, but not 3. What's wrong here?

If you only specify a beginning position for the key sort will use
from that position to the end of the line. Specify end points for
the keys:

LC_COLLATE=POSIX sort -k1n,1n -k3,3 -k2n,2n your_file
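
As a rough illustration of the above, here is the same command run on the five rows quoted in the thread (the rest of the file is elided there, so this is only a partial check), followed by a made-up two-line input that shows the end-of-line effect on its own. LC_ALL=C is used here in place of LC_COLLATE=POSIX purely as an assumption about the environment: an inherited LC_ALL would otherwise take precedence over LC_COLLATE.

```shell
# The five rows quoted in the thread.
rows='2 20140192 +
0 25394313 +
0 17128576 -
1 19332581 -
2 5214084 -'

# Each key is closed at its own field: numeric on field 1, then byte order
# on field 3, then numeric on field 2.
bounded=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1n,1n -k3,3 -k2n,2n)
printf '%s\n' "$bounded"

# Hypothetical two-line input (not from the thread).
toy='a 9
a 10'

# -k1 spans the whole line, so the lines are compared as text and the
# numeric key on field 2 is never consulted.
open_ended=$(printf '%s\n' "$toy" | LC_ALL=C sort -k1 -k2n)

# -k1,1 stops after field 1; the tie on "a" is then broken numerically.
closed=$(printf '%s\n' "$toy" | LC_ALL=C sort -k1,1 -k2,2n)

printf '%s\n' "$open_ended" "$closed"
```

With the open-ended key the toy input comes out as "a 10" then "a 9"; with the closed keys it comes out as "a 9" then "a 10".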
From: chris on
On Mon, 03 May 2010 20:29:16 +0100, Dave Gibson
<dave.gma+news002(a)googlemail.com.invalid> wrote:

> chris <ithinkiam(a)gmail.com> wrote:
>> On Mon, 03 May 2010 13:32:09 +0100, chris <ithinkiam(a)gmail.com> wrote:
>>
>>> On Mon, 03 May 2010 13:25:22 +0100, Gordon Henderson
>>> <gordon+usenet(a)drogon.net> wrote:
>
>>>> It's not a bug, but a feature.
>>>
>>> LOCALES strikes again! Thanks Gordon.
>>
>> Wait a sec! This issue initially cropped up with a multi-column sort
>> and I
>> thought I'd whittled it down to a 'simple' example. However, the
>> original
>> problem is still not solved.
>>
>> Given this file:
>> 2 20140192 +
>> 0 25394313 +
>> 0 17128576 -
>> 1 19332581 -
>> 2 5214084 -
> [...]
>>> $ sort -k1n -k3 -nk2 file
> [...]
>> It's sorting cols 1 and 2, but not 3. What's wrong here?
>
> If you only specify a beginning position for the key sort will use
> from that position to the end of the line. Specify end points for
> the keys:
>
> LC_COLLATE=POSIX sort -k1n,1n -k3,3 -k2n,2n your_file

Ah, I don't normally do such complex sorts and often do them 'in order' so
I haven't come across the 'end of line' issue.
Thanks.