From: Rohit M on
Hi All,
I am looking for the fastest method to compare two alphanumeric
strings according
to dictionary algorithm and return 1, 0, or -1 just like "string
compare" does.
By dictionary algorithm I mean comparing "ABC (52)" with "ABC (152)"
should return
-1 as 52 is lesser than 152.

What modifications can be done to improve perfomance (there are
millions of such
comparisons involved) if the format of the string is always alphabets
followed by
numerals and separated by a space, i.e. following kind of strings can
be present:

ABC
ABC (52)
DEF (152)
etc. ...

The best I came up with is as follows, any improvements are
appreciated.
thanks
Rohit

set timeVal [time {
set value1 "ABC (52)"
set value2 "ABC (452)"

set index1 [string first "(" $value1]
set index2 [string first "(" $value2]
if { -1 == $index1 && -1 == $index2 } {
set retVal [string compare $value1 $value2]
} else {
set splitList1 [split $value1]
set splitList2 [split $value2]

set retVal 0

set alphaVal1 [lindex $splitList1 0]
set alphaVal2 [lindex $splitList2 0]

# If alphaVal are different do a simple string compare
if { $alphaVal1 != $alphaVal2 } {
set retVal [string compare $value1 $value2]
}

# If numeralVals are different we need to compare them also
if { [lindex $splitList1 1] != [lindex $splitList2 1]} {
set numeralVal1 [regsub -all {[^0-9]} [lindex $splitList1 1]
""]
set numeralVal2 [regsub -all {[^0-9]} [lindex $splitList2 1]
""]
if { [expr {$numeralVal1} ] > [expr {$numeralVal2}] } {
set retVal 1
} elseif { [expr {$numeralVal1}] < [expr {$numeralVal2}] } {
set retVal -1
}
}
}
# puts "return is: $alphaVal1, $alphaVal2, $numeralVal1,
$numeralVal2, $retVal"
} 100000]
puts "time: $timeVal"
From: Donald G Porter on
Rohit M wrote:
> if { [expr {$numeralVal1} ] > [expr {$numeralVal2}] } {

Please find whoever taught you that and tell them never to teach
Tcl again.

DGP
From: Uwe Klein on
Rohit M wrote:
> Hi All,
> I am looking for the fastest method to compare two alphanumeric
> strings according
> to dictionary algorithm and return 1, 0, or -1 just like "string
> compare" does.
> By dictionary algorithm I mean comparing "ABC (52)" with "ABC (152)"
> should return
> -1 as 52 is lesser than 152.
>
> What modifications can be done to improve perfomance (there are
> millions of such
> comparisons involved) if the format of the string is always alphabets
> followed by
> numerals and separated by a space, i.e. following kind of strings can
> be present:
>
> ABC
> ABC (52)
> DEF (152)
> etc. ...
>
> The best I came up with is as follows, any improvements are
> appreciated.
> thanks
> Rohit
>
> set timeVal [time {
> set value1 "ABC (52)"
> set value2 "ABC (452)"
>
> set index1 [string first "(" $value1]
> set index2 [string first "(" $value2]
> if { -1 == $index1 && -1 == $index2 } {
> set retVal [string compare $value1 $value2]
> } else {
> set splitList1 [split $value1]
> set splitList2 [split $value2]
>
> set retVal 0
>
> set alphaVal1 [lindex $splitList1 0]
> set alphaVal2 [lindex $splitList2 0]
>
> # If alphaVal are different do a simple string compare
> if { $alphaVal1 != $alphaVal2 } {
> set retVal [string compare $value1 $value2]
> }
>
> # If numeralVals are different we need to compare them also
> if { [lindex $splitList1 1] != [lindex $splitList2 1]} {
> set numeralVal1 [regsub -all {[^0-9]} [lindex $splitList1 1]
> ""]
> set numeralVal2 [regsub -all {[^0-9]} [lindex $splitList2 1]
> ""]
> if { [expr {$numeralVal1} ] > [expr {$numeralVal2}] } {
> set retVal 1
> } elseif { [expr {$numeralVal1}] < [expr {$numeralVal2}] } {
> set retVal -1
> }
> }
> }
> # puts "return is: $alphaVal1, $alphaVal2, $numeralVal1,
> $numeralVal2, $retVal"
> } 100000]
> puts "time: $timeVal"

I am too dumb to compare strings.
But I can sort them:

set vals {
"ABC"
"ABC (334)"
"ABC (34)"
"ABC (4)"
"ABD (335)"
"ABD (35)"
"ABD (5)"
"ABCBD (335)"
"ABCBD (35)"
"ABCBD (5)"
}

# prepare
foreach val $vals {
set lst [ split $val () ]
lappend lst 0 0 0 0 0 0 0 0
set lst [ lrange $lst 0 4]
lappend list $lst
set ::arr($lst) $val
}
# sort
set sorted $list
set sorted [ lsort -index 1 -integer -increasing $sorted ]
set sorted [ lsort -index 0 $sorted ]

# retrieve the original string
foreach item $sorted {
lappend final $::arr($item)
}
# print
puts [ join $final \n ]
# end
anyway the way to go imho is to format strings
into a sortable ( as string or list ) presentation
and keep a reference to the original.

uwe



uwe
From: Donald G Porter on
Uwe Klein wrote:
> I am too dumb to compare strings.
> But I can sort them:

Why not use [lsort -dictionary] if the aim is sorting?

The [lsort -dictionary] command could also be the core of
a pairwise comparison function, but I'll leave it to other
to test whether the performance of such a thing is acceptable
and/or better than other alternatives.

DGP
From: Georgios Petasis on
στις 20/4/2010 19:20, O/H Donald G Porter έγραψε:
> Uwe Klein wrote:
>> I am too dumb to compare strings.
>> But I can sort them:
>
> Why not use [lsort -dictionary] if the aim is sorting?
>
> The [lsort -dictionary] command could also be the core of
> a pairwise comparison function, but I'll leave it to other
> to test whether the performance of such a thing is acceptable
> and/or better than other alternatives.
>
> DGP

In my opinion this shows that string compare is missing an option. Isn't it?

George