Question regarding binary strings [TCL]

From: Paul on 10 Jan 2010 10:09

Hello,

I have a question regarding Tcl binary strings.

When I run the following script, which creates two binary strings:

set num 50

# Append 50 bytes of value 0 to row1.
for { set j 0 } { $j < $num } { incr j } {
    append row1 [binary format c 0]
}
puts "row1: length=[string length $row1] bytelength=[string bytelength $row1]"

# Append 50 bytes of value 1 to row2.
for { set j 0 } { $j < $num } { incr j } {
    append row2 [binary format c 1]
}
puts "row2: length=[string length $row2] bytelength=[string bytelength $row2]"

I get the following output (tested with 8.4, 8.5, 8.6):

row1: length=50 bytelength=100
row2: length=50 bytelength=50

Why do zero values occupy 2 bytes in a binary string?

Regards,

Paul

From: slebetman on 10 Jan 2010 10:31

On Jan 10, 11:09 pm, "Paul(a)Tcl3D" <p...(a)tcl3d.org> wrote:
> Why do zero values occupy 2 bytes in a binary string?
>

Don't use [string bytelength]. What you want is [string length].

Because Tcl is implemented in C, and C strings are terminated by NUL
(0x00), the Tcl interpreter internally encodes each NUL as a special
two-byte sequence. [string bytelength] is really there mainly for
debugging, or to work around edge cases that Tcl does not handle
automatically. For everything else, use [string length].
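
A minimal sketch of the same test measured only with [string length], for illustration. The explicit initialisation of row1 and row2 and the final [binary scan] check are additions that were not in the original script:

```tcl
# Build two 50-byte binary strings: one of 0x00 bytes, one of 0x01 bytes.
set num 50
set row1 ""
set row2 ""
for {set j 0} {$j < $num} {incr j} {
    append row1 [binary format c 0]
    append row2 [binary format c 1]
}

# [string length] reports one unit per byte, whatever the byte value is.
puts "row1: length=[string length $row1]"
puts "row2: length=[string length $row2]"

# Decode both strings back into integer lists to confirm that each
# holds exactly $num values.
binary scan $row1 c* values1
binary scan $row2 c* values2
puts "row1 holds [llength $values1] values, row2 holds [llength $values2]"
```

Both strings should report a length of 50 here, since the two-byte form of NUL is an internal storage detail that [string length] does not expose.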

From: Alexandre Ferrieux on 10 Jan 2010 11:58

Slebetman is perfectly right; may I ask why you need [string
bytelength]? Are you aware that it is *not* what you want even when
preparing a write to a UTF-8 encoded output channel, since it
basically measures a "special flavour" of UTF-8 that is entirely
internal to Tcl?

-Alex
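
If the goal is to know how many bytes will actually go out on a UTF-8 channel, one common approach is to convert the string with [encoding convertto utf-8] first and then take [string length] of the result. A minimal sketch, assuming a made-up variable named payload:

```tcl
# Hypothetical payload: a NUL, an ASCII letter, and a non-ASCII character.
set payload "\u0000Z\u00e9"

# Convert to external (standard) UTF-8, producing a binary string.
set encoded [encoding convertto utf-8 $payload]

# Characters in the original string versus bytes after encoding.
puts "characters:  [string length $payload]"
puts "utf-8 bytes: [string length $encoded]"
```

In standard UTF-8 a NUL is a single 0x00 byte, so the converted string should count it once, unlike the internal representation that [string bytelength] measures.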

From: slebetman on 10 Jan 2010 21:45

On Jan 11, 12:58 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> Slebetman is perfectly right; may I ask why you need [string
> bytelength]? Are you aware that it is *not* what you want even when
> preparing a write to a UTF-8 encoded output channel, since it
> basically measures a "special flavour" of UTF-8 that is entirely
> internal to Tcl?
>

I think both the documentation and the error message generated when
calling [string bytelength] without arguments should state: DO NOT USE
THIS, see [string length] instead.

From: Arjen Markus on 11 Jan 2010 01:25

On 10 jan, 16:31, "slebet...(a)yahoo.com" <slebet...(a)gmail.com> wrote:
> Because Tcl is implemented in C, and C strings are terminated by NUL
> (0x00), the Tcl interpreter internally encodes each NUL as a special
> two-byte sequence. [string bytelength] is really there mainly for
> debugging, or to work around edge cases that Tcl does not handle
> automatically. For everything else, use [string length].

The reason is not so much that C uses NUL bytes to terminate
strings, but that Tcl uses UTF-8 internally. With "counted strings"
there would be no need for this extra memory, but two bytes is how
Tcl's internal UTF-8 encodes a NUL byte.

Regards,

Arjen
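
To make the 100 versus 50 figures concrete: standard UTF-8 stores NUL as one 0x00 byte, while Tcl's internal form stores it as the two-byte pair 0xC0 0x80. The sketch below estimates the internal byte count per character. The proc internalByteLength is a hypothetical helper written for this illustration, not a Tcl built-in, and it assumes characters within the Basic Multilingual Plane only:

```tcl
# Hypothetical helper: estimate how many bytes Tcl's internal UTF-8
# form would use for a string (Basic Multilingual Plane only).
proc internalByteLength {str} {
    set total 0
    foreach ch [split $str ""] {
        scan $ch %c code
        if {$code == 0} {
            incr total 2    ;# NUL is stored as the pair 0xC0 0x80
        } elseif {$code < 0x80} {
            incr total 1    ;# plain ASCII
        } elseif {$code < 0x800} {
            incr total 2    ;# two-byte UTF-8 sequence
        } else {
            incr total 3    ;# three-byte UTF-8 sequence
        }
    }
    return $total
}

set zeros [internalByteLength [string repeat \u0000 50]]
set ones  [internalByteLength [string repeat \u0001 50]]
puts "50 NUL characters:  $zeros bytes"
puts "50 0x01 characters: $ones bytes"
```

Under these assumptions the two strings come out at 100 and 50 bytes, matching the bytelength values reported in the original post.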
