From: Paul on
Hello,

I have a question regarding Tcl binary strings.

When I run the following script, which creates two binary strings:

set num 50

for { set j 0 } { $j < $num } { incr j } {
    append row1 [binary format c 0]
}
puts "row1: length=[string length $row1] bytelength=[string bytelength $row1]"

for { set j 0 } { $j < $num } { incr j } {
    append row2 [binary format c 1]
}
puts "row2: length=[string length $row2] bytelength=[string bytelength $row2]"


I get the following output (tested with 8.4, 8.5, 8.6):

row1: length=50 bytelength=100
row2: length=50 bytelength=50

Why do zero values occupy 2 bytes in a binary string?

Regards,

Paul
From: slebetman on
On Jan 10, 11:09 pm, "Paul(a)Tcl3D" <p...(a)tcl3d.org> wrote:
> Why do zero values occupy 2 bytes in a binary string?
>

Don't use [string bytelength]. What you want is [string length].

Because Tcl is implemented in C, and because C strings are
terminated by NUL (0x00), the Tcl interpreter internally encodes each
NUL as a special two-byte sequence (0xC0 0x80). [string bytelength] is
really there mainly for debugging, or to work around edge cases not
automatically handled by Tcl. For everything else, use
[string length].
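
As a minimal sketch of the advice above: the byte count of binary data is what [string length] reports, and decoding the data with [binary scan] confirms that no byte was doubled. The use of [string repeat] instead of the loop and the variable name "values" are illustrative choices, not part of the original script.

```tcl
# Build 50 zero bytes (same data as row1 in the original script).
set row1 [string repeat [binary format c 0] 50]

# For binary data, string length is the number of bytes.
puts "length: [string length $row1]"

# Decode the bytes back into integers to confirm how many there are.
binary scan $row1 c* values
puts "decoded count: [llength $values]"
```

Both lines should report 50, regardless of what [string bytelength] says about the internal representation.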
From: Alexandre Ferrieux on
On Jan 10, 4:09 pm, "Paul(a)Tcl3D" <p...(a)tcl3d.org> wrote:
> Why do zero values occupy 2 bytes in a binary string?

Slebetman is perfectly right; may I ask why you need [string
bytelength]? Are you aware that it is *not* what you want, even when
preparing a write to a UTF-8 encoded output channel, since it
basically measures a "special flavour" of UTF-8 that is entirely
internal to Tcl?
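
If the goal really is to know how many bytes a string will occupy on a UTF-8 encoded channel, one possible approach (a sketch, not something proposed in this thread) is to convert the string with [encoding convertto] and measure the result with [string length]. The sample text and variable names below are made up for illustration.

```tcl
# Four characters: "a", NUL, "b", and e-acute (U+00E9).
set text "a\u0000b\u00e9"

# Convert to external UTF-8; the result is binary data, one byte per element.
set encoded [encoding convertto utf-8 $text]

puts "characters: [string length $text]"
puts "utf-8 bytes: [string length $encoded]"
```

Here the NUL takes a single byte in the converted data and only the e-acute takes two, so the byte count is 5 for 4 characters.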

-Alex
From: slebetman on
On Jan 11, 12:58 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> On Jan 10, 4:09 pm, "Paul(a)Tcl3D" <p...(a)tcl3d.org> wrote:
> > Why do zero values occupy 2 bytes in a binary string?
>
> Slebetman is perfectly right; may I ask why you need [string
> bytelength]? Are you aware that it is *not* what you want, even when
> preparing a write to a UTF-8 encoded output channel, since it
> basically measures a "special flavour" of UTF-8 that is entirely
> internal to Tcl?
>

I think both the documentation and the error message generated when
calling [string bytelength] without arguments should state: DO NOT USE
THIS, see string length instead.
From: Arjen Markus on
On 10 jan, 16:31, "slebet...(a)yahoo.com" <slebet...(a)gmail.com> wrote:
> On Jan 10, 11:09 pm, "Paul(a)Tcl3D" <p...(a)tcl3d.org> wrote:
> > Why do zero values occupy 2 bytes in a binary string?
>
> Don't use [string bytelength]. What you want is [string length].
>
> Because Tcl is implemented in C, and because C strings are
> terminated by NUL (0x00), the Tcl interpreter internally encodes NULs
> as a special two-byte character. [string bytelength] is really
> there mainly for debugging, or to work around edge
> cases not automatically handled by Tcl. For everything else, use
> [string length].

The reason is not so much that C uses NUL bytes to terminate
strings as that Tcl uses UTF-8 internally. With "counted strings"
there would be no need for the extra byte, but the two-byte form is
how Tcl's internal UTF-8 variant encodes a NUL.
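
A small sketch that may make the UTF-8 point visible: if the internal form is UTF-8, then byte values of 128 and above should also need two bytes internally, not just zero. This assumes a Tcl 8.x interpreter like the ones tested in this thread; the [catch] guard is there only in case the interpreter at hand does not provide [string bytelength].

```tcl
foreach value {0 1 127 128 255} {
    set byte [binary format c $value]

    # string bytelength reports the size of Tcl's internal UTF-8 form.
    # Guarded with catch in case this interpreter lacks the subcommand.
    if {[catch {string bytelength $byte} internal]} {
        set internal "n/a"
    }

    puts [format "byte %3d: length=%d bytelength=%s" \
        $value [string length $byte] $internal]
}
```

In every case [string length] should be 1. On the 8.x versions mentioned above, I would expect bytelength to be 2 for 0, 128 and 255, and 1 for 1 and 127.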

Regards,

Arjen