From: Alexandre Ferrieux on
On Jan 14, 3:50 pm, "MartinLemburg(a)Siemens-PLM"
<martin.lemburg.siemens-...(a)gmx.net> wrote:
> Hi,
>
> I know ... it has been said 1000s of times ... don't care, while scripting
> in Tcl, about the internal type of the data you are working with.
>
> But ... I do care, because when handling bigger structured data (dicts with
> nested dicts, lists (containing dicts, ...), ...) it is not that
> nice if shimmering occurs.
> And what if, while producing e.g. a report of a complex, high-precision
> calculation and its parameters, all the numerical data suddenly
> lost its numerical representation? For me, an "Ooh NOO".
>
> So what about this:
>
>     % info patchlevel
>     8.6b1.1
>     % proc objtype {arg} {return [string map {"pure" "string"} [lindex [split [::tcl::unsupported::representation $arg]] 3]];}
>     % set d [dict create a 1 b 2 c 3]
>     a 1 b 2 c 3
>     % set d2 $d
>     a 1 b 2 c 3
>     % set l [lrepeat 3 $d]
>     {a 1 b 2 c 3} {a 1 b 2 c 3} {a 1 b 2 c 3}
>     % objtype $d
>     dict
>     % objtype $d2
>     dict
>     % objtype $l
>     list
>     % objtype [lindex $l 0]
>     dict
>     % format {%30s} $d; # expecting that the string representation of d will be formatted, ...
>                        a 1 b 2 c 3
>     % objtype $d; # ... but the data itself is converted to a string for formatting
>     string
>     % objtype $d2; # even the connection from d to d2 was not cut while converting to a string
>     string
>     % objtype [lindex $l 0]; # even the values in the list are not "duplicated" to save their internal type
>     string
>     % set d [dict create {*}$d]
>     a 1 b 2 c 3
>     % objtype $d
>     dict
>     % objtype $d2; # wow, the connection from d to d2 still existed while expanding d
>     list
>
> So why does "format" convert the data to be formatted instead of using
> its string representation?
>
> Using "format", I never expected that the data I want to format would
> shimmer!
> I expected the result to be a string, not the format argument to be
> converted to a string.

The reason is that there are two kinds of "string": the "string rep",
which coexists with any internal rep and is basically a modified-UTF-8
string with a terminating \0, and the String internal rep, which
is a Unicode string. As it is an internal rep, the String obviously
erases whatever previous intrep was there.
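To make the two kinds of "string" concrete, here is a toy model in C. It is written for illustration only and is not the real Tcl_Obj/Tcl_ObjType layout: the names `Obj`, `ObjType`, `get_string`, `convert_to_string_type` and `describe` are hypothetical, and the String intrep is reduced to a character count instead of a real Unicode buffer. The model shows that generating the string rep leaves the double intrep in place, while forcing the String intrep overwrites it, because there is only one intrep slot. `describe` names the type the way [rep] does, with "pure string" for a NULL type pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: this is NOT the real Tcl_Obj / Tcl_ObjType layout. */
typedef struct ObjType {
    const char *name;
} ObjType;

static const ObjType doubleType = { "double" };
static const ObjType stringType = { "string" }; /* stand-in for tclStringType */

typedef struct Obj {
    char *bytes;            /* string rep (UTF-8); NULL = not generated yet */
    const ObjType *typePtr; /* intrep type; NULL = "pure string" */
    union {                 /* ONE slot: a new intrep overwrites the old one */
        double doubleValue;
        size_t numChars;    /* simplification: real String holds Unicode chars */
    } internalRep;
} Obj;

/* Asking for the string rep: fills in bytes, leaves the intrep alone. */
static const char *get_string(Obj *o) {
    if (o->bytes == NULL) {
        assert(o->typePtr == &doubleType); /* only intrep this toy can print */
        char buf[64];
        int n = snprintf(buf, sizeof buf, "%.1f", o->internalRep.doubleValue);
        assert(n > 0 && (size_t)n < sizeof buf);
        o->bytes = malloc((size_t)n + 1);
        if (o->bytes == NULL)
            abort();
        memcpy(o->bytes, buf, (size_t)n + 1);
    }
    return o->bytes;
}

/* Forcing the String intrep: keeps the string rep, discards the double. */
static void convert_to_string_type(Obj *o) {
    const unsigned char *p = (const unsigned char *)get_string(o);
    size_t chars = 0;
    for (; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80) /* count UTF-8 lead bytes only */
            chars++;
    o->typePtr = &stringType;
    o->internalRep.numChars = chars; /* the double is now gone */
}

/* Names the intrep the way [rep] does: NULL type = "pure string". */
static const char *describe(const Obj *o) {
    return o->typePtr != NULL ? o->typePtr->name : "pure string";
}
```

Starting from a double 1.0, `get_string` yields "1.0" and `describe` still reports "double". After `convert_to_string_type`, the bytes are unchanged but `describe` reports "string", and the union no longer holds the double.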

So it all boils down to Tcl_Format requesting that each argument
matched by a '%s' first be converted to a String object. This happens in
tclStringObj.c, line 1875:

numChars = Tcl_GetCharLength(segment);

As you can guess, the reason is that the whole [format] concatenation
is done on such objects (in Unicode). This in turn, I guess, is due to
field width specifiers, which are expressed in characters, not bytes;
Unicode is the realm of character counting...
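A width that counts characters cannot simply be computed with strlen() on UTF-8 bytes. Here is a minimal C illustration, assuming well-formed UTF-8 input. The helper names are hypothetical and do not come from the Tcl sources. For "été" (5 bytes, 3 characters), padding to width 6 needs 3 spaces if the width is counted in characters, but byte counting yields only 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters in a NUL-terminated UTF-8 string: every byte that is not a
 * continuation byte (10xxxxxx) starts a new character. Assumes valid input. */
static size_t utf8_char_count(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}

/* Pad spaces a right-aligned field of `width` needs when the width is
 * counted in characters ... */
static size_t pad_in_chars(const char *s, size_t width) {
    size_t n = utf8_char_count(s);
    return n < width ? width - n : 0;
}

/* ... versus counted in raw bytes, which under-pads non-ASCII text. */
static size_t pad_in_bytes(const char *s, size_t width) {
    size_t n = strlen(s);
    return n < width ? width - n : 0;
}
```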

As a workaround, just don't use [format %s]; use EIAS ("everything is a string") "naked":

set d {1 2 3 4}
=> 1 2 3 4
dict get $d 1
=> 2
puts [format %d 1]$d[format %d 2]
=> 11 2 3 42
::tcl::unsupported::representation $d
=> value is a dict with a refcount of 4, object pointer at
0x97236f8, internal representation 0x9733738:0x97236e0, string
representation "1 2 3 4".

-Alex

From: Andreas Leitgeb on
MartinLemburg(a)Siemens-PLM <martin.lemburg.siemens-plm(a)gmx.net> wrote:
> so "format {%s} ..." causes the generation of a string representation,
> where none is found, and replaces the original internal object
> representation by the new generated string representation?

Now, I think I understand, and I agree with you, that just querying
a string-rep of an object should not "zap" the original rep. But
does it really? Why is there a difference between "pure string"
and "string" in the output of [rep $d]?

% set d [expr double(1)] ;# -> 1.0
% rep $d
value is a double with a refcount of 4, object pointer at 0x9c07f08,
internal representation (nil):0x3ff00000, string representation "1.0".

% set d [expr double(1)]; rep $d
value is a double with a refcount of 3, object pointer at 0x9c2b238,
internal representation (nil):0x3ff00000, no string representation.

% set d [expr double(1)]; format %s $d; rep $d
value is a string with a refcount of 3, object pointer at 0x9c2b928,
internal representation 0x9c253c8:0x3ff00000, string representation "1.0".

This was strange, as you (imho) rightly complained: why make string the
new primary rep? But the double is probably still there as well:

% set d [expr double(1)]; set d "$d "; rep $d
value is a pure string with a refcount of 3, object pointer at 0x9c2b760,
string representation "1.0 ".

Now, this time it was zapped, but that's fine here.

% set d [expr double(1)]; append d "x"; rep $d
value is a string with a refcount of 3, object pointer at 0x9c24870,
internal representation 0x9c11cc8:0x3ff00000, string representation "1.0x".

I'd really like to know what the old internal rep turned into here...;
why doesn't rep call it a "pure string" now, and (nil)ify the second rep?
One really cannot use $d in a numeric expr operation now (I tried it).
Is this a glitch in "::tcl::unsupported::representation" or elsewhere?

% set d [expr double(1)]; lappend d x; rep $d
value is a list with a refcount of 3, object pointer at 0x9c24690,
internal representation 0x9c2a9b0:(nil), no string representation.

Here, the double rep is really gone.

PS: Don't mind the refcounts. I've got Alex's original lastresult patch
compiled in (which accounts for the "4" in my first example) and I don't
know why it's 3 rather than 2 in the other cases.

PPS: info patchlevel == 8.6b1.1 (last updated last week, or so).

From: Alexandre Ferrieux on
On Jan 14, 7:09 pm, Andreas Leitgeb <a...(a)gamma.logic.tuwien.ac.at>
wrote:
>
> [...] that just querying
> a string-rep of an object should not "zap" the original rep. But
> does it really?  Why is there a difference between "pure string"
> and "string" in the output of [rep $d]?

[rep] just describes the truth, he's innocent :}
See my other post: when rep says "string" it means
typePtr==&tclStringType (no invention, the "name" field of
tclStringType is "string"). When it says "pure string" it means
typePtr==NULL.

So the only thing that "zaps" a previous internal rep is _forcing_ it
to tclStringType.
And that happens within [format %s], as explained in my post.

Now after digging a bit in the code it appears that:

- forcing to tclStringType is indeed useful to count chars
- the actual concatenation is done by Tcl_AppendObjToObj, which for
all types except two (String and ByteArray-without-string-rep) works
at the string-rep level, hence is not responsible for shimmering.

In conclusion, I'd say that the String-forcing in [format] is (1)
useless in the absence of width specifiers, and (2) might be avoided
in all cases by using (slightly slower) UTF-8 char-counting
functions.
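Point (2) could look roughly like the following C sketch of a right-aligned "%<width>s". It is not a patch to Tcl_Format. The function `format_s` is hypothetical, and flags such as '-' and precision are ignored. The sketch works only on the UTF-8 string rep, so no internal rep is converted. With no width specifier (width 0 here), it skips character counting entirely, which reflects point (1).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a right-aligned "%<width>s" that works only on the UTF-8
 * string rep: it receives bytes, so no internal rep is ever converted.
 * width == 0 stands for "no width specifier": no character counting.
 * Returns a malloc'd string (caller frees) or NULL if allocation fails. */
static char *format_s(const char *bytes, size_t width) {
    assert(bytes != NULL);
    size_t len = strlen(bytes);
    size_t pad = 0;
    if (width > 0) {
        size_t chars = 0;
        for (const unsigned char *p = (const unsigned char *)bytes; *p != '\0'; p++)
            if ((*p & 0xC0) != 0x80) /* skip UTF-8 continuation bytes */
                chars++;
        pad = chars < width ? width - chars : 0;
    }
    char *out = malloc(pad + len + 1);
    if (out == NULL)
        return NULL;
    memset(out, ' ', pad);
    memcpy(out + pad, bytes, len + 1); /* copies the terminating NUL too */
    return out;
}
```

For example, `format_s("a 1 b 2 c 3", 30)` would produce 19 spaces followed by the 11 characters. `format_s("a 1 b 2 c 3", 0)` just copies the bytes.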

I'll open a low-prio bug for this. Thanks Martin for unearthing it.

> This was strange, as you (imho) rightly complained: why make string the
> new primary rep? But the double is probably still there as well:

Nope. Just one internal rep at any given time. When it's string, the
double is gone.
(The same would hold if it were "pure string", since pure string means
a null type pointer.)

> % set d [expr double(1)]; append d "x"; rep $d
> value is a string with a refcount of 3, object pointer at 0x9c24870,
> internal representation 0x9c11cc8:0x3ff00000, string representation "1.0x".
>
> I'd really want to know, what the old internal rep turned to here...;
> why doesn't rep name it a "pure string" now, and (nil)ify the second rep?

Apparently [append] suffers from the same suboptimality. Will track it
in the same bug report, thanks.

> One really cannot use $d in a numeric expr-operation now (tried it).
> Glitch in "::tcl::unsupported::representation" or elsewhere?

No. 1.0x is hard on anybody's math ;-)
And I don't see why it should be rep's fault that "1.0x" doesn't
cooperate with [expr]...

> PS: Don't mind the refcounts.  I've got Alex's original lastresult patch
>   compiled in (which accounts for the "4" in my first example) and I don't
>   know why it's 3 rather than 2 in the other cases.

Flattered :)

3 == 1 for the global var d + 2 from proc unknown's handling of "rep".
If you want to see a "1" as a refcount, I'd suggest:
- avoiding shortcuts that call [unknown]
- avoiding aliases
- avoiding variables
- avoiding [history lastresult] by appending ";set foo 1" to all
lines

-Alex

From: Alexandre Ferrieux on
On Jan 14, 11:42 pm, I wrote:
>
> I'll open a low-prio bug for this. Thanks Martin for unearthing it.
>

Done as https://sourceforge.net/tracker/?func=detail&aid=2932421&group_id=10894&atid=110894

-Alex

From: Andreas Leitgeb on
Alexandre Ferrieux <alexandre.ferrieux(a)gmail.com> wrote:
> See my other post: when rep says "string" it means
> typePtr==&tclStringType

Yes, sorry, I saw that only after writing mine.

Most of my comments were kind of voided by then, but the one about
append still puzzles me.

>> % set d [expr double(1)]; append d "x"; rep $d
>> value is a string with a refcount of 3, object pointer at 0x9c24870,
>> internal representation 0x9c11cc8:0x3ff00000, string representation "1.0x".
>>
>> I'd really want to know, what the old internal rep turned to here...;
>> why doesn't rep name it a "pure string" now, and (nil)ify the second rep?
>
> Apparently [append] suffers from the same suboptimality. Will track it
> in the same bug report, thanks.

But it's still quite different from the format case.
I'd have expected "append" to destroy everything except one of the two
string reps, so in any case I'd have expected the ":0x3ff00000" to disappear.
The fact that this ":0x3ff00000" was still there is why I even made
that goofy expr test on it. Why is it *not* replaced by ":(nil)", as
happens with other operations that eventually thwart the previous type,
like "lappend"?

% set d [expr {atan(1)*4}]; lappend d x; rep $d
value is a list with a refcount of 3, object pointer at 0x86528f8,
internal representation 0x8651848:(nil), no string representation.
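Here is one plausible reading of the leftover ":0x3ff00000". This is an interpretation and is not stated in the thread. As an IEEE-754 double, 1.0 is 0x3FF0000000000000. On a 32-bit little-endian build (assumed here, matching the 32-bit addresses above), the low word 0 would sit in the first pointer slot of the two-pointer intrep, printed as "(nil)". The high word 0x3ff00000 would sit in the second slot, which matches the "(nil):0x3ff00000" shown for the double. If the String type writes only the first slot (an unverified assumption), the second slot keeps the stale high word of the double. That would explain why [rep] still prints it even though it no longer means anything. The helper `double_words` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a double's IEEE-754 bit pattern into its low and high 32-bit words.
 * On a 32-bit little-endian build (an assumption), these are the values that
 * would sit in the first and second pointer slots of a two-pointer union
 * holding that double. */
static void double_words(double d, uint32_t *low, uint32_t *high) {
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits); /* well-defined way to read the bits */
    *low = (uint32_t)(bits & 0xFFFFFFFFu);
    *high = (uint32_t)(bits >> 32);
}
```

For 1.0, this gives low == 0 and high == 0x3ff00000, which are exactly the two halves printed by [rep].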

>> PS: Don't mind the refcounts.  I've got Alex's original lastresult patch
>>   compiled in (which accounts for the "4" in my first example) and I don't
>>   know why it's 3 rather than 2 in the other cases.
> Flattered :)

And I didn't even see that you were participating in this thread
when I wrote it :)

> If you want to see a "1" as a refcount, I'd suggest:
> - ...
> - avoiding aliases
Wasn't aware that aliases kept another ref, so now it's clear to me.

Thanks! (also for the bugreport)