From: Pavel Stehule on
Hello

I have a one idea nonstandard enhancing of sprintf - relatie often job
is a quoting in PostgreSQL. So sprintf should have a special formats
for quoted values. What do you think about

%lq ... literal quoted
%iq ... ident quoted

??

Regards

Pavel

2010/7/13 Pavel Stehule <pavel.stehule(a)gmail.com>:
> Hello
>
> 2010/7/13 Itagaki Takahiro <itagaki.takahiro(a)gmail.com>:
>> 2010/7/13 Pavel Stehule <pavel.stehule(a)gmail.com>:
>>> so this is actualised patch:
>>> * concat_sql removed
>>> * left, right, reverse and concat are in core
>>> * printf and concat_ws are in contrib
>>> * format show "<NULL>" as NULL string
>>> * removed an using of wide chars
>>
>> I think function codes in the core (concat, format, left, right,
>> and reverse) are ready for committers. They also have docs, but
>> the names are not listed in Index page (bookindex.html).
>> Please add
>>   <indexterm>
>>    <primary>funcname</primary>
>>   </indexterm>
>> in func.sgml for each new function.
>>
>
> fixed
>> However, I have a couple of comments to stringfunc module. sprintf()
>> and concat_ws() are not installed by default, but provided by the module.
>>
>>> todo:
>>> NULL handling for printf function
>>
>> I like <NULL> for null arguments. It is just same as format() and RAISE.
>
> done
>
>>
>> === Questions ===
>> * concat_ws() transforms NULLs into empty strings.
>> Is it an intended behavior and compatible with MySQL?
>> Note that string_agg() doesn't add separators to NULLs.
>>
>
> no I was  wrong - original concat_ws just ignore NULL - fixed, now
> concat_ws has same behave like original.
>
>>  =# SELECT coalesce(concat_ws(',', 'A', NULL, 'B'), '(null)');
>>   coalesce
>>  ----------
>>   A,,B
>>  (1 row)
>>
>> * concat_ws() returns NULL when the separator is NULL.
>> Is it an intended behavior and compatible with MySQL?
>>
>>  =# SELECT coalesce(concat_ws(NULL, 'A', NULL, 'B'), '(null)');
>>   coalesce
>>  ----------
>>   (null)
>>  (1 row)
>>
>> === Trivial issues ===
>> * Some function prototypes are declared but not used.
>>  We can just remove them.
>>  - mb_string_info()
>>  - stringfunc_concat(PG_FUNCTION_ARGS);
>>  - stringfunc_left(PG_FUNCTION_ARGS);
>>  - stringfunc_right(PG_FUNCTION_ARGS);
>>  - stringfunc_reverse(PG_FUNCTION_ARGS);
>>
>> * Some error messages need to be improved.
>>  For example, "1th" is wrong.
>>    =# select sprintf('>>>%*s<<<', NULL, 'abcdef');
>>    ERROR:  null value not allowed
>>    HINT:  width (1th) arguments is NULL
>
> have you a some idea about it?
>
>>
>> * sprintf() has some typos in error messages
>>  For example, "sprinf".
>>
>
> fixed
>
>> --
>> Itagaki Takahiro
>>
>
> Regards
>
> Pavel
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Itagaki Takahiro on
I reviewed the core changes of the patch. I don't think we need
mb_string_info() at all. Instead, we can just call pg_mbxxx() functions.

I rewrote the patch to use pg_mbstrlen_with_len() and pg_mbcharcliplen().
What do you think the changes? It requires re-counting lengths of multi-byte
strings in some cases, but the code will be much simpler and can avoid
allocating length buffers.

I'd like to apply contrib/stringinfo apart from the core changes,
because there seems to be still some idea to improve sprintf().

--
Itagaki Takahiro
From: Pavel Stehule on
2010/7/21 Itagaki Takahiro <itagaki.takahiro(a)gmail.com>:
> I reviewed the core changes of the patch. I don't think we need
> mb_string_info() at all. Instead, we can just call pg_mbxxx() functions.
>
> I rewrote the patch to use pg_mbstrlen_with_len() and pg_mbcharcliplen().
> What do you think the changes? It requires re-counting lengths of multi-byte
> strings in some cases, but the code will be much simpler and can avoid
> allocating length buffers.
>

It is a good idea. I see a problem only for "right" function, where
for most common use case a mblen will be called two times. I am not
able to say now, if this can be a performance issue or not. Highly
probably not - only for very large strings.

postgres=# create or replace function randomstr(int) returns text as
$$select string_agg(substring('abcdefghijklmnop' from
trunc(random()*13)::int+1 for 1),'') from generate_series(1,$1) $$
language sql;
CREATE FUNCTION
Time: 27,452 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5615,061 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5606,937 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5630,771 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5753,063 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)
Time: 5755,776 ms

It is about 2% slower for UTF8 encoding. So it isn't significant for me.

I agree with your changes. Thank You very much

Regards

Pavel Stehule

> I'd like to apply contrib/stringinfo apart from the core changes,
> because there seems to be still some idea to improve sprintf().
>
> --
> Itagaki Takahiro
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Itagaki Takahiro on
2010/7/21 Pavel Stehule <pavel.stehule(a)gmail.com>:
> It is about 2% slower for UTF8 encoding. So it isn't significant for me.
> I agree with your changes. Thank You very much

Thanks. The core-part is almost ready to commit.
I'll continue to review the contrib part.

But I found there is a design issue in format() :
Appending a '%' is common use-case, but format() cannot append
% char without any spaces between placeholder and the raw % char.

itagaki=# SELECT format('%%%', 10), format('% %%', 10);
format | format
--------+--------
%10 | 10 %
(1 row)

It is a design issue, and RAISE in PL/pgSQL has the same issue, too.
Do we accept the restriction? Or should we add another escape
syntax and/or placeholder pattern?

--
Itagaki Takahiro

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Itagaki Takahiro on
I'm reviewing contrib part of the string functions patch.

I found an issue in sprintf() to print integer values. In this case,
'l' (for long type) is used on *all* platforms. For example,
SELECT sprintf('%d', 10);
internally uses
appendStringInfo('%ld', (int64) 10)

But there are some platform that requires to use %lld for int64 format, probably
on Windows. That's why we have INT64_FORMAT macro. sprintf() needs to be
adjusted to use INT64_FORMAT or similar portable codes.

Other portion of the patch seems to be OK for me,
unless you have still some idea to extend the feature.

2010/7/17 Pavel Stehule <pavel.stehule(a)gmail.com>:
> I have a one idea nonstandard enhancing of sprintf - relatie often job
> is a quoting in PostgreSQL. So sprintf should have a special formats
> for quoted values. What do you think about
>
> %lq ... literal quoted
> %iq ... ident quoted

They save some keyboard types to write quote_literal() and quote_ident(), right?
They seem to be useful and reasonable for me. One comment is that you might
want to print NULL values as "NULL" instead of "<NULL>" in such cases.

--
Itagaki Takahiro

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers