From: Andrew Dunstan on

Peter Eisentraut wrote:
> But now we're back to the original problem. Certain editors insert BOMs
> at the beginning of the file. And that is by any definition before the
> embedded client encoding declaration. I think the only ways to solve
> this are:
> 1) Ignore the BOM if a client encoding declaration of UTF8 appears in a
> narrowly defined location near the beginning of the file (XML and
> PEP-0263 style). For *example*, we could ignore the BOM if the file
> starts with exactly "<BOM>\encoding UTF8\n". Would probably not work
> well in practice.
> 2) Parse two alternative versions of the file, one with the BOM ignored
> and one with the BOM not ignored, until you need to make a decision.
> Hilariously complicated, but would perhaps solve the problem.
> 3) Give up, do nothing.

4) set the client encoding before the file is read in any of the ways
that have already been discussed and then allow psql to eat the BOM.
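
As a rough illustration of option 4 (not psql's actual code, which is written in C), the idea can be sketched in Python. The function name strip_utf8_bom and the loose spelling normalization of the encoding name are assumptions made for this example only; the point is that the BOM is dropped only when the encoding was already set to UTF-8 before the file is read.

```python
import codecs


def strip_utf8_bom(data: bytes, client_encoding: str) -> bytes:
    """Drop a leading UTF-8 BOM, but only if the encoding was set to UTF-8 beforehand.

    Hypothetical helper for illustration; it is not part of psql.
    """
    # Accept spellings such as "UTF8", "utf-8", "UTF_8" (an assumption of this sketch).
    normalized = client_encoding.replace("-", "").replace("_", "").upper()
    if normalized == "UTF8" and data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # Any other encoding: leave the bytes alone, since EF BB BF may be real data.
    return data
```

With the encoding set to UTF8 the three BOM bytes are eaten; with any other encoding the input is returned unchanged.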



Sent via pgsql-hackers mailing list (pgsql-hackers(a)
To make changes to your subscription:

From: Peter Eisentraut on
On Wed, 2009-11-18 at 08:52 -0500, Andrew Dunstan wrote:
> 4) set the client encoding before the file is read in any of the ways
> that have already been discussed and then allow psql to eat the BOM.

This is certainly a workaround, just like piping the file through a
suitable sed expression would be, but conceptually, the client encoding
is a property of the file and should therefore be marked in the file.


From: Tom Lane on
Peter Eisentraut <peter_e(a)> writes:
> This is certainly a workaround, just like piping the file through a
> suitable sed expression would be, but conceptually, the client encoding
> is a property of the file and should therefore be marked in the file.

In a perfect world things would be like that, but the world is
imperfect. When only one of the available encodings even pretends
to have a marking convention, and even that one convention is broken,
imagining that you can fix it is just a recipe for making things worse.

regards, tom lane


From: Peter Eisentraut on
On Mon, 2009-11-16 at 22:37 +0200, Peter Eisentraut wrote:
> On Wed, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
> > Sure. Client encoding is declared in body of a file, but BOM is
> > in head of the file. So, we should always ignore BOM sequence
> > at the file head no matter what client encoding is used.
> >
> > The attached patch replaces the BOM with white spaces, but it does not
> > change client encoding automatically. I think we can always ignore
> > client encoding at the replacement because SQL command cannot start
> > with BOM sequence. If we don't ignore the sequence, execution of
> > the script must fail with syntax error.
> OK, I think the consensus here is:
> - Eat BOM at beginning of file (as you implemented)
> - Only when client encoding is UTF-8 --> please fix that
> I'm not sure if replacing a BOM by three spaces is a good way to
> implement "eating", because it might throw off a column indicator
> somewhere, say, but I couldn't reproduce a problem. Note that the
> U+FEFF character is defined as *zero-width* non-breaking space.

I have committed a change that implements the above.
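
To make the column-indicator concern quoted above concrete, here is a small Python sketch. It is only an illustration, not the committed C code; the helper names blank_bom and drop_bom and the sample line are made up for the example. It compares the byte offset of a token after replacing the BOM with three spaces against the offset after removing the BOM outright.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf", the UTF-8 encoding of U+FEFF


def blank_bom(line: bytes) -> bytes:
    """Replace a leading BOM with three spaces, preserving the byte length."""
    if line.startswith(BOM):
        return b" " * len(BOM) + line[len(BOM):]
    return line


def drop_bom(line: bytes) -> bytes:
    """Remove a leading BOM entirely, shifting the rest of the line left."""
    if line.startswith(BOM):
        return line[len(BOM):]
    return line


line = BOM + b"SELECT 1 +;"   # hypothetical first line of a script
blanked = blank_bom(line)
dropped = drop_bom(line)

# Offset of the stray '+' as a position-reporting routine might compute it.
offset_blanked = blanked.index(b"+")
offset_dropped = dropped.index(b"+")
```

Because U+FEFF is zero-width, an editor shows the text starting at the first column, which matches offset_dropped; offset_blanked is three positions further right, which is the kind of discrepancy the quoted message worries about.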
