Replace and inserting strings within .txt files with the use of regex [Python]

From: Peter Otten on 9 Aug 2010 04:45

Νίκος wrote:

> On 9 Αύγ, 10:38, Peter Otten <__pete...(a)web.de> wrote:
>> Νίκος wrote:
>> > Now the code looks as follows:
>> >
>> > for currdir, files, dirs in os.walk('test'):
>> >     for f in files:
>> >         if f.endswith('php'):
>> >             # get abs path to filename
>> >             src_f = join(currdir, f)
>> >
>> > I just tried to test it. I created a folder named 'test' on my 'd:\'
>> > drive.
>> > Then I put two .php files from the original project inside it, to check
>> > that it works for those two files before running it on the whole copy
>> > and afterwards on the original project.
>> >
>> > So I opened a 'cli' on my Win7 and tried
>> >
>> > D:\>convert.py
>> >
>> > D:\>
>> >
>> > It just printed an empty line and nothing else. Why didn't it even try
>> > to open the folder and the files within?
>> > Syntactically it doesn't give me an error!
>> > Something with the os.walk() method perhaps?
>>
>> If there is a folder D:\test and it does contain some PHP files (double-
>> check!) the extension could be upper-case. Try
>>
>> if f.lower().endswith("php"): ...
>>
>> or
>>
>> php_files = fnmatch.filter(files, "*.php")
>> for f in php_files: ...
>>
>> Peter
>
> The extension is in lower case. The folder is there, the php files are
> there, I don't know why it doesn't want to go into d:\test to find
> them.
>
> That's one problem.
>
> The other one is:
>
> I made the code simpler by specifying the filename myself.
>
> =========================
> # get abs path to filename
> src_f = 'd:\\test\\index.php'
>
> # open php src file
> print ( 'reading from %s' % src_f )
> f = open(src_f, 'r')
> src_data = f.read() # read contents of PHP file
> f.close()
> =========================
>
> but although it now finds the file I get this error in the 'cli':
>
> D:\>aconvert.py
> reading from d:\test\index.php
> Traceback (most recent call last):
>   File "D:\aconvert.py", line 16, in <module>
>     src_data = f.read() # read contents of PHP file
>   File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode
>     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x9f in position 321: character maps to <undefined>
>
> Something with the damn encodings again!!
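
On the first problem, a minimal sketch, not run against the actual D:\test folder: os.walk() yields (dirpath, dirnames, filenames) in that order, so in the loop quoted above the name `files` is bound to the sub-directory names and `dirs` to the file names. That would explain the empty output if D:\test has no sub-directories. The helper name find_php_files is made up for illustration.

```python
import fnmatch
import os


def find_php_files(top):
    """Return the paths of all *.php files below the directory `top`."""
    found = []
    # os.walk() yields (dirpath, dirnames, filenames), in that order.
    for currdir, dirs, files in os.walk(top):
        for f in fnmatch.filter(files, "*.php"):
            found.append(os.path.join(currdir, f))
    return found


if __name__ == "__main__":
    # 'test' is resolved relative to the current working directory.
    for path in find_php_files("test"):
        print(path)
```

If nothing is printed, it is worth checking that the script is started from the directory that contains 'test', since os.walk() silently yields nothing for a path that does not exist.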

Hmm, at one point in this thread you switched from Python 2.x to Python 3.2.
There are a lot of subtle and not so subtle differences between 2.x and 3.x,
and I recommend that you stick to one while you are still in newbie mode.

If you want to continue to use 3.x I recommend that you at least use the
stable 3.1 version.

Now one change from Python 2 to 3 is that open(filename, "r") gives you a
beast that is unicode-aware and assumes that the file is encoded in the
platform's default encoding (not necessarily utf-8) unless you tell it
otherwise with open(..., encoding=whatever). So what is the charset used
for your index.php?
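
A minimal sketch of passing the encoding explicitly, assuming a throwaway file written just for the demonstration (sample.php and its contents are made up). The first print shows what open() would fall back to on the current machine when encoding is omitted; to my knowledge that value comes from the locale and differs between platforms.

```python
import locale
import os
import tempfile

# What text-mode open() falls back to when no encoding is given.
print("default text encoding:", locale.getpreferredencoding(False))

# Hypothetical sample file, written as UTF-8 for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.php")
text = "<?php echo '\u039f\u039a'; ?>"  # two Greek capitals, omicron and kappa
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Name the encoding explicitly instead of relying on the default.
with open(path, "r", encoding="utf-8") as f:
    src_data = f.read()

print("round trip ok:", src_data == text)
```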

Peter

From: Νίκος on 9 Aug 2010 05:46

Yes, yesterday I switched to Python 3.2, Peter.

When I open index.php in Notepad++ it says it's in UTF-8 without
BOM, and besides English characters it also contains Greek characters
for printing.

The file was made by my client in Dreamweaver.

So since it's UTF-8, what is the problem with opening it?

From: Peter Otten on 9 Aug 2010 06:06

Νίκος wrote:

> So since it's UTF-8, what is the problem with opening it?

Python says it's not, and I tend to believe it. You can open the file with

open(..., errors="replace")

but you will lose data (which is already garbled, anyway).
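
A small sketch of what errors="replace" does, assuming for illustration only that UTF-8 bytes end up being decoded with the cp1253 codec. The sample bytes are made up, not taken from index.php, and bytes.decode() is used here because it accepts the same errors argument as open().

```python
# Hypothetical sample: the Greek capital omicron (U+039F) encoded as UTF-8.
raw = "\u039f".encode("utf-8")  # two bytes: 0xce 0x9f

# Strict decoding (the default) raises on bytes the codec cannot map.
try:
    raw.decode("cp1253")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for each undecodable byte,
# so the original character cannot be recovered afterwards.
lossy = raw.decode("cp1253", errors="replace")

print(strict_failed, "\ufffd" in lossy)
```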

Again: in the unlikely case that Python is causing your problem -- you do
understand what an alpha version is?

Peter

From: Νίκος on 9 Aug 2010 06:34

On 9 Αύγ, 13:06, Peter Otten <__pete...(a)web.de> wrote:

> > So since its utf-8 what the problem of opening it?
>
> Python says it's not, and I tend to believe it.

You are right!

I tried the exact same open() call in the IDLE environment and I got
the encoding of the file from there!

>>> open("d:\\test\\index.php" ,'r')
<_io.TextIOWrapper name='d:\\test\\index.php' encoding='cp1253'>

That's why the error in my previous post said
File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode
it tried to use the cp1253 encoding.

But now, since Python as we see can understand the nature of the
encoding, what is causing it not to open the file?

From: Peter Otten on 9 Aug 2010 06:47

Νίκος wrote:

> But now, since Python as we see can understand the nature of the
> encoding, what is causing it not to open the file?

It doesn't. You have to tell it. *If* the file uses cp1253 you can open it with

open(..., encoding="cp1253")

Note that if the file is not in cp1253 Python will still happily open it as
long as it doesn't contain any of the following bytes:

>>> for i in range(256):
...     try: chr(i).decode("cp1253") and None
...     except: print i
...
129
136
138
140
141
142
143
144
152
154
156
157
158
159
170
210
255
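
The session above uses Python 2 syntax. A rough Python 3 equivalent, as a sketch (the name undefined is mine):

```python
# Collect the byte values that the cp1253 codec cannot decode.
undefined = []
for i in range(256):
    try:
        bytes([i]).decode("cp1253")
    except UnicodeDecodeError:
        undefined.append(i)

print(undefined)
```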

Peter
