Replace and inserting strings within .txt files with the use of regex [Python]


From: Νίκος on 9 Aug 2010 10:17

On 9 Aug, 13:47, Peter Otten <__pete...(a)web.de> wrote:
> Νίκος wrote:
> > On 9 Aug, 13:06, Peter Otten <__pete...(a)web.de> wrote:
>
> >> > So since it's UTF-8, what's the problem with opening it?
>
> >> Python says it's not, and I tend to believe it.
>
> > You are right!
>
> > I tried the exact same open call in the IDLE environment and got
> > the encoding of the file from there!
>
> >>>> open("d:\\test\\index.php" ,'r')
> > <_io.TextIOWrapper name='d:\\test\\index.php' encoding='cp1253'>
>
> > That's why the error in my previous post said
> > File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode
> > it tried to use the cp1253 encoding.
>
> > But now, since Python, as we see, can understand the nature of the
> > encoding, what is causing it not to open the file?
>
> It doesn't. You have to tell it.

Why doesn't it? The IDLE response indicates that it knows the file's
encoding is "cp1253", which means it can identify it.

> *If* the file uses cp1253 you can open it with
>
> open(..., encoding="cp1253")
>
> Note that if the file is not in cp1253 python will still happily open it as
> long as it doesn't contain the following bytes:
>
> >>> for i in range(256):
> ...     try: chr(i).decode("cp1253") and None
> ...     except: print i
> ...
> 129
> 136
> 138
> 140
> 141
> 142
> 143
> 144
> 152
> 154
> 156
> 157
> 158
> 159
> 170
> 210
> 255
>
> Peter

I'm afraid it does, because when I tried:

f = open(src_f, 'r', encoding="cp1253" )

I got the same error again. What are those characters? Don't they
belong to the same weird 'cp1253' encoding too? Why can't Python open
them?
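
For readers who hit this on Python 3: the encoding='cp1253' that IDLE printed is, as far as I can tell, just the platform default that open() falls back to when no encoding is given, not something detected from the file's contents. The bytes Peter lists are values to which cp1253 assigns no character, so decoding fails whenever one of them appears in the file. Below is a minimal sketch of the same check in Python 3 syntax (the `undefined` list name is only for illustration), plus the errors="replace" fallback, which substitutes a placeholder character instead of raising:

```python
# Python 3 sketch: find the byte values that cp1253 cannot decode.
undefined = []
for i in range(256):
    try:
        bytes([i]).decode("cp1253")
    except UnicodeDecodeError:
        undefined.append(i)

print(undefined)

# errors="replace" swaps each undecodable byte for U+FFFD instead of raising.
replaced = b"\x81".decode("cp1253", errors="replace")
print(repr(replaced))
```

The same errors="replace" argument can be passed to open(), at the cost of losing whatever those bytes originally meant. If a file contains such bytes, it is likely not cp1253 in the first place.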

From: Νίκος on 9 Aug 2010 11:58

Please tell me that no matter what weird chars a file has inside, I can
still open those files and make the necessary replacements.
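
One way this can work on Python 3, offered as a sketch rather than a recommendation from the thread: open the file in binary mode, so nothing is decoded and no byte can be "invalid", and do the replacement on bytes. The helper name `replace_in_file` and the demo file contents are made up for illustration.

```python
import os
import tempfile

def replace_in_file(path, old, new):
    """Replace every occurrence of the bytes `old` with `new` in the file at `path`."""
    with open(path, "rb") as f:   # binary mode: no decoding, so no UnicodeDecodeError
        data = f.read()
    data = data.replace(old, new)
    with open(path, "wb") as f:
        f.write(data)

# Demo on a throwaway file containing byte 0x81, which cp1253 cannot decode.
fd, demo_path = tempfile.mkstemp(suffix=".php")
with os.fdopen(fd, "wb") as f:
    f.write(b"<p>old text \x81</p>")

replace_in_file(demo_path, b"old text", b"new text")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

This only suits replacements whose search and replacement text can be written as bytes; anything involving non-ASCII text still requires knowing the file's real encoding.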

From: Peter Otten on 9 Aug 2010 12:21

Νίκος wrote:

> Please tell me that no matter what weird chars a file has inside, I can
> still open those files and make the necessary replacements.

Go back to 2.6 for the moment and defer learning about unicode until you're
done with the conversion job.

From: Νίκος on 9 Aug 2010 13:40

On 9 Aug, 19:21, Peter Otten <__pete...(a)web.de> wrote:
> Νίκος wrote:
> > Please tell me that no matter what weird chars a file has inside, I can
> > still open those files and make the necessary replacements.
>
> Go back to 2.6 for the moment and defer learning about unicode until you're
> done with the conversion job.

You are correct again! 3.2 caused the problem. I switched to 2.7 and
now I don't have that problem anymore. The file opens okay!

It ALMOST converts correctly!

# replace tags
print ( 'replacing php tags and contents within' )
src_data = re.sub( '<\?(.*?)\?>', '', src_data )

It only converts the first instance of the PHP tags and not the rest.
But why?

From: Νίκος on 9 Aug 2010 15:27

On 8 Aug, 20:29, John S <jstrick...(a)gmail.com> wrote:

> When replacing text in an HTML document with re.sub, you want to use
> the re.S (singleline) option; otherwise your pattern won't match when
> the opening tag is on one line and the closing is on another.

That's exactly the problem I am facing now with this statement.

src_data = re.sub( '<\?(.*?)\?>', '', src_data )

You mean I have to switch it like this?

src_data = re.S ( '<\?(.*?)\?>', '', src_data ) ?
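
For reference, re.S is a flag rather than a function, so it is passed to re.sub instead of replacing it. A small sketch of the difference follows; the sample `src_data` string is invented for the demonstration. Without the flag, '.' does not match a newline, which would explain why only a PHP block that sits on a single line gets removed.

```python
import re

# Invented sample: one single-line PHP block and one spanning several lines.
src_data = """<html>
<?php echo "one"; ?>
<p>kept</p>
<?php
echo "two";
?>
</html>"""

# Without re.S, '.' stops at newlines, so only the single-line block matches.
without_flag = re.sub(r'<\?(.*?)\?>', '', src_data)

# With re.S (also spelled re.DOTALL), '.' matches newlines as well,
# so the multi-line block is removed too.
with_flag = re.sub(r'<\?(.*?)\?>', '', src_data, flags=re.S)

print(without_flag)
print(with_flag)
```

Note that the flag goes in by keyword: the fourth positional argument of re.sub is count, so re.sub(pattern, repl, string, re.S) would silently be treated as a replacement limit. The flags keyword is available from Python 2.7 onward.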
