Replace and inserting strings within .txt files with the use of regex [Python]


From: Νίκος on 7 Aug 2010 22:26

# rename ALL php files to html in every subfolder of the folder 'data'
os.rename('*.php', '*.html') # how to tell python to rename ALL php files to html in ALL subfolders under 'data'?

# current path of the file to be processed
path = './data' # this must somehow be in a loop, I feel, that reads every file of every subfolder

# open an html file for reading
f = open(path, 'rw')
# read the contents of the whole file
data = f.read()

# replace all php tags with empty string
data = data.replace('<?', '')
data = data.replace('?>', '')

# write replaced data to file
data = f.write()

# insert an increasing unique integer number at the very first line of every html file processed
comment = ""%(idnum) # how will the number change here and increase by one, file after file?
f = f.close()

Please help. I'm new to Python, and apart from the syntax it's a logic problem
as well, one that needs experience.
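
For the renaming question above, one possible approach is os.walk, which visits every subfolder, combined with os.rename on each matching file. This is only a minimal sketch: the function name rename_php_to_html is made up for illustration, and it assumes that only names ending in '.php' should be touched. It is best tried on a copy of the data first.

```python
import os

def rename_php_to_html(root):
    """Rename every *.php file under root to *.html; return the new paths."""
    renamed = []
    # os.walk yields (dirpath, dirnames, filenames) for root and each subfolder
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.php'):
                old_path = os.path.join(dirpath, name)
                # swap the '.php' suffix for '.html'
                new_path = old_path[:-len('.php')] + '.html'
                os.rename(old_path, new_path)
                renamed.append(new_path)
    return renamed
```

Calling rename_php_to_html('data') would then cover the 'data' folder and everything below it.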

From: rantingrick on 7 Aug 2010 22:46

On Aug 7, 8:42 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote:

> That should be:
>
> data = data.replace('<?', '')
> data = data.replace('?>', '')

Yes, Thanks MRAB. I did forget that important detail.

> Strings don't have an 'insert' method!

*facepalm*! I really must stop Usenet-ing whilst consuming large
volumes of alcoholic beverages.

From: Νίκος on 8 Aug 2010 07:41

On 8 Aug, 13:13, Thomas Jollans <tho...(a)jollans.com> wrote:
> On 08/08/2010 11:21 AM, Νίκος wrote:
>
> > Please help me adjust it, if need extra modification for more php tags
> > replacing.
>
> Have you tried it ? I haven't, but I see no immediate reason why it
> wouldn't work with multiple PHP blocks.
>
> > #!/usr/bin/python
>
> > import cgitb; cgitb.enable()
> > import cgi, re, os
>
> > print ( "Content-type: text/html; charset=UTF-8 \n" )
>
> > id = 0  # unique page_id
>
> > for currdir, files, dirs in os.walk('data'):
>
> >     for f in files:
>
> >         if f.endswith('php'):
>
> >             # get abs path to filename
> >             src_f = join(currdir,f)
>
> >             # open php src file
> >             f = open(src_f, 'r')
> >             src_data = f.read()  # read contents of PHP file
> >             f.close()
> >             print 'reading from %s' % src_f
>
> >             # replace tags
> >             src_data = src_data.replace('<%', '')
> >             src_data = src_data.replace('%>', '')
>
> Did you read the script before posting? ;-)
> Here, you remove ASP-style tags. Which is fine, PHP supports them if you
> configure it that way, but you probably didn't. Change this to the start
> and end tags you actually use, and, if you use multiple forms (such as
> <?php vs <?), then add another line or two.
>
> >             print 'replacing php tags'
>
> >             # add ID
> >             src_data = ( '' % id ) + src_data
> >             id += 1
> >             print 'adding unique page_id'
>
> >             # create new file with .html extension
> >             src_file = src_file.replace('.php', '.html')
>
> >             # open newly created html file for inserting data
> >             dest_f = open(src_f, 'w')
> >             dest_f.write(src_data)  # write contents
> >             dest_f.close()
> >             print 'writing to %s' % dest_f

Yes, I have read the code carefully, and by mistake I wrote '<%>'
instead of '<?'.

I was so dizzy and confused yesterday that I forgot to mention that
I need to remove not only the PHP opening and closing tags but also
whatever data lurks inside those tags, because with the 'counter.py'
script I wrote, the html files will now be opened from there and have
their template variables, such as %(counter)d, substituted.

Also, before the

</body>
</html>

of every html file, after removing the tags, this line must be
inserted (it holds the template variable that 'counter.py' uses to
produce data):

<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d
</h4>

After making these modifications, I can test the script on a COPY
of the original data on my PC.

*On my PC I run Windows 7, while the remote web hosting setup uses Linux
servers.
*That won't be a problem, right?

From: Νίκος on 9 Aug 2010 02:31

On 8 Aug, 17:59, Thomas Jollans <tho...(a)jollans.com> wrote:

> Two problems here:
>
> str.replace doesn't use regular expressions. You'll have to use the re
> module to use regexps. (the re.sub function to be precise)
>
> '.'  matches a single character. Any character, but only one.
> '.*' matches as many characters as possible. This is not what you want,
> since it will match everything between the *first* <? and the *last* ?>.
> You want non-greedy matching.
>
> '.*?' is the same thing, without the greed.

Thank you,

So i guess this needs to be written as:

src_data = re.sub( '<?(.*?)?>', '', src_data )

The 'r' prefix doesn't need to be put before the regex here,
since the regex doesn't contain any backslashes.
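
One caveat on the pattern above: '?' is itself a regex metacharacter, so '<?' means "an optional <" rather than a literal '<?'. Below is a minimal sketch of one way to write it, escaping the question marks and adding re.DOTALL so that '.' also matches the newlines inside multi-line PHP blocks. The names PHP_BLOCK and strip_php are made up for illustration, and PHP code that contains '?>' inside a string would still confuse this pattern.

```python
import re

# \? matches a literal question mark; .*? is the non-greedy match;
# re.DOTALL lets '.' span line breaks inside a PHP block.
PHP_BLOCK = re.compile(r'<\?.*?\?>', re.DOTALL)

def strip_php(text):
    """Remove every <? ... ?> block (including <?php ... ?>) from text."""
    return PHP_BLOCK.sub('', text)
```

Once the pattern contains backslashes, the r prefix is worth using so that they reach the re module unchanged.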

> You will have to find the </body> tag before inserting the string.
> str.find should help -- or you could use str.replace and replace the
> </body> tag with your counter line, plus a new </body>.

Ah yes! Damn, why didn't I think of it... str.replace should do the
trick. I was stuck trying to figure out regexes.

So, i guess that should work:

src_data = src_data.replace('</body>', '<br><br><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </font></h4></body>' )
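
The str.replace idea can be sketched as follows, assuming Python 3 (where a UTF-8 source file is the default). The names COUNTER_LINE and add_counter_line are hypothetical, and the last lines only illustrate how a script such as 'counter.py' might fill in %(counter)d with the % operator; how counter.py really does it is an assumption here. Any other literal '%' in the page would have to be written as '%%' for that formatting step to succeed.

```python
# The counter line from the post, kept as a template with %(counter)d
COUNTER_LINE = ('<br><br><h4><font color=green> '
                'Αριθμός Επισκεπτών: %(counter)d </font></h4>')

def add_counter_line(html):
    """Insert the counter template line just before the closing body tag."""
    # the third argument limits the replacement to the first </body> found
    return html.replace('</body>', COUNTER_LINE + '\n</body>', 1)

page = add_counter_line('<html><body>Hello</body></html>')

# Later, the template variable can be filled in with the % operator
rendered = page % {'counter': 42}
```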

> No it's not. You're just giving up too soon.

Yes, you are right. Your hints keep me going, and thank you for that.

From: Νίκος on 9 Aug 2010 03:07

Now the code looks as follows:

=============================
#!/usr/bin/python

import re, os, sys

id = 0 # unique page_id

for currdir, files, dirs in os.walk('test'):

    for f in files:

        if f.endswith('php'):

            # get abs path to filename
            src_f = join(currdir, f)

            # open php src file
            print ( 'reading from %s' % src_f )
            f = open(src_f, 'r')
            src_data = f.read() # read contents of PHP file
            f.close()

            # replace tags
            print ( 'replacing php tags and contents within' )
            src_data = re.sub( '<?(.*?)?>', '', src_data )

            # add ID
            print ( 'adding unique page_id' )
            src_data = ( '' % id ) + src_data
            id += 1

            # add template variables
            print ( 'adding counter template variable' )
            src_data = src_data.replace('</body>', '<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )

            # rename old php file to new with .html extension
            src_file = src_file.replace('.php', '.html')

            # open newly created html file for inserting data
            print ( 'writing to %s' % dest_f )
            dest_f = open(src_f, 'w')
            dest_f.write(src_data) # write contents
            dest_f.close()

I just tried to test it. I created a folder named 'test' on my 'd:\'
drive.
Then I put two .php files from the original inside it, to test whether it
would work for those two files before running it on the whole copy and
afterwards on the original project.

So I opened a 'cli' from my Win7 and tried

D:\>convert.py

D:\>

It just printed an empty line and nothing else. Why didn't it even try to
open the folder and the files within?
Syntactically it doesn't give me an error!
Something with the os.walk() method perhaps?
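
One likely cause is the order of the names in the for statement. os.walk yields (dirpath, dirnames, filenames) tuples, so in "for currdir, files, dirs in os.walk(...)" the name 'files' actually receives the list of subdirectory names. With no subfolder whose name ends in 'php', the loop body never runs and nothing is printed. The small sketch below only lists the matching paths, to check the walking part in isolation; the function name find_php_files is hypothetical. It also imports join from os.path, which the script uses without importing.

```python
import os
from os.path import join

def find_php_files(root):
    """Return the paths of all files under root whose names end in '.php'."""
    found = []
    # Unpacking order matters: directory path, subdirectory names, file names
    for currdir, dirs, files in os.walk(root):
        for f in files:
            if f.endswith('.php'):
                found.append(join(currdir, f))
    return found
```

Printing the result of find_php_files('test') from D:\ should show the two test files. Note also that os.walk silently yields nothing when the starting folder cannot be found, so an empty result can also mean the script was started from a different directory.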
