Subject: Replace and inserting strings within .txt files with the use of regex
From: MRAB on 9 Aug 2010 16:17

Νίκος wrote:
> On 9 Aug, 21:05, Thomas Jollans <tho...(a)jollybox.de> wrote:
>> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>>
>>> On 9 Aug, 19:21, Peter Otten <__pete...(a)web.de> wrote:
>>>> Νίκος wrote:
>>>>> Please tell me that no matter what weird chars it has inside I can still
>>>>> open those files and make the necessary replacements.
>>>> Go back to 2.6 for the moment and defer learning about unicode until
>>>> you're done with the conversion job.
>>> You are correct again! 3.2 caused the problem, I switched to 2.7 and
>>> now I don't have that problem anymore. The file is opening okay!
>>> It ALMOST converts correctly!
>>> # replace tags
>>> print ( 'replacing php tags and contents within' )
>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
>>> It only converts the first instance of the php tags and not the rest.
>>> But why?
>> http://docs.python.org/library/re.html#re.S
>>
>> You probably need to pass the re.DOTALL flag.
>
> src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
>
> like this?

re.sub doesn't accept a flags argument in that position; the fourth
positional argument is count. (Before Python 2.7 it had no flags
argument at all; from 2.7 you can pass flags=re.DOTALL by keyword.)
You can also put the flag inside the regex itself like this:

    src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

(Note that the abbreviation for re.DOTALL is re.S and the inline flag is
'(?s)'. This is for historical reasons! :-))
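
As an illustration of the advice above, here is a minimal sketch comparing the spellings of the DOTALL flag. The sample string is made up for the example, and the `flags=` keyword assumes Python 2.7 or later.

```python
import re

# Hypothetical sample input: two PHP blocks, the second spanning several lines.
src_data = "<p>a</p><? echo 1; ?><p>b</p><?php\necho 2;\n?><p>c</p>"

pattern = r'<\?(.*?)\?>'

# Without DOTALL, '.' does not match '\n', so the multi-line block survives.
without_flag = re.sub(pattern, '', src_data)

# Inline flag at the start of the pattern: no extra argument needed.
inline = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

# Keyword flag: assumes Python 2.7 or later.
keyword = re.sub(pattern, '', src_data, flags=re.DOTALL)

# Passing re.DOTALL as the fourth positional argument sets count instead,
# which is equivalent to this call and leaves '.' unchanged.
positional_equivalent = re.sub(pattern, '', src_data, count=int(re.DOTALL))

print(inline)
```

Because the fourth-argument form raises no error, the mistake is easy to miss: the call simply behaves as if no flag had been given.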
From: MRAB on 9 Aug 2010 16:28

Νίκος wrote:
> On 9 Aug, 10:07, Νίκος <nikos.the.gr...(a)gmail.com> wrote:
>> Now the code looks as follows:
>>
>> =============================
>> #!/usr/bin/python
>>
>> import re, os, sys
>>
>> id = 0 # unique page_id
>>
>> for currdir, files, dirs in os.walk('test'):
>>
>>     for f in files:
>>
>>         if f.endswith('php'):
>> [snip]
>>
>> I just tried to test it. I created a folder named 'test' in my 'd:\'
>> drive.
>> Then I put two .php files from the original inside it, to test whether it
>> would work for those two files before running it on the whole copy and
>> afterwards on the original project.
>>
>> So I opened a 'cli' from my Win7 and tried
>>
>> D:\>convert.py
>>
>> D:\>
>>
>> It just printed an empty line and nothing else. Why didn't it even try to
>> open the folder and the files within?
>> Syntactically it doesn't give me an error!
>> Something with the os.walk() method perhaps?
>
> Can you help with this too please?
>
> Now I am able to convert just a single file, 'd:\test\index.php'
>
> But this needs to be done for ALL the php files in every subfolder.
>
>> for currdir, files, dirs in os.walk('test'):
>>
>>     for f in files:
>>
>>         if f.endswith('php'):
>
> Should the above lines enter folders and find the php files in each folder
> so that they can be edited?

I'd start by commenting out the lines which change the files and then
add some more print statements to see which files it's finding. That
might give a clue.

Only when it's fixed and finding the correct files would I remove the
additional print statements and then restore the commented lines.
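
The dry-run approach described above could look roughly like the sketch below. The function name `find_php_files` is made up for illustration. One thing worth checking, which the thread does not confirm as the cause: os.walk yields (dirpath, dirnames, filenames) in that order, so the quoted `for currdir, files, dirs` binds the subdirectory names to `files`.

```python
import os

def find_php_files(root):
    """Return the paths of all .php files under root, without touching them."""
    found = []
    # os.walk yields (dirpath, dirnames, filenames), in that order.
    for currdir, dirs, files in os.walk(root):
        for f in files:
            if f.lower().endswith('.php'):
                found.append(os.path.join(currdir, f))
    return found

if __name__ == '__main__':
    # 'test' is the folder name used in the thread; it is resolved
    # relative to the current working directory.
    for path in find_php_files('test'):
        print('would convert: %s' % path)
```

If nothing is printed, either the folder was not found from the current working directory or it contains no matching files; os.walk stays silent in both cases by default.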
From: MRAB on 9 Aug 2010 18:32

Νίκος wrote:
> On 9 Aug, 23:17, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>> Νίκος wrote:
>>> On 9 Aug, 21:05, Thomas Jollans <tho...(a)jollybox.de> wrote:
>>>> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>>>>> On 9 Aug, 19:21, Peter Otten <__pete...(a)web.de> wrote:
>>>>>> Νίκος wrote:
>>>>>>> Please tell me that no matter what weird chars it has inside I can still
>>>>>>> open those files and make the necessary replacements.
>>>>>> Go back to 2.6 for the moment and defer learning about unicode until
>>>>>> you're done with the conversion job.
>>>>> You are correct again! 3.2 caused the problem, I switched to 2.7 and
>>>>> now I don't have that problem anymore. The file is opening okay!
>>>>> It ALMOST converts correctly!
>>>>> # replace tags
>>>>> print ( 'replacing php tags and contents within' )
>>>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
>>>>> It only converts the first instance of the php tags and not the rest.
>>>>> But why?
>>>> http://docs.python.org/library/re.html#re.S
>>>> You probably need to pass the re.DOTALL flag.
>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
>>> like this?
>> re.sub doesn't accept a flags argument in that position. You can put
>> the flag inside the regex itself like this:
>>
>> src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)
>>
>> (Note that the abbreviation for re.DOTALL is re.S and the inline flag is
>> '(?s)'. This is for historical reasons! :-))
>
> This is so that '.' matches any character, including '\n' too, right?
> So even if the php start tag and the end tag are on different
> lines they will still be matched, correct?
>
> Do we need the 'raw' string as well? Why? The regex doesn't contain
> backslashes.

Yes it does; two of them!
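
To make the point about backslashes concrete, here is a small sketch of what the raw-string prefix changes. The un-prefixed `'<\?'` from earlier in the thread happens to work because `\?` is not a recognised string escape, but newer Python 3 versions warn about such escapes, and other sequences such as `\b` silently change meaning.

```python
import re

plain = '<\\?(.*?)\\?>'   # backslashes escaped by hand
raw = r'<\?(.*?)\?>'      # raw string: backslashes kept exactly as typed

# Both literals spell the same pattern, which contains two backslashes.
same_pattern = (plain == raw)
backslash_count = raw.count('\\')

# Where the prefix matters: '\b' is a single backspace character in a
# plain string, but backslash + 'b' (a regex word boundary) in a raw one.
plain_b = len('\b')
raw_b = len(r'\b')

word_match = re.search(r'\bphp\b', 'a php file')    # word boundary: matches
no_match = re.search('\bphp\b', 'a php file')       # backspaces: no match

print(same_pattern, backslash_count, plain_b, raw_b)
```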
From: MRAB on 9 Aug 2010 18:43

Νίκος wrote:
> D:\>convert.py
>   File "D:\convert.py", line 34
> SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line
> 34, but no
> encoding declared; see http://www.python.org/peps/pep-0263.html for
> details
>
> D:\>
>
> What is it referring to? What character cannot be identified?
>
> Line 34 is:
>
> src_data = src_data.replace( '</body>', '<br><br><center><h4><font
> color=green> [Greek text]: %(counter)d </body>' )
>

Didn't you say that you're using Python 2.7 now? The default source
file encoding in Python 2 is ASCII, but your file isn't ASCII, it
contains Greek letters. Add the encoding line:

    # -*- coding: utf-8 -*-

as the first or second line of the script and check that the file is
saved as UTF-8.

> Also,
>
> for currdir, files, dirs in os.walk('test'):
>
>     for f in files:
>
>         if f.lower().endswith("php"):
>
> in the above lines
>
> should I state os.walk('test') or os.walk('d:\test') ?

The path 'test' is relative to the current working directory. Is that
D:\ for your script? If not, then it won't find the (correct) folder.

It might be better to use an absolute path instead. You could use
either:

    r'd:\test'

(note that I've made it a raw string because it contains a backslash
which I want treated as a literal backslash) or:

    'd:/test'

(Windows should accept a slash as well as a backslash.)
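
The path advice above can be checked with a short sketch. It uses `ntpath` (the Windows flavour of `os.path`, importable on any platform) so the comparison does not depend on where it is run; the variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on the first or second line of the file.
import ntpath
import os

bad = 'd:\test'       # '\t' is read as a TAB character
raw = r'd:\test'      # raw string: literal backslash followed by 't'
forward = 'd:/test'   # forward slash, no escaping needed

has_tab = '\t' in bad

# Windows path rules treat both separators alike; normpath makes that visible.
same_folder = (ntpath.normpath(forward) == raw)

# A relative path such as 'test' depends on the current working directory.
print(os.path.abspath('test'))
print(has_tab, same_folder)
```

This is why os.walk('d:\test') would look for a folder whose name contains a tab rather than for d:\test.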
From: MRAB on 10 Aug 2010 11:12

Νίκος wrote:
[snip]
>
> The ID number of each php page was contained in the old php code
> within this string
>
> PageID = some_number
>
> So instead of creating a new ID number for each page I have to pull out
> this number and store it at the beginning of the file as a comment line,
> because it has a direct relationship with the mysql database, as in
> tracking the number of each webpage and finding its counter.
>
> # Grab the PageID contained within the php code and store it in id
> variable
> id = re.search( 'PageID = ', src_data )
>
> How do I tell Python to grab the number after the 'PageID = ' string and
> store it in the variable id for later use in the program?
>

If the part of the file you're trying to match looks like this:

    PageID = 12

then the regex should look like this:

    PageID = (\d+)

and the code should look like this:

    page_id = re.search(r'PageID = (\d+)', src_data).group(1)

The page_id will, of course, be a string.

> Also I made another change. Would something like this work:
>
> ===============================
> # open same php file for storing modified data
> print ( 'writing to %s' % dest_f )
> f = open(src_f, 'w')
> f.write(src_data)
> f.close()
>
> # rename edited .php file to .html extension
> dst_f = src_f.replace('.php', '.html')
> os.rename( src_f, dst_f )
> ===============================
>
> Because instead of creating a new .html file and inserting the desired
> data of the old php, thus having two files (old php and new html), I
> decided to open the same php file for writing that data and then
> rename it to html.
> Would the above code work?

Why wouldn't it?
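
A hedged sketch of both steps discussed above follows. The helper names `extract_page_id` and `rewrite_and_rename` are hypothetical, and the sketch assumes that a page without a 'PageID = ' line should be reported rather than crash: re.search returns None when nothing matches, so calling .group(1) directly would raise AttributeError.

```python
import os
import re

def extract_page_id(src_data):
    """Return the digits after 'PageID = ' as a string, or None if absent."""
    match = re.search(r'PageID = (\d+)', src_data)
    if match is None:
        return None
    return match.group(1)

def rewrite_and_rename(src_f, new_data):
    """Overwrite src_f with new_data, then rename it to a .html extension."""
    with open(src_f, 'w') as f:
        f.write(new_data)
    # splitext only touches the extension, whereas str.replace would also
    # change a '.php' that appears earlier in the path.
    base, _ext = os.path.splitext(src_f)
    dst_f = base + '.html'
    os.rename(src_f, dst_f)
    return dst_f

if __name__ == '__main__':
    print(extract_page_id('<? PageID = 12 ?>'))
```

Two things to watch in the quoted snippet: its print line refers to `dest_f` while the rename uses `dst_f`, and on Windows os.rename fails if the target .html file already exists.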