From: james_027 on 29 Apr 2010 08:03

On Apr 29, 5:31 am, Cameron Simpson <c...(a)zip.com.au> wrote:
> On 28Apr2010 22:03, Daniel Fetchinson <fetchin...(a)googlemail.com> wrote:
> | > Any idea how I can replace words in a html file? Meaning only the
> | > content will get replace while the html tags, javascript, & css are
> | > remain untouch.
> |
> | I'm not sure what you tried and what you haven't but as a first trial
> | you might want to
> |
> | <untested>
> |
> | f = open( 'new.html', 'w' )
> | f.write( open( 'index.html' ).read( ).replace( 'replace-this', 'with-that' ) )
> | f.close( )
> |
> | </untested>
>
> If 'replace-this' occurs inside the javascript etc or happens to be an
> HTML tag name, it will get mangled. The OP didn't want that.
>
> The only way to get this right is to parse the file, then walk the doc
> tree editing only the text parts.
>
> The BeautifulSoup module (3rd party, but a single .py file and trivial to
> fetch and use, though it has some dependencies) does a good job of this,
> coping even with typical not quite right HTML. It gives you a parse
> tree you can easily walk, and you can modify it in place and write it
> straight back out.
>
> Cheers,
> --
> Cameron Simpson <c...(a)zip.com.au> DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> The Web site you seek
> cannot be located but
> endless others exist
> - Haiku Error Messages
> http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html

Hi all,

Thanks for all your input. Cameron Simpson gets the idea of what I am
trying to do. I've been looking at Beautiful Soup, but so far I don't know
how to perform a search and replace within it. Can anyone suggest a good
read?

Thanks all,
James
From: Iain King on 29 Apr 2010 09:55

On Apr 29, 10:38 am, Daniel Fetchinson <fetchin...(a)googlemail.com> wrote:
> > | > Any idea how I can replace words in a html file? Meaning only the
> > | > content will get replace while the html tags, javascript, & css are
> > | > remain untouch.
> > |
> > | I'm not sure what you tried and what you haven't but as a first trial
> > | you might want to
> > |
> > | <untested>
> > |
> > | f = open( 'new.html', 'w' )
> > | f.write( open( 'index.html' ).read( ).replace( 'replace-this', 'with-that' ) )
> > | f.close( )
> > |
> > | </untested>
>
> > If 'replace-this' occurs inside the javascript etc or happens to be an
> > HTML tag name, it will get mangled. The OP didn't want that.
>
> Correct, that is why I started with "I'm not sure what you tried and
> what you haven't but as a first trial you might". For instance if the
> OP wants to replace words which he knows are not in javascript and/or
> css and he knows that these words are also not in html attribute
> names/values, etc, etc, then the above approach would work, in which
> case BeautifulSoup is a gigantic overkill. The OP needs to specify
> more clearly what he wants, before really useful advice can be given.
>
> Cheers,
> Daniel

Funny, everyone else understood what the OP meant, and useful advice was
given.
From: Daniel Fetchinson on 29 Apr 2010 11:46

>> > | > Any idea how I can replace words in a html file? Meaning only the
>> > | > content will get replace while the html tags, javascript, & css are
>> > | > remain untouch.
>> > |
>> > | I'm not sure what you tried and what you haven't but as a first trial
>> > | you might want to
>> > |
>> > | <untested>
>> > |
>> > | f = open( 'new.html', 'w' )
>> > | f.write( open( 'index.html' ).read( ).replace( 'replace-this', 'with-that' ) )
>> > | f.close( )
>> > |
>> > | </untested>
>>
>> > If 'replace-this' occurs inside the javascript etc or happens to be an
>> > HTML tag name, it will get mangled. The OP didn't want that.
>>
>> Correct, that is why I started with "I'm not sure what you tried and
>> what you haven't but as a first trial you might". For instance if the
>> OP wants to replace words which he knows are not in javascript and/or
>> css and he knows that these words are also not in html attribute
>> names/values, etc, etc, then the above approach would work, in which
>> case BeautifulSoup is a gigantic overkill. The OP needs to specify
>> more clearly what he wants, before really useful advice can be given.
>
> Funny, everyone else understood what the OP meant, and useful advice
> was given.

It was a lucky day for the OP then! :)

Cheers,
Daniel

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
From: Cameron Simpson on 29 Apr 2010 18:47

On 29Apr2010 05:03, james_027 <cai.haibin(a)gmail.com> wrote:
| On Apr 29, 5:31 am, Cameron Simpson <c...(a)zip.com.au> wrote:
| > On 28Apr2010 22:03, Daniel Fetchinson <fetchin...(a)googlemail.com> wrote:
| > | > Any idea how I can replace words in a html file? Meaning only the
| > | > content will get replace while the html tags, javascript, & css are
| > | > remain untouch.
[...]
| > The only way to get this right is to parse the file, then walk the doc
| > tree editing only the text parts.
| >
| > The BeautifulSoup module (3rd party, but a single .py file and trivial to
| > fetch and use, though it has some dependencies) does a good job of this,
| > coping even with typical not quite right HTML. It gives you a parse
| > tree you can easily walk, and you can modify it in place and write it
| > straight back out.
|
| Thanks for all your input. Cameron Simpson get the idea of what I am
| trying to do. I've been looking at beautiful soup so far I don't know
| how to perform search and replace within it.

Well the BeautifulSoup web page helped me:

  http://www.crummy.com/software/BeautifulSoup/documentation.html

Here's a function from a script I wrote to bulk edit a web site. I was
replacing OBJECT and EMBED nodes with modern versions:

    def recurse(node):
        global didmod
        for O in node.contents:
            if isinstance(O,Tag):
                for attr in 'src', 'href':
                    if attr in O:
                        rurl=O[attr]
                        rurlpath=pathwrt(rurl,SRCPATH)
                        if not os.path.exists(rurlpath):
                            print >>sys.stderr, "%s: MISSING: %s" % (SRCPATH, rurlpath,)
                O2=None
                if O.name == "object":
                    O2, SUBOBJ = fixmsobj(O)
                elif O.name == "embed":
                    O2, SUBOBJ = fixembed(O)
                if O2 is not None:
                    O.replaceWith(O2)
                    SUBOBJ.replaceWith(O)
                    ##print >>sys.stderr, "%s: update: new OBJECT: %s" % (SRCPATH, str(O2), )
                    didmod=True
                    continue
                recurse(O)

but you have only to change it a little to modify things that aren't Tag
objects.

The calling end looks like this:

    with open(SRCPATH) as srcfp:
        srctext = srcfp.read()
    SOUP = BeautifulSoup(srctext)
    didmod = False  # icky global set by recurse()
    recurse(SOUP)
    if didmod:
        srctext = str(SOUP)

If didmod becomes True we recompute srctext and resave the file (or save it
to a copy).

Cheers,
--
Cameron Simpson <cs(a)zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Democracy is the theory that the people know what they want, and deserve to
get it good and hard. - H.L. Mencken
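
The "parse the file, then edit only the text parts" idea discussed in this thread can also be sketched without BeautifulSoup. What follows is only an illustrative sketch for Python 3 using the standard library's html.parser, not code from any poster in the thread. The names TextReplacer and replace_words are made up for this example, and it assumes whole-word replacement is wanted and that the contents of script and style elements should be left alone.

```python
import re
from html.parser import HTMLParser


class TextReplacer(HTMLParser):
    """Re-emit the HTML it is fed, replacing a word in text nodes only."""

    SKIP = {"script", "style"}  # text inside these elements is left untouched

    def __init__(self, old, new):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(r"\b%s\b" % re.escape(old))
        self.new = new
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        # Emit the tag exactly as it appeared, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        # Note: end tags are re-emitted in lowercase, normalised form.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # A function replacement avoids backslash processing in self.new.
            data = self.pattern.sub(lambda match: self.new, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_words(html, old, new):
    """Return html with whole-word `old` replaced by `new` in text only."""
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<p class="cat">The cat sat.</p><script>var cat = 1;</script>'
    print(replace_words(sample, "cat", "dog"))
```

Unlike a plain str.replace() over the whole file, this leaves the class attribute and the script body alone. It is less forgiving of badly broken markup than BeautifulSoup, so treat it as a starting point rather than a drop-in replacement.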
From: Stefan Behnel on 30 Apr 2010 01:15

Cameron Simpson, 30.04.2010 00:47:
> Here's a function from a script I wrote to bulk edit a web site. I was
> replacing OBJECT and EMBED nodes with modern versions:
>
>     def recurse(node):
>         global didmod
>         [...]
>                 didmod=True
>                 continue
>             recurse(O)
>
> The calling end looks like this:
>
>     SOUP = BeautifulSoup(srctext)
>     didmod = False # icky global set by recurse()
>     recurse(SOUP)
>     if didmod:
>         srctext = str(SOUP)
>
> If didmod becomes True we recompute srctext and resave the file (or save it
> to a copy).

You should rethink your naming in the above code and remove the need for a
global variable.

Stefan
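
One common way to drop a "did anything change" global from a recursive tree walk is to have the function return that flag and combine the results on the way back up. The sketch below only illustrates the pattern and is not a rewrite of the script quoted above: Node is a hypothetical stand-in for a parse-tree node, and rename_nodes is an invented example operation.

```python
class Node:
    """Hypothetical stand-in for a parse-tree node (not a BeautifulSoup class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children if children is not None else []


def rename_nodes(node, old_name, new_name):
    """Rename matching descendants in place; return True if anything changed."""
    modified = False
    for child in node.children:
        if child.name == old_name:
            child.name = new_name
            modified = True
        # Fold the result of the recursive call into this level's flag.
        if rename_nodes(child, old_name, new_name):
            modified = True
    return modified


if __name__ == "__main__":
    tree = Node("html", [Node("body", [Node("embed"), Node("p")])])
    if rename_nodes(tree, "embed", "object"):
        print("tree was modified; write it back out")
```

The caller then decides whether to resave the file from the return value alone, so no module-level state has to be reset between files.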