cannot get html content of tag with BeautifulSoup [Python]

From: someone on 18 Jun 2010 11:41

Hello,

does anyone know how to get the HTML contents of a <p> tag with
BeautifulSoup? For example, I'd like to get all the HTML inside the first
<p> tag, i.e. "This is paragraph one.", as a unicode object.

p.contents gives me a list which I cannot join:
TypeError: sequence item 0: expected string, Tag found

Thanks!

from BeautifulSoup import BeautifulSoup
import re

doc = ['<html><head><title>Page title</title></head>',
       '<body><p>This is paragraph one.</p>',
       '<p>This is paragraph two.</p>',
       '</body></html>']
soup = BeautifulSoup(''.join(doc))
#print soup.prettify()
r = re.compile(r'<[^<]*?/?>')
for i, p in enumerate(soup.findAll('p')):
    #print type(p) #<class 'BeautifulSoup.Tag'>
    #print type(p.contents) #list
    content = "".join(p.contents) #fails

    p_without_html = r.sub(' ', content)
    print p_without_html

From: someone on 18 Jun 2010 12:01

p.renderContents() was what I was looking for.
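
For anyone reading this with the newer bs4 package: below is a minimal sketch of the same idea, assuming BeautifulSoup 4 on Python 3 rather than the BeautifulSoup 3 import used in the question. In bs4, decode_contents() returns the inner HTML of a tag as a str. The <b> element in the sample document is only illustrative markup added here so that the tag has a child element; it is not taken from the original post. Converting each child to str before joining also avoids the TypeError, and get_text() gives the text without a regex.

```python
# Assumption: bs4 (BeautifulSoup 4) on Python 3, not the BeautifulSoup 3 module.
from bs4 import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       # the <b> tag is illustrative markup, not from the original post
       '<body><p>This is <b>paragraph</b> one.</p>',
       '<p>This is paragraph two.</p>',
       '</body></html>']
# 'html.parser' is the parser bundled with Python, so no extra install is needed
soup = BeautifulSoup(''.join(doc), 'html.parser')

first = soup.find('p')

# Inner HTML of the tag as a str (child tags are kept)
inner_html = first.decode_contents()

# p.contents mixes strings and Tag objects; convert each child before joining
joined = ''.join(str(child) for child in first.contents)

# Text only, with all markup dropped
text_only = first.get_text()

print(inner_html)
print(text_only)
```

If bytes are needed instead of a str, bs4 also has encode_contents().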
