scraping from bundes-telefonbuch.de with python [Python]

From: davidgp on 19 Jun 2010 06:23

Hello, I'm new to this group, and quite new to Python!
I'm trying to scrape some address data from bundes-telefonbuch.de, but I
run into a problem:
the link looks like this: http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
and it is basically the same for every search query.
So I need to submit POST data to the web server, which I try to do
like this:

import urllib
import urllib2

# Python 2: send a browser-like User-Agent with every request
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible; '
                      'Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
urllib2.install_opener(opener)

# form fields of the search page, URL-encoded for the POST body
data = urllib.urlencode({'F0': 'mySearchKeyword', 'B': 'T', 'F8': 'A || G',
                         'W': '1', 'Z': '0', 'HA': '10',
                         'SAS_static_0_treffer_treffer': 'Suche starten',
                         'S': '1', 'translationtemplate': 'checkstrasse'})

url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
response = urllib2.urlopen(url, data)  # passing data makes this a POST

This returns a page saying I have to re-enter my search terms.
What's going wrong here?

Thanks!!

From: Rebelo on 19 Jun 2010 08:16

On 19 Jun, 12:23, davidgp <davidvanijzendo...(a)gmail.com> wrote:
> this returns a page saying i have to reenter my search terms..
> what's going wrong here?

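Try mechanize: http://wwwsearch.sourceforge.net/mechanize/

import mechanize

# fetch the page that contains the search form
response = mechanize.urlopen("http://www.bundes-telefonbuch.de/")
forms = mechanize.ParseResponse(response, backwards_compat=False)
form = forms[0]  # the first form on the page
form["F0"] = "query"  # enter your query here
# form.click() builds the request that submitting the form would send
html = mechanize.urlopen(form.click()).read()
f = open("tmp.html", "w")
f.write(html)  # html is one string, so write() rather than writelines()
f.close()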

Or you can try to parse the response yourself, but I think their HTML is
not valid.
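
As a rough sketch of the parsing route: the standard library's HTMLParser does not require well-formed markup, so it can be used to list the input fields a form expects (including hidden ones) even when tags are left unclosed. This is written for Python 3, where the module is html.parser. The FormFieldCollector class and the sample markup are made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, whether or not it is ever closed.
        if tag == "input":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.fields[name] = attributes.get("value") or ""


# Made-up sample with unclosed <table>, <tr>, <td> and unquoted attributes.
sample = (
    '<form action="/x"><table><tr><td>'
    '<input name="F0" value="">'
    '<input type=hidden name=W value=1>'
    '</form>'
)

collector = FormFieldCollector()
collector.feed(sample)
print(collector.fields)
```

Feeding it the real search page instead of the sample would show which field names and default values the server expects in the POST.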

From: Michael Torrie on 19 Jun 2010 11:02

On 06/19/2010 04:23 AM, davidgp wrote:
> this returns a page saying i have to reenter my search terms..
> what's going wrong here?

Most likely you need a cookie. You'll probably have to set up a cookie
store for use with urllib2, then request the page that the search form
is on so that the cookie is generated, and then make your post with your
search terms.
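
The steps described above could look roughly like the sketch below. It is written for Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. The form fields and the search URL are copied from the original post, and the front page is assumed to be the page that hands out the cookie. Whether the site really needs a cookie is only a guess, and the helper names make_opener, encode_form and search are made up for this example.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Assumption: the front page is where the server sets its cookie.
FORM_PAGE = "http://www.bundes-telefonbuch.de/"
SEARCH_URL = "http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20"
USER_AGENT = ("Mozilla/5.0 (compatible; Konqueror/3.5; Linux) "
              "KHTML/3.5.4 (like Gecko)")


def make_opener():
    """Return (opener, jar); the jar keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def encode_form(keyword):
    """URL-encode the form fields from the original post as POST bytes."""
    fields = {
        "F0": keyword,
        "B": "T",
        "F8": "A || G",
        "W": "1",
        "Z": "0",
        "HA": "10",
        "SAS_static_0_treffer_treffer": "Suche starten",
        "S": "1",
        "translationtemplate": "checkstrasse",
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def search(opener, keyword):
    """GET the form page first, then POST the search with the same opener."""
    opener.open(FORM_PAGE).read()  # lets the server set its cookie, if any
    return opener.open(SEARCH_URL, encode_form(keyword)).read()
```

To try it, call make_opener() once and pass the opener to search(opener, "mySearchKeyword"). Inspecting the jar afterwards (for example with len(jar)) shows whether the server set any cookie at all.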
