From: Phil Mcdonnell on 20 May 2010 01:48

I'm trying to scrape a page that hides some data behind a javascript function. Is there any way to get this data? I've been using Mechanize, but I'm not sure it can do this. Is there a better library to use for this type of thing?

The following is the interesting part of the page:

<td class="colPlus" onclick="fireClick(this,0)">
<a id="iroc_0" class="plus" href="#" onclick="return false;"> </a>
</td>

--
Posted via http://www.ruby-forum.com/.
From: brabuhr on 20 May 2010 08:49

On Thu, May 20, 2010 at 1:48 AM, Phil Mcdonnell <phil.a.mcdonnell(a)gmail.com> wrote:
> I'm trying to scrape a page that hides some data behind a javascript
> function. Is there any way to get this data? I've been using
> Mechanize, but I'm not sure it can do this. Is there a better library
> to use for this type of thing?

http://celerity.rubyforge.org/
http://watir.com/

> The following is the interesting part of the page:
>
> <td class="colPlus" onclick="fireClick(this,0)">
> <a id="iroc_0" class="plus" href="#" onclick="return false;"> </a>
> </td>

The *really* interesting part is what the JavaScript does :-) With (potentially large) effort you may be able to "reverse-engineer" the JavaScript and emulate it manually in Mechanize. I.e., if the JavaScript just builds a simple HTTP request, you may be able to send the same request from Mechanize, possibly without much effort.
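To make the "send the same request yourself" idea concrete, here is a minimal sketch. It uses Net::HTTP from the standard library so it runs without any gems; the same idea should carry over to a Mechanize agent that is already logged in. Everything specific in it is an assumption for illustration: the URL, the `row` parameter, the cookie value, and the `build_replay_request` helper are made up, and the `X-Requested-With` header is only a guess at what an Ajax call might send. The real values would have to be read from the browser's network traffic.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a POST request that mimics what the page's
# JavaScript might send. All arguments are placeholders, not values
# taken from the real site.
def build_replay_request(url, params, cookie)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(params)                     # form-encoded body
  request["Cookie"] = cookie                        # reuse the logged-in session
  request["X-Requested-With"] = "XMLHttpRequest"    # assumption: some servers check this
  [uri, request]
end

uri, request = build_replay_request(
  "https://example.com/hypothetical/details",       # hypothetical endpoint
  { "row" => "0" },                                 # hypothetical parameter
  "SESSIONID=placeholder"                           # hypothetical session cookie
)

# Inspect the request before sending anything.
puts request.method
puts request.path
puts request.body

# To actually send it (left commented out so the sketch makes no network calls):
# response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
#   http.request(request)
# end
# puts response.body
```

Building the request separately from sending it makes it easy to compare against what the browser sends before hitting the server.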
From: Josh Cheek on 20 May 2010 02:16

[Note: parts of this message were removed to make it a legal post.]

On Thu, May 20, 2010 at 12:48 AM, Phil Mcdonnell <phil.a.mcdonnell(a)gmail.com> wrote:
> I'm trying to scrape a page that hides some data behind a javascript
> function. Is there any way to get this data? I've been using
> Mechanize, but I'm not sure it can do this. Is there a better library
> to use for this type of thing?
>
> The following is the interesting part of the page:
>
> <td class="colPlus" onclick="fireClick(this,0)">
> <a id="iroc_0" class="plus" href="#" onclick="return false;"> </a>
> </td>
> --
> Posted via http://www.ruby-forum.com/.

You might check out Harmony:
http://www.rubyinside.com/harmony-javascript-and-a-dom-environment-in-ruby-3001.html
http://rubygems.org/gems/harmony
http://github.com/mynyml/harmony
From: Steven Parkes on 20 May 2010 13:42

> Mechanize cannot execute javascript but watir/celerity can. (I've never
> used harmony)

Harmony uses envjs to execute JavaScript. There's also capybara which can either use a browser or envjs.
From: Phil Mcdonnell on 20 May 2010 12:14

The other trick here is that this page is behind a login. Mechanize allows me to fill out the login form and holds onto the login credentials for me. Can harmony/celerity/watir do this?

> The *really* interesting part is what does the Javascript do :-) with
> (a potentially large) effort you may be able to "reverse-engineer" the
> javascript and emulate manually in mechanize. I.e. if the javascript
> builds a simple HTTP request, you may be able to send the same request
> from mechanize (possibly) without much effort.

How would one do this? I'm somewhat new to javascript as I usually don't do front end engineering. I see the below definition of this function in the HTML page. Any way I can sniff out what it's actually doing? I'm looking to figure out what the fireClick method displays.

<script type="text/javascript">
var d = document.domain.split(".");
document.domain = d[d.length - 2] + "." + d[d.length - 1];
var start = (new Date()).getTime();
var fireClick = function(){};
var omn_hierarchy="US|AMEX|Ser|eStatement";
var omn_pagename="MainPage";
var omn_language="en";
var omn_newpagename="yes";
</script>

... way down below...

<td class="colPlus" onclick="fireClick(this,0)">
<a id="iroc_0" class="plus" href="#" onclick="return false;"> </a>
</td>

--
Posted via http://www.ruby-forum.com/.
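One thing worth noting about the snippet above: the inline script only sets fireClick to an empty function, so whatever it really does is presumably assigned somewhere else later, perhaps in an external script file. A rough way to hunt for it is to list the scripts the page pulls in and search each one for an assignment to that name. The sketch below does this with plain regular expressions from the standard library. The helper names and the `/js/hypothetical.js` path are made up for illustration, and a regex scan like this is only a first pass; a real HTML parser would be more robust.

```ruby
# Returns the src URLs of external scripts referenced by the page.
def external_script_urls(html)
  html.scan(/<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i).flatten
end

# Returns the non-empty bodies of inline <script> blocks.
def inline_scripts(html)
  html.scan(/<script\b[^>]*>(.*?)<\/script>/mi).flatten.reject { |s| s.strip.empty? }
end

# Returns the lines of JavaScript source that declare or assign `name`
# (either `function name` or `name = ...`, but not `name == ...`).
def definitions_of(name, js)
  escaped = Regexp.escape(name)
  pattern = /(?:\bfunction\s+#{escaped}\b|\b#{escaped}\s*=[^=])/
  js.each_line.map(&:strip).select { |line| line =~ pattern }
end

# Cut-down stand-in for the page; the external script path is hypothetical.
html = <<~HTML
  <script type="text/javascript" src="/js/hypothetical.js"></script>
  <script type="text/javascript">
  var start = (new Date()).getTime();
  var fireClick = function(){};
  </script>
  <td class="colPlus" onclick="fireClick(this,0)"></td>
HTML

# External scripts to download and search next.
puts external_script_urls(html)

# Places in the inline scripts where fireClick is defined or reassigned.
inline_scripts(html).each { |js| puts definitions_of("fireClick", js) }
```

Each URL from external_script_urls could then be fetched (with the logged-in session) and passed through definitions_of in the same way.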