Programmatically Browsing Web Pages...and Downloading Tabs :-)
I've been playing guitar for quite a while and have accumulated quite a few downloaded tabs, but with how accessible they are on the web, I usually end up deleting them and downloading them again later. Lately there have been a lot of legal issues with tab sites, so some of them have been pretty flaky; a lot get shut down from time to time, and then I can't download the good tabs from them. There is one site in particular that I use all the time. I'm not going to mention its name here, as I don't want its bandwidth or servers to get hammered by everyone doing what I'm doing. But I was thinking: what if I could make a program that goes through every tab page, programmatically clicks the download button, intercepts the download and puts it in a directory of my choosing, and then moves on to the next tab?
Well, it has been done, and here's how it works. On my main form I have a text box, a start button, and a WebBrowser control. In the text box you enter the id number of the tab you would like to start with. (All tabs in this database have an id number and the ids are sequential, which is very handy, and lucky.) When you click the start button, we first browse to the main tab page with something like this:
string url = string.Format("http://[site]/tablature.php?id={0}", id.ToString());
webBrowser1.Navigate(url);
Then I kick off a timer that checks every 500 ms whether the WebBrowser has finished loading. Once it has, we kill the check timer and click a button on the web page to start the download of the tab, like so:
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete) {
// Kill the timer here
ClickDownloadTab();
}
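The "check, kill the timer, then click" step can be pulled out into a small helper so the click fires only once, even if a late tick slips through. This is only a sketch: OneShotPoller and its members are hypothetical names, not part of the original program, and it assumes the timer's Tick handler simply calls OnTick().

```csharp
using System;

// Hypothetical helper (not from the original program): runs an action once,
// the first time a readiness check passes, and stops the timer beforehand.
public sealed class OneShotPoller
{
    private readonly Func<bool> _isReady;
    private readonly Action _stopTimer;
    private readonly Action _onReady;
    private bool _fired;

    public OneShotPoller(Func<bool> isReady, Action stopTimer, Action onReady)
    {
        if (isReady == null) throw new ArgumentNullException("isReady");
        if (stopTimer == null) throw new ArgumentNullException("stopTimer");
        if (onReady == null) throw new ArgumentNullException("onReady");
        _isReady = isReady;
        _stopTimer = stopTimer;
        _onReady = onReady;
    }

    // Call this from the timer's Tick event (every 500 ms in this post).
    // Returns true only on the tick where the action actually ran.
    public bool OnTick()
    {
        if (_fired || !_isReady()) return false;
        _fired = true;   // guard against a tick that was queued before the stop
        _stopTimer();    // kill the check timer before acting
        _onReady();      // e.g. ClickDownloadTab
        return true;
    }
}
```

On the form, this might be wired up as new OneShotPoller(() => webBrowser1.ReadyState == WebBrowserReadyState.Complete, checkTimer.Stop, ClickDownloadTab), where checkTimer is an assumed name for the 500 ms timer. A fresh poller would be needed for each tab, since it only fires once.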
I then had to do some watch-window debugging, as they did not explicitly name the "Download Tab" button. Using the Watch window I was able to browse through all the input elements and find the right one, which in my case was at index 3. To click the button, we grab it as an HtmlElement and call InvokeMember, passing in "click" as the member name. This is case-sensitive, and it had me fooled, as I first tried "Click" and "Click()".
private void ClickDownloadTab() {
if (webBrowser1.Document != null && webBrowser1.Document.GetElementsByTagName("input").Count > 3) {
_watchingFileDownloads = true;
webBrowser1.Document.GetElementsByTagName("input")[3].InvokeMember("click");
}
}
Now comes the tricky part: intercepting the download so that our (IE-based) WebBrowser control doesn't automatically show its download dialog box and screw everything up. Conveniently enough, the WebBrowser control has a FileDownload event that we can attach to. Unfortunately, this event gets raised for any download, including each page request. This is why I set some flags, so I can ignore the requests we don't want to download. Then I have a downloading flag: when the event fires again while we're already downloading the file, the handler stops the browser from navigating any further. THIS IS WHAT STOPS THE BROWSER FROM SHOWING THE DOWNLOAD FILE DIALOG. Here's how it's done:
private bool _watchingFileDownloads = false;
private bool _downloading = false;
private void webBrowser1_FileDownload(object sender, EventArgs e) {
try {
if (_downloading) webBrowser1.Stop();
if (_watchingFileDownloads) {
if (webBrowser1.Document != null) {
HtmlElementCollection metas = webBrowser1.Document.GetElementsByTagName("meta");
// metas[3] is read below, so at least four meta tags must exist
if (metas.Count > 3 && metas[3].OuterHtml.Contains("URL")) {
_watchingFileDownloads = false;
_downloading = true;
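// Skip the fixed-length start of the tag (43 characters on this site)...
string downloadUrl = metas[3].OuterHtml.Substring(43);
// ...and trim the last two characters of the tag, leaving just the URL
downloadUrl = downloadUrl.Remove(downloadUrl.Length - 2);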
string fileName = downloadUrl.Substring(downloadUrl.LastIndexOf('/') + 1);
// DownloadFile is synchronous; dispose the WebClient once it returns
using (System.Net.WebClient client = new System.Net.WebClient()) {
client.DownloadFile(downloadUrl, @"C:\Powertabs\" + fileName);
}
_downloading = false;
_curId++;
GetPowertab(_curId);
}
}
}
} catch (Exception ex) {
MessageBox.Show(ex.ToString());
}
}
One more thing I should mention: this particular site puts the physical URL of the actual tab file in a meta tag. So I decided to just parse the meta tag, grab the URL, and do a WebClient.DownloadFile(). I then increment my tab id and call GetPowertab again...
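The fixed offsets in the handler (Substring(43) and trimming two characters) only work as long as the tag never changes shape. A less brittle way to pull the URL out might look like the sketch below. It assumes the tag is a meta refresh with a "URL=..." part in its content attribute, which the post doesn't actually show, and MetaRefresh, ExtractUrl and FileNameFromUrl are made-up names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original program) for pulling the
// download URL out of a meta tag's OuterHtml without fixed offsets.
public static class MetaRefresh
{
    // Finds "url=" (any case), skips an optional quote, then captures
    // everything up to the next quote, whitespace, or '>'.
    private static readonly Regex UrlPattern = new Regex(
        @"url\s*=\s*['""]?(?<url>[^'"">\s]+)",
        RegexOptions.IgnoreCase);

    // Returns the URL found in the tag, or null if there is none.
    public static string ExtractUrl(string outerHtml)
    {
        if (string.IsNullOrEmpty(outerHtml)) return null;
        Match match = UrlPattern.Match(outerHtml);
        return match.Success ? match.Groups["url"].Value : null;
    }

    // Same approach as the handler above: everything after the last '/'.
    // A query string, if present, would be included in the result.
    public static string FileNameFromUrl(string url)
    {
        return url.Substring(url.LastIndexOf('/') + 1);
    }
}
```

In the FileDownload handler, the two Substring/Remove lines could then become a single call such as MetaRefresh.ExtractUrl(metas[3].OuterHtml), with a null check before downloading.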
Pretty neat, considering I've downloaded 4,000 tabs in the last five or six hours.