Programming an HTML Downloader in C++
Written By: Alwyn Malachi Berkeley
- 25 Sep 2006 -
Description: This program requests the source code (HTML) of a webpage and then gives the user multiple options for viewing the web server's response. We will use the Winsock API to manipulate sockets on the system, and create reusable code along the way.
- Creating the Project
- main.cpp Headers and the Internet Namespace
- The Webpage Class
- The Internet Namespace's Functions
- GetDomain
- Custom_itoa
- GetWebpage
- SaveFile
- Internet.hpp and Writing main.cpp
- Compile and Conclusion
Creating Internet.hpp
So far we have created two files that define an Internet namespace: "BasicWebpage.hpp" and "InternetHelperFunctions.hpp". Those names are rather long, so for simplicity we will create a header file called "Internet.hpp" that includes both of them. That way we only need to include a single, easy-to-remember header in this project and in future ones. By now you should already know how to create a new source file, so here is the source code to put inside it:
#ifndef INTERNET_HPP
#define INTERNET_HPP

#include "BasicWebpage.hpp"
#include "InternetHelperFunctions.hpp"

#endif
Simple, eh? That's it! Now we can use the Internet namespace without having to remember each individual header file name.
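The #ifndef INTERNET_HPP / #define INTERNET_HPP / #endif lines form an include guard: if Internet.hpp ends up included more than once in the same source file (for example through two other headers), its contents are only processed the first time. To show the mechanism in one compilable file, the sketch below pastes the same guarded text twice, which is roughly what the preprocessor does when a header is included twice. DEMO_HPP and answer() are hypothetical names used only for this demonstration.

```cpp
// First "inclusion": DEMO_HPP is not defined yet, so this body is compiled.
#ifndef DEMO_HPP
#define DEMO_HPP
inline int answer() { return 42; }
#endif

// Second "inclusion": DEMO_HPP is now defined, so this body is skipped.
// Without the guard, answer() would be defined twice in this file,
// which is a compile error.
#ifndef DEMO_HPP
#define DEMO_HPP
inline int answer() { return 42; }
#endif
```

Removing either pair of guard lines makes the compiler report a redefinition of answer(), which is exactly the error the guard in Internet.hpp prevents.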
Finally, writing main.cpp
Now comes the moment we have been waiting for: with our reusable code in place, we can finally write the logic of the program. You have already written the includes and the main() function itself, so let's go through what goes inside main() a few lines at a time. We start main() with:
    // welcome the user
    cout << "\t\tWelcome to the GetHTML Program\n" << endl;

    // prompt the user for a menu choice
    int intChoice = 0;
    cout << "Menu:\n"
         << "1) Download Webpage\n"
         << "2) Exit\n" << endl;
    do {
        cout << "Please type the number of a choice from above: ";
        cin >> intChoice;
        cin.ignore(); // discard the newline left in the input buffer
        if (intChoice != 1 && intChoice != 2) {
            cout << "Incorrect choice, please try again." << endl;
        }
    } while (intChoice != 1 && intChoice != 2);
We start by welcoming the user to the program and printing a menu. The user can either download a webpage by typing 1 or exit the program by typing 2. Reading the number with cin >> leaves a newline character in the input buffer, so the next statement calls ignore(), which with no arguments discards exactly one character, to remove it. If the number is not 1 or 2, an error message is displayed and the prompt repeats until the user enters a valid choice. Note that the loop assumes the user types a number: non-numeric input such as "abc" puts cin into a failed state, and the loop then repeats forever.
Once the user has made a choice, the program carries out that action with the following code:
    // if the user chose to...
    if (intChoice == 1) {
        // ...download a webpage, then prompt for the webpage URL
        cout << "What is the url of the webpage: ";
        string strWebpageURL;
        getline(cin, strWebpageURL);

        // notify the user that we are processing their request
        cout << "Retrieving..." << endl;

        // send a request to that website's server and retrieve the webpage
        auto_ptr<BasicWebpage> theWebpagePtr;
        try {
            theWebpagePtr = GetWebpage(strWebpageURL);
        } catch (const std::exception &e) {
            cout << "Error: " << e.what() << endl;
            abort();
        }

        // notify the user that we have processed their request
        cout << "Done...\n" << endl;
In most cases the user will want to download a webpage; after all, that is what the program is designed to do. It wasn't written just so the user could open it and exit!
When the user picks choice 1 (download webpage), the program prompts for the URL of a webpage. After the user enters it, the program prints a status message saying that it is retrieving the page. The status message matters because downloading a large webpage can take a few seconds, and you don't want to leave the user guessing in the meantime. The program then calls GetWebpage() to download the page at that URL. If GetWebpage() throws an exception because of an error, the try-catch block catches it, reports the error to the user, and aborts the program. Normally the download completes without complications, and the next part of the program runs:
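The code above stores the downloaded page in a std::auto_ptr. That was the standard smart pointer in 2006, but C++11 deprecated it and C++17 removed it, so a newer compiler may reject that line. Below is a minimal sketch of the same download-and-catch pattern with std::unique_ptr. FakeWebpage, FakeGetWebpage() and TryGetWebpage() are hypothetical stand-ins so the sketch compiles on its own; they are not the tutorial's real class or GetWebpage(). The sketch also returns an empty pointer on failure instead of calling abort().

```cpp
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the tutorial's webpage class.
struct FakeWebpage {
    std::string url;
};

// Hypothetical stand-in for GetWebpage(): throws on a bad URL,
// otherwise hands ownership of a new page object to the caller.
std::unique_ptr<FakeWebpage> FakeGetWebpage(const std::string& url)
{
    if (url.empty()) {
        throw std::invalid_argument("empty URL");
    }
    std::unique_ptr<FakeWebpage> page(new FakeWebpage);
    page->url = url;
    return page; // ownership moves to the caller
}

// Mirrors the try-catch in main(): on error, stores the message in
// errorOut and returns an empty pointer instead of aborting.
std::unique_ptr<FakeWebpage> TryGetWebpage(const std::string& url,
                                           std::string& errorOut)
{
    try {
        return FakeGetWebpage(url);
    } catch (const std::exception& e) {
        errorOut = e.what();
        return nullptr;
    }
}
```

In main() you would then check whether the returned pointer is empty and, if so, print the error and return EXIT_FAILURE. Unlike abort(), which ends the program abnormally, returning from main() shuts the program down normally, so buffered output is flushed.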
        // prompt the user for a menu choice
        intChoice = 0;
        cout << "What would you like to do with the server's response?\n"
             << "1) View Header\n"
             << "2) View HTML\n"
             << "3) View entire server response\n"
             << "4) Save the HTML to a file\n"
             << "5) Nothing\n" << endl;
        do {
            cout << "Please type the number of a choice from above: ";
            cin >> intChoice;
            cin.ignore(); // discard the newline left in the input buffer
            if (intChoice < 1 || intChoice > 5) {
                cout << "Incorrect choice, please try again." << endl;
            }
        } while (intChoice < 1 || intChoice > 5);

        // take action based on what the user chose
        switch (intChoice) {
        case 1: // view header
            cout << theWebpagePtr->getHeader() << endl;
            break;
        case 2: // view HTML
            cout << theWebpagePtr->getHTML() << endl;
            break;
        case 3: // view entire server response
            cout << theWebpagePtr->getResponse() << endl;
            break;
        case 4: // save the HTML associated with the webpage
            SaveFile("output.html", theWebpagePtr->getHTML());
            cout << "The Webpage has been saved under the filename \"output.html\".\n" << endl;
            break;
        case 5: // take no action, just add a newline
            cout << "\n";
            break;
        }
Once the web server's response has been received, we ask the user what to do with it, using a menu very similar to the one at the beginning of the program.
Then, based on the user's choice, we print the header, the HTML, or the entire response for choices 1, 2, and 3. For choice 4 we use the SaveFile() function we defined earlier to save the webpage's HTML to "output.html". For choice 5 we take no action other than printing a newline.
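Choices 1 to 3 rely on the structure of an HTTP/1.x response: a block of header lines, then an empty line, then the body (here, the HTML). The webpage class built earlier provides getHeader(), getHTML() and getResponse(), and its implementation may differ; the sketch below only illustrates how such a split could work. HttpParts and SplitHttpResponse() are hypothetical names, and the sketch assumes the CRLF line endings that HTTP specifies, so the header ends at the first "\r\n\r\n".

```cpp
#include <string>

// Hypothetical result type: the header block and the body of a response.
struct HttpParts {
    std::string header;
    std::string body;
};

// Splits a raw HTTP/1.x response at the first blank line ("\r\n\r\n").
HttpParts SplitHttpResponse(const std::string& response)
{
    HttpParts parts;
    const std::string separator = "\r\n\r\n";
    std::string::size_type pos = response.find(separator);
    if (pos == std::string::npos) {
        // no blank line found: treat the whole response as header
        parts.header = response;
    } else {
        parts.header = response.substr(0, pos);
        parts.body = response.substr(pos + separator.size());
    }
    return parts;
}
```

With this split, "View Header" would print parts.header, "View HTML" would print parts.body, and "View entire server response" would print the original string unchanged.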
Finally, here is the block of code that completes the program:
    } else if (intChoice == 2) {
        // ...exit, so return from main() right here
        return EXIT_SUCCESS;
    }

    cout << "Thank you for using GetHTML!" << endl;
    system("PAUSE");
    return EXIT_SUCCESS;
}
If the user chose option 2 at the start, the program returns immediately, so the thank-you message is skipped. Otherwise, once the downloaded page has been handled, the program thanks the user for taking the time to use GetHTML.
Lastly, the program pauses before it ends. The pause is really just for testing: system("PAUSE") runs the Windows pause command, which waits for a key press so the console window stays open and you can see what happened during the program's lifetime.