Programming an HTML Downloader in C++

Written By: Alwyn Malachi Berkeley

- 25 Sep 2006 -

Description: This program requests the source code (HTML) of a webpage and then gives the user multiple options for viewing the web server's response. We will use the Winsock API to manipulate sockets on the system, and create reusable code along the way.

  1. Creating the Project
  2. main.cpp Headers and the Internet Namespace
  3. The Webpage Class
  4. The Internet Namespace's Functions
  5. GetDomain
  6. Custom_itoa
  7. GetWebpage
  8. SaveFile
  9. Internet.hpp and Writing main.cpp
  10. Compile and Conclusion

Preface

Before reading this article you should have a basic understanding of object-oriented programming in C++. You should also be familiar with HTML and HTTP. HTML (HyperText Markup Language) is the markup language used to structure webpages on the World Wide Web. HTTP (Hypertext Transfer Protocol) is the standard protocol that web clients and servers use to exchange information on the World Wide Web. You are probably already familiar with HTML; if HTTP is new to you, the overview at http://en.wikipedia.org/wiki/HTTP is a good place to start.
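To make the HTTP side a little more concrete, here is a rough sketch of the text a client sends when it asks a server for a webpage. An HTTP/1.1 request consists of a request line, header lines each ending in a carriage return and line feed ("\r\n"), and an empty line that marks the end of the headers. The function name BuildGetRequest and the choice of headers are assumptions made for illustration; this is not code taken from GetHTML.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the text of a minimal HTTP/1.1 GET request.
// "Connection: close" asks the server to close the socket once the
// response has been sent, so a simple client can read until the end.
std::string BuildGetRequest(const std::string& host, const std::string& path)
{
    assert(!host.empty());

    // The request target must start with '/'; default to the site root.
    const std::string target = path.empty() ? std::string("/") : path;

    std::string request;
    request += "GET " + target + " HTTP/1.1\r\n"; // request line
    request += "Host: " + host + "\r\n";          // required in HTTP/1.1
    request += "Connection: close\r\n";
    request += "\r\n";                            // empty line ends the headers
    return request;
}
```

For example, BuildGetRequest("www.example.com", "/index.html") produces the request line "GET /index.html HTTP/1.1" followed by the Host and Connection headers and a terminating empty line. Sending "Connection: close" is a convenient choice for a simple downloader, because the client can keep reading until the server closes the connection instead of having to work out where the response ends.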

Introduction

Our goal is to create a small program that requests the source code (HTML) of a webpage and then gives the user multiple options for viewing the web server's response. The program is called GetHTML. We will use the Winsock API to manipulate sockets on the system. We will also strive to follow good programming practice and write reusable code along the way, so that we end up with a small framework for similar applications in the future. Lastly, this program will be developed using the Dev C++ IDE v5 Beta, a free, open-source IDE that can be downloaded from http://www.bloodshed.net/dev/devcpp.html.
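Because the program's job is to show the server's response in more than one way, it helps to know how that response is laid out: a status line and header lines, then an empty line, then the body (here, the HTML). A minimal sketch of separating those two parts, assuming the whole response has already been read into a std::string, might look like this. SplitResponse is a hypothetical name, and this is not necessarily how GetHTML implements its viewing options.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: splits a raw HTTP response into (headers, body).
// HTTP separates the two parts with an empty line, i.e. "\r\n\r\n".
// If no separator is found, everything is treated as headers.
std::pair<std::string, std::string> SplitResponse(const std::string& response)
{
    const std::string separator = "\r\n\r\n";
    const std::string::size_type pos = response.find(separator);
    if (pos == std::string::npos)
        return std::make_pair(response, std::string());

    std::string headers = response.substr(0, pos);
    std::string body = response.substr(pos + separator.size());

    // Sanity check: no bytes were lost apart from the separator itself.
    assert(headers.size() + separator.size() + body.size() == response.size());
    return std::make_pair(headers, body);
}
```

A viewing option that shows only the HTML would print the second element of the pair, while one that shows only the headers would print the first. Note that this sketch does not decode chunked transfer encoding, which an HTTP/1.1 server may use for the body.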

Creating the Project

So let us begin. Open Dev C++ and create a new project by clicking File > New > Project… from the menu. The IDE will prompt you for a project type. Click the tab labeled "Basic" and then click the "Console Application" icon so that it is highlighted. At the bottom of the dialog there should be a box containing the project name; it should currently read "Project 1". Replace that name with "prjGetHTML". Next, select the option button labeled "C++ Project" so that the IDE knows you are creating a C++ project rather than a C project. Finally, click the OK button.

A dialog should appear asking you to save the prjGetHTML.dev file. That is the project file for your project; save it in the location of your choice.
