This time, we see what goes on behind the scenes when you key in a URL in your Web browser, and go on to configure Apache for Linux and Netscape FastTrack for NetWare.
When you key in a URL like www.pcquest.com in a Web browser, you see a message that says ‘Connecting to site www.pcquest.com’ for a few seconds before you see the HTML page of the website. This seems pretty simple, but what happens in the background can be quite complex.
What is MIME?
The MIME (Multipurpose Internet Mail Extensions) specification is used to identify the type of the contents flowing between a Web server and a browser. The MIME type of the document is specified in the following form.
type/subtype
For example, if plain text is sent by the browser in the request body, its MIME type would be text/plain. Here ‘text’ is the broad category of the content and ‘plain’ indicates the specific subtype, plain text, within that category. The Web server also maintains a list of MIME types it can deliver, and, like the browser, it states the MIME type of the content it sends in response to a request. When the browser requests the index.html file, for instance, the Web server sends an HTML document and gives its MIME type as text/html in a header called the response header (see below). The browser uses the MIME type to work out what kind of file it is receiving, and so whether to display the file itself, launch a plug-in, or start an external application to open it. Other examples of MIME types are text/richtext, text/vnd.wap.wml (for WML content), image/gif, image/jpeg, application/zip, and application/octet-stream.
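To make the type/subtype idea concrete, here is a minimal Python sketch of the lookup a Web server might do before stating the MIME type of its response. The extension table and the function names `mime_type_for` and `split_mime` are hypothetical; real servers read a much longer list from a configuration file (Apache, for example, uses a file called mime.types).

```python
import os

# Hypothetical extension-to-MIME-type table; a real server loads a far
# longer list from its configuration.
MIME_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".wml": "text/vnd.wap.wml",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".zip": "application/zip",
}

def mime_type_for(path: str) -> str:
    """Return the MIME type a server might announce for a file path."""
    ext = os.path.splitext(path)[1].lower()
    # Unknown extensions fall back to a generic binary type.
    return MIME_TYPES.get(ext, "application/octet-stream")

def split_mime(mime: str) -> tuple[str, str]:
    """Split 'type/subtype' into its broad category and specific subtype."""
    main, _, sub = mime.partition("/")
    return main, sub

if __name__ == "__main__":
    print(mime_type_for("/index.html"))             # text/html
    print(split_mime(mime_type_for("logo.GIF")))    # ('image', 'gif')
    print(mime_type_for("setup.exe"))               # application/octet-stream
```

Falling back to application/octet-stream for unknown extensions is a common convention: it tells the browser to treat the data as opaque bytes, which typically means offering to save the file rather than display it.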
Behind the scenes is a Web server. A Web server works on the client/server principle: it provides the specific service of handing out Web pages to clients, namely Web browsers. So for a website like www.pcquest.com, a Web server runs on a machine that has a unique IP address and is connected directly to the Internet. Once started, the Web server runs as a daemon, or background application, and listens for connection requests, generally from Web browsers, on port 80, the standard port reserved for Web servers. So besides the IP address, a client application also needs a port number to connect to a Web server. For example, when you type a URL such as www.pcquest.com, the Web browser first looks up the IP address of the machine named www.pcquest.com from a DNS (Domain Name System) server, and then connects to the Web server on port 80.
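The two steps a browser takes here, resolving a name to an IP address and then opening a TCP connection to port 80, can be sketched with Python's standard socket module. The helper names `resolve` and `connect` are hypothetical, and so that the sketch runs without network access, the demo connects to a stand-in listener on this machine instead of a real Web server.

```python
import socket

def resolve(host: str, port: int = 80) -> list[str]:
    """Ask the system resolver (normally backed by DNS) for the host's IPv4 addresses."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (ip_address, port)).
    return sorted({info[4][0] for info in infos})

def connect(host: str, port: int = 80, timeout: float = 5.0) -> socket.socket:
    """Resolve the host name first, then open a TCP connection to the given port."""
    ip_address = resolve(host, port)[0]
    return socket.create_connection((ip_address, port), timeout=timeout)

if __name__ == "__main__":
    # Offline stand-in for a Web server: a listening socket on this machine.
    # Against a live site you would call connect("www.pcquest.com"),
    # which uses the default port 80.
    server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    with connect("localhost", port) as client:
        print("connected to", client.getpeername())
    server.close()
```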
TCP/IP is the protocol used to carry this connection. TCP is connection-oriented, meaning that a connection is set up before any data is exchanged and torn down once the exchange is complete; the state of the connection is maintained by both the client and the server. In the traditional design, each time a Web browser requests a connection, the Web server forks, or creates, a child process to serve that browser, while the parent continues to listen for requests from other browsers. This is how a Web server handles many browser requests at once, the typical scenario on the Web.
The request and transfer of Web pages over this connection use a protocol called HTTP 1.1 (HyperText Transfer Protocol). HTTP is a stateless protocol: once the Web server has processed a request and delivered the result to the Web browser, the HTTP dialog between them ends, and the server does not keep track of the transaction afterwards.
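The fork-per-connection model can be sketched with Python's standard socketserver module, whose ForkingTCPServer forks a child process for every accepted connection while the parent goes back to listening. This sketch assumes a POSIX system such as Linux (it relies on os.fork); the handler class `WhoServedMe` and the `demo` function are hypothetical, and a production Web server adds much more, such as pools of pre-forked processes.

```python
import os
import socket
import socketserver

class WhoServedMe(socketserver.StreamRequestHandler):
    """Runs in the forked child: reads one line and reports the child's process ID."""

    def handle(self):
        line = self.rfile.readline().strip().decode()
        self.wfile.write(f"child {os.getpid()} got {line}\n".encode())

def demo() -> tuple[int, str]:
    # Port 0 lets the OS pick a free port; a real Web server would bind port 80.
    with socketserver.ForkingTCPServer(("127.0.0.1", 0), WhoServedMe) as server:
        with socket.create_connection(server.server_address) as client:
            client.sendall(b"hello\n")
            # Accept one connection and fork a child to serve it; the parent
            # returns at once and could go on accepting other browsers.
            server.handle_request()
            with client.makefile("rb") as reply_file:
                reply = reply_file.readline().decode()
    # Leaving the outer "with" closes the server and waits for the child to exit.
    return os.getpid(), reply

if __name__ == "__main__":
    parent_pid, reply = demo()
    print("parent", parent_pid, "->", reply.strip())
```

The child's process ID in the reply differs from the parent's, showing that the connection was served by a separate process.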
Browser’s end
After a connection is established, the browser sends the Web server a request in plain text naming an HTTP method, such as GET, POST, HEAD or PUT. The GET method is used to request a Web page (or any other file the Web server can deliver, such as audio or video). The syntax of the GET method is as follows.
GET <path of the requested file> HTTP/<version>
For example, after connecting to www.pcquest.com, the browser sends the following text to the Web server to get the main Web page of the website.
GET /index.htm HTTP/1.1
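Because the request is plain text, it can be written by hand into a TCP connection. The Python sketch below builds a GET request and returns whatever the server sends back. The function name `http_get` is hypothetical. It adds a Host header, which HTTP 1.1 requires, and a `Connection: close` header so that the server closes the connection after replying, which lets the code simply read until the data runs out.

```python
import socket

def http_get(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send a hand-built HTTP/1.1 GET request and return the raw response bytes."""
    request = (
        f"GET {path} HTTP/1.1\r\n"   # method, path and protocol version
        f"Host: {host}\r\n"           # mandatory in HTTP/1.1
        "Connection: close\r\n"       # ask the server to close after replying
        "\r\n"                        # blank line ends the request header
    )
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

With network access, `http_get("www.pcquest.com", "/index.htm")` would return the server's status line, response header, a blank line, and then the page itself.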
Dynamic content with CGI
At www.yahoo.com, if you type in the search keywords ‘java development kit download’ and press the ‘search’ button, the browser displays a Web page with the results of the search. Next, if you search for ‘visual studio .net download’, the browser again displays results, which are very different from the former. This kind of dynamic content is generated on the fly by programs that the Web server runs, such as CGI scripts.
When the browser requests a file, the Web server delivers it from a directory called its ‘document root’. The content of such a file is static. But the Web server can also work with CGI scripts to deliver dynamic content to the browser on the fly. A CGI script is usually invoked with the GET or POST method. When the Web server receives a request for a URL that points to a CGI script, it hands the request, along with any parameters supplied, to an external program: a compiled C program runs directly, while a script goes to an interpreter or engine such as Perl, the Zend PHP parser or the Tomcat JSP engine. The program processes the request and returns the result, usually a new HTML page, to the Web server, which in turn delivers the page to the Web browser.
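Under CGI, the Web server passes request parameters to the program through environment variables (for a GET request, the part of the URL after ‘?’ arrives in QUERY_STRING) and relays whatever the program prints: a header such as Content-Type, a blank line, and then the page. A minimal Python sketch of such a script follows. The parameter name `q` and the function `search_page` are hypothetical, and the search itself is faked by echoing the keywords back.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def search_page(environ: dict) -> str:
    """Build a CGI response (header, blank line, HTML) from the request's query string."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    keywords = params.get("q", [""])[0]
    # Escape user input before placing it inside HTML.
    safe = html.escape(keywords)
    body = f"<html><body><p>Results for: {safe}</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # The Web server sets QUERY_STRING and sends our standard output to the browser.
    sys.stdout.write(search_page(dict(os.environ)))
```

A request for a URL ending in `?q=java+development+kit+download` would produce a page containing "Results for: java development kit download"; different keywords yield a different page, which is exactly the dynamic behaviour described above.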
After the HTTP method requesting a URL, the browser sends some more information, such as its name and version, the languages it understands (for example, English or French), and the document types it can accept, expressed as MIME types (see box). This information, along with the HTTP method, is called the request header. The POST method is used when the browser needs to send additional data that is not part of the request header. For example, when the URL points to a CGI (Common Gateway Interface) script, that is, a program written in Perl, or a Java Servlet, Java Server Page, PHP page or Active Server Page, the additional data can be passed to it as parameters. HTML forms often use the POST method to send the data typed into the form to a CGI script. The PUT method is used by the Web browser to upload data to the location on the Web server specified by the URL. In these cases, the request header and the additional data, called the request body, are separated by a blank line. When the browser sends content in the request body, the size of the content and its MIME type are given in the request header (in the Content-Length and Content-Type fields).
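The layout just described (request line and header, a blank line, then the body, with the body's size and MIME type declared in the header) can be sketched in Python as follows. The function `build_post`, the path /cgi-bin/search.pl and the User-Agent value are hypothetical. The form data is encoded as application/x-www-form-urlencoded, the type browsers use by default when submitting HTML forms.

```python
from urllib.parse import urlencode

def build_post(host: str, path: str, fields: dict) -> bytes:
    """Assemble a POST request: request line, header fields, blank line, body."""
    body = urlencode(fields).encode("ascii")
    header = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "User-Agent: ExampleBrowser/1.0\r\n"         # browser name and version (made up)
        "Accept-Language: en, fr\r\n"                # languages it understands
        "Accept: text/html, text/plain\r\n"          # MIME types it accepts
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"           # size of the body in bytes
        "\r\n"                                       # blank line separates header from body
    )
    return header.encode("ascii") + body

if __name__ == "__main__":
    print(build_post("www.pcquest.com", "/cgi-bin/search.pl",
                     {"q": "java development kit download"}).decode())
```

Computing Content-Length from the encoded bytes, rather than from the original string, keeps the declared size correct even when encoding changes the length of the data.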
Web server’s end
The Web server first sends a number called the status code, which indicates whether the requested file can be delivered, cannot be delivered (for example, because it doesn't exist), or has moved to another location, or whether an error occurred at the Web server or on the Web browser's side. You might recall the message ‘HTTP 404 file not found’ when you key in an incorrect URL in the browser; here the status code 404 tells the browser that the requested file does not exist. The Web server then sends a response header, which indicates, among other things, whether the browser may cache the document, the language of the contents (such as English or French), and the MIME type of the delivered content. After a blank line, the contents of the requested file are sent. If the browser has requested the file using the HEAD method, only the response header is sent, not the file contents. Once the request has been processed, the TCP/IP connection is closed, although HTTP 1.1 also allows the browser and server to keep it open for further requests.
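On the browser side, taking a response apart follows the same layout: status line, response header, blank line, contents. This Python sketch parses a raw response. The name `parse_response` is hypothetical and the sample 404 response is made up for illustration.

```python
def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, header fields and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.1 404 Not Found": version, code, reason.
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body

sample = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Language: en\r\n"
    b"Cache-Control: no-cache\r\n"
    b"\r\n"
    b"<html><body>File not found</body></html>"
)

if __name__ == "__main__":
    status, headers, body = parse_response(sample)
    print(status, headers["content-type"], body)
```

A response to a HEAD request parses the same way; it simply comes back with an empty body.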
Shekhar Govindarajan