June 2, 2003



You may have heard that HTTP, the basis of Web applications, is stateless: a Web server cannot recognize the browser across multiple requests. (A request from the browser to the server is generated when you type a URL in the location bar, click a hyperlink in a Web page, or submit an HTML form.) Yet you are able to log into Yahoo Mail (or any other Web-based mail service) and go through your mail during the subsequent requests. Evidently, the server does recognize you across requests.


Try the following experiment to compound the confusion. Open a browser instance (call this instance parent) and sign into the account. From parent, select the menu option File>New>Window. A new browser instance–let’s call this the child–will be opened. You could use either parent or child to go through your mail account.

Now, open another browser window by double-clicking the browser icon in Explorer or the Start menu, and call this instance the independent. Copy the URL shown in either the parent or the child and paste it into the address bar of the independent. You’ll see that the mail account is not accessible from the independent. Also, once you sign out of either the parent or the child, the other no longer works.

To comprehend the above behavior, you need to know how a user session is tracked in a Web application.

Whether the Web application is based on Java Servlets/JSP or MS ASP or any other proprietary technology, the basic concepts of session tracking remain the same.

Why Session Tracking?
Typical client server applications are either connection oriented (a connection between the client and server is alive till the user session is complete) or the client has the capability to maintain user state (User Authentication Information, user specific data).
But, in the case of Web applications, the client (which is a browser) doesn’t have the capability to store user state information. Nor are Web applications connection oriented. When a user types a URL in the address bar or clicks on a hyperlink, a new connection to the server is established, content is retrieved from the server and the connection is closed. When another request is made, a new connection needs to be established.

The above point should highlight a significant issue: There might be hundreds of users signed into the web application simultaneously and the server would maintain the state information for everyone. When a particular browser makes a request to a server, how would the server associate the correct state information for the request? When you click on the Inbox hyperlink in Yahoo Mail’s home page, how is it that you get the list of your mail and not someone else’s? Or consider a still worse situation–how does Yahoo Mail avoid showing your mail to someone else?

How it Works
If you think it is through the browser’s host IP, think again. If it were, Yahoo Mail would be accessible from the independent, which runs on the same host as the parent. It is for security reasons that this process of associating state information with a user request (called Session Tracking) is not based on the host IP.

The Session Management subsystem of the server maintains a user session object for each user currently interacting with it.

The subsystem identifies each of these objects using a unique identifier. Let’s call this unique identifier sid or Session Identifier.

The entire Session Tracking mechanism is based on the server sending the sid to the browser (this is done only once, when the User Session object for that browser instance is created) and the browser sending the sid back to the server along with all subsequent requests. The Session Management subsystem knows how to extract the sid from the request, and thus associates the correct User Session object with that request. The entire sequence of this Session Tracking can be represented as shown in the graphic above.
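The server-side bookkeeping described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name SessionManager, its method names, and the in-memory dictionary are hypothetical, and real server products add expiry, persistence and locking.

```python
import secrets


class SessionManager:
    """Hypothetical in-memory store of user session objects, keyed by sid."""

    def __init__(self):
        self._sessions = {}  # sid -> session data for one user

    def create_session(self):
        # Generate a hard-to-guess sid; it is sent to the browser only once.
        sid = secrets.token_hex(8)
        self._sessions[sid] = {}
        return sid

    def get_session(self, sid):
        # Look up the session for the sid that came back with a request.
        # Returns None when the sid is unknown (for example after sign-out).
        return self._sessions.get(sid)

    def end_session(self, sid):
        # Sign-out: requests still carrying this sid are no longer recognized.
        self._sessions.pop(sid, None)


manager = SessionManager()
alice_sid = manager.create_session()
bob_sid = manager.create_session()
manager.get_session(alice_sid)["user"] = "alice"
manager.get_session(bob_sid)["user"] = "bob"
```

Because every lookup goes through the sid, two users signed in at the same time never see each other's session object, and a request without a valid sid gets no session at all.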

Sending this sid back and forth between the browser and the server is easier said than done. How exactly does it happen?

There are two important techniques: Cookies and URL Rewrite.

Cookies 
Cookies are small pieces of data that are sent by the server to browser clients. The browser keeps sending the cookie back to the server during all subsequent requests.

Each cookie has a name that identifies it and contains a small payload termed value. When a new User Session is initiated, the server creates a cookie with a standard name (this name may be specific to the server product), sets the value as the User Session’s sid and sends the cookie to the requesting browser. The browser, during all its subsequent requests to the server, sends this cookie along. The server extracts the cookie from the request using the name and thus obtains the sid associated with the user making that request.
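As a rough sketch of this round trip, the Python below builds the cookie on the way out and extracts the sid on the way back, using the standard library's http.cookies module. The helper names build_set_cookie and extract_sid are made up for illustration; the cookie name is the one that appears in this article's sample headers.

```python
from http.cookies import SimpleCookie

# Standard cookie name for the server product used in the sample headers.
COOKIE_NAME = "GXHC_gx_session_id_AppName"


def build_set_cookie(sid):
    # Server side: produce the value of the Set-Cookie response header.
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = sid
    return cookie[COOKIE_NAME].OutputString()


def extract_sid(cookie_header):
    # Server side: read the Cookie request header and return the sid,
    # or None when the browser did not send the session cookie.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

For example, extract_sid applied to the header value sent by the browser returns the same sid that build_set_cookie was given, which is all the server needs to find the matching User Session object.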

A cookie also has an associated lifetime. A cookie with an explicit lifetime will be stored on the client host for that period, and will be accessible to all the browser instances opened on that host. Alternatively, by not giving the cookie an explicit lifetime (some server APIs express this as a lifetime of zero), the cookie can be set to die when the browser instance that receives it is closed. Such a cookie will be accessible only to the browser instance that receives it and its child instances. Does that ring a bell? Read the behavior of Yahoo Mail in the first section again. This particular property is significant from a security viewpoint.
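The two kinds of lifetime can be illustrated with Python's http.cookies module. In the standard Set-Cookie syntax, a cookie sent without a Max-Age or Expires attribute lasts only for the browser session, while one with Max-Age is stored for that many seconds. The function name session_cookie and its parameters are hypothetical.

```python
from http.cookies import SimpleCookie


def session_cookie(name, sid, max_age=None):
    # Build a Set-Cookie header value for the given sid.
    cookie = SimpleCookie()
    cookie[name] = sid
    if max_age is not None:
        # Persistent cookie: kept on the client host for max_age seconds.
        cookie[name]["max-age"] = max_age
    # With no Max-Age/Expires the browser drops the cookie when it closes.
    return cookie[name].OutputString()


browser_session_only = session_cookie("sid", "511f4caec5a335ee")
kept_for_an_hour = session_cookie("sid", "511f4caec5a335ee", max_age=3600)
```

A mail service that wants sign-in to end when the browser closes would send the first form; the second form would survive a browser restart.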

The cookie can also have an associated domain. By default, it is that of the host from which the cookie was received in the first place. The significance of this domain is that the browser won’t send this cookie to hosts/servers in other domains. For instance, a cookie received from mail.yahoo.com and having the domain yahoo.com will be sent along with requests to mail.yahoo.com and groups.yahoo.com but not to msn.hotmail.com. This, again, is significant for security reasons.
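The browser's decision can be approximated by a simple suffix check. The sketch below is a deliberate simplification (real browsers apply further rules, for instance for IP addresses and top-level domains), and the function name domain_matches is made up for illustration.

```python
def domain_matches(request_host, cookie_domain):
    # True when a cookie with the given domain may be sent to request_host.
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # Either the host is the domain itself or it is a subdomain of it.
    return host == domain or host.endswith("." + domain)


send_to_mail = domain_matches("mail.yahoo.com", "yahoo.com")
send_to_groups = domain_matches("groups.yahoo.com", "yahoo.com")
send_to_hotmail = domain_matches("msn.hotmail.com", "yahoo.com")
```

Note the dot added before the suffix comparison: without it, a host such as notyahoo.com would wrongly match the domain yahoo.com.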
If you look at the actual exchange between the browser and the server, this is what you would see:
Sample HTTP Response Header, sent when the server is sending the cookie to the browser (note the header that sets the cookie; the standard name of the cookie for this server product is GXHC_gx_session_id_AppName):

HTTP/1.1 200 OK
Server: Netscape-Enterprise/4.0
Date: Fri, 23 Mar 2003 02:06:48 GMT
Set-Cookie: GXHC_gx_session_id_AppName=511f4caec5a335ee
Content-type: Text/HTML; charset=Shift_JIS
Connection: close

(the actual content)

Sample HTTP Request Header, sent with all subsequent requests from the browser to the server (note the header that sends the cookie along):

GET /MyApplication/myPage.jsp HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Accept-Language: ja
If-Modified-Since: Fri, 23 Mar 2003 01:21:41 GMT
If-None-Match: "0-0-fc5-3abaa525"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
Host: 127.0.0.1:8080
Cookie: GXHC_gx_session_id_AppName=511f4caec5a335ee
Connection: Keep-Alive

URL Rewriting 
It is true that cookies provide an excellent way of sending the sid back and forth between the browser and server. But the user may have disabled cookies in the browser (a quite likely scenario). Most Web applications, including Yahoo Mail, refuse to work if cookies are disabled. But this need not always be a deterrent. There is an alternative: URL Rewriting.

In this case, all content sent by the server to the browser client is rewritten to include sid. This is done in such a way that the sid is sent back to the server during the subsequent request cycles. 

Requests from the rewritten content can originate in two ways: submission of forms and clicks on hyperlinks in the page. To make sure the sid is sent back, a hidden field is added to every form; its name is standard, as with the cookie name, and its value is the sid. Likewise, a parameter is appended to every hyperlink, with a standard name and the sid as its value. The rewritten page will contain content like that given below.

In the case of forms (note the standard name of the hidden field):

<input type="hidden" name="GXHC_gx_session_id_AppName" value="511f4caec5a335ee">

In the case of hyperlinks (note the standard name of the parameter):
..somepage.jsp?GXHC_gx_session_id_AppName=511f4caec5a335ee
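How a server might produce such rewritten content can be sketched in Python with the standard urllib.parse and html modules. This is an assumption-laden illustration, not the mechanism of any particular server product: the helper names rewrite_link and hidden_field and the page name inbox.jsp are hypothetical, and the parameter name is the one from the samples above.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Standard parameter/field name, as in the samples above.
PARAM_NAME = "GXHC_gx_session_id_AppName"


def rewrite_link(url, sid):
    # Append the sid as a query parameter, replacing any stale copy.
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query = [(key, value) for key, value in query if key != PARAM_NAME]
    query.append((PARAM_NAME, sid))
    return urlunsplit(parts._replace(query=urlencode(query)))


def hidden_field(sid):
    # Hidden form field that carries the sid when the form is submitted.
    return '<input type="hidden" name="%s" value="%s">' % (
        escape(PARAM_NAME, quote=True),
        escape(sid, quote=True),
    )


plain_link = rewrite_link("somepage.jsp", "511f4caec5a335ee")
link_with_query = rewrite_link("inbox.jsp?folder=Inbox", "511f4caec5a335ee")
form_field = hidden_field("511f4caec5a335ee")
```

Existing query parameters are preserved, so a link that already carries its own parameters simply gains one more.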

Ganesh Ravindran is a Software Architect with Accenture.
