Programming in any language requires a clear understanding of the fundamental data types available to the programmer. Data types define the kind of data a programmer can work with: the nature of the data, such as integers, floating-point values, or characters, and its range, that is, the minimum and maximum values a variable of that type can hold.
Here we’ll take a look at the .Net type system, called the CTS (Common Type System), which forms the basis of the data types in all .Net languages, including C#. We’ll then look at the data types themselves, how they are divided into categories, and, finally, how to convert one kind to another.
Hello CTS
In the .Net article last month (See Understanding .Net, page 162, PCQuest July 2001), we saw how the CLS (Common Language Specification) provides a foundation that, when adhered to by compiler vendors, allows them to write compilers that target the CLR (Common Language Runtime) and hence interoperate with other compliant compilers. Part of this interoperability comes from a shared set of data types that every compliant compiler understands. These types are defined by the .Net CTS (Common Type System). The CLS is a subset of the CTS, covering the types and rules that all compliant languages agree to support.
The CTS makes a common set of data types available so that code compiled from one language can easily interoperate with code compiled from another, because each understands the other’s data types. But why do we need a common type system in the first place?
In traditional object-oriented languages, developers had two kinds of data types at their disposal. The first were primitive types, such as int, float, double, or char, which are defined by and built into the language. The second were user-defined types, which include classes. The problem was that these two kinds of types were, and still are, incompatible with each other. For example, say you have a float variable and an object of some class, say CFoo. CFoo may have methods associated with it that work on it, but the built-in float type has none. So if you were asked to write a piece of code that worked identically, without any change, on both types, you couldn’t. You were stuck until you wrote wrapper classes for the primitive types. After that, you had to make it a habit to declare every variable that needed a primitive type using the corresponding wrapper class instead.
The .Net CTS takes care of this problem. Under the CTS, all data types are objects in nature. More importantly, they all derive from a common class, System.Object, which is known in C# simply as object. Because they all derive from one class, they share common functionality, such as the ToString, Equals, GetHashCode, and GetType methods. As a result, a value of any type can be treated as, and converted to, an object.
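The sketch below shows what a common root class buys you. A single method that takes an object parameter works unchanged on a built-in int, a double, and a class instance. It assumes a C# 9 or later compiler, because it uses top-level statements. Describe is a hypothetical helper name, and StringBuilder simply stands in for a user-defined class such as CFoo.

```csharp
// Sketch: assumes C# 9+ (top-level statements). Describe is a hypothetical helper.
using System;
using System.Text;

// One method, written once, that accepts any CTS type through System.Object.
static string Describe(object value) =>
    $"{value.GetType().FullName}: {value}";

int count = 42;
double ratio = 0.5;
var builder = new StringBuilder("CFoo stand-in");

string d1 = Describe(count);   // the int is boxed so it can be passed as object
string d2 = Describe(ratio);   // same for the double
string d3 = Describe(builder); // a class instance is already a reference

Console.WriteLine(d1);
Console.WriteLine(d2);
Console.WriteLine(d3);

// Members inherited from System.Object are available on the built-in int too.
string text = count.ToString();
Console.WriteLine(text);
```

No wrapper class is needed. The int, the double, and the StringBuilder all reach Describe through the same object parameter.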
However, making everything an object has a disadvantage in the form of performance degradation. Suppose we were to add two integers. From our knowledge of C#, we know that even an integer is an object. If it were handled like any other object, the simple operation of adding two integers would require allocating memory on the heap for the integer objects. The heap is the pool of memory from which the runtime allocates objects. To tackle this and make things more efficient, the CTS divides the available data types into two categories.
Value and reference types
The CTS data types are categorized as either value types or reference types, depending on how they are allocated in memory.
Value types include the following:
- Simple types, such as integers, floats, doubles, char, byte, short, long, and bool
- Structures
- Enums
Likewise, reference types include the following:
- Classes
- Interfaces
- Delegates (type-safe references to methods, with no direct equivalent in C++ or Java)
- Arrays
- Objects
- Strings
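To check which category a type falls into, you can ask the runtime through Type.IsValueType, as in the sketch below. The representative types are our own choice from the .Net class library: DateTime for a structure, DayOfWeek for an enum, StringBuilder for a class, IComparable for an interface, and Action for a delegate. The sketch assumes C# 9 or later for top-level statements.

```csharp
// Sketch: assumes C# 9+ (top-level statements). Representative types are illustrative choices.
using System;
using System.Text;

// One representative type for each category listed above.
Type[] types =
{
    typeof(int),            // simple type      -> value type
    typeof(DateTime),       // structure        -> value type
    typeof(DayOfWeek),      // enum             -> value type
    typeof(StringBuilder),  // class            -> reference type
    typeof(IComparable),    // interface        -> reference type
    typeof(Action),         // delegate         -> reference type
    typeof(int[]),          // array            -> reference type
    typeof(object),         // object           -> reference type
    typeof(string),         // string           -> reference type
};

foreach (Type t in types)
{
    string kind = t.IsValueType ? "value type" : "reference type";
    Console.WriteLine($"{t.Name,-14} {kind}");
}
```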
To understand how the two categories differ in the way they are created, suppose we perform the following assignment:
int a = 2;
Since a is a variable of type int, which is a value type, space for it is allocated on the stack (a is a local variable here), and the assigned value, 2, is stored there. Likewise, if we perform the following assignment,
int b = a;
we allocate another space on the stack for the variable b and store a copy of the value 2 there. Thus, the memory locations of both value type variables, a and b, contain the value 2. This boils down to the fact that a value type variable contains its data directly, and assigning it to another variable copies that data.
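The sketch below makes these copy semantics visible. After int b = a, changing b leaves a untouched. Structures behave the same way. To stay self-contained, the sketch uses a C# tuple in place of a user-defined struct; the compiler maps tuples to the System.ValueTuple structure. It assumes C# 9 or later for top-level statements.

```csharp
// Sketch: assumes C# 9+ (top-level statements). The tuple stands in for a user-defined struct.
using System;

int a = 2;
int b = a;      // copies the value 2 into b's own storage
b = 5;          // changes only b

Console.WriteLine($"a={a}, b={b}");           // prints "a=2, b=5"

// Structures are value types too: assignment copies the whole struct.
var p = (X: 1, Y: 2);
var q = p;
q.X = 10;       // changes only q's copy

Console.WriteLine($"p.X={p.X}, q.X={q.X}");   // prints "p.X=1, q.X=10"
```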
Kinds of data types
| Type name (CTS) | Known in C# as | Kind |
|---|---|---|
| System.String | string | String of Unicode characters (reference type) |
| System.SByte | sbyte | Signed 8-bit integer |
| System.Byte | byte | Unsigned 8-bit integer |
| System.Int16 | short | Signed 16-bit integer |
| System.UInt16 | ushort | Unsigned 16-bit integer |
| System.Int32 | int | Signed 32-bit integer |
| System.UInt32 | uint | Unsigned 32-bit integer |
| System.Int64 | long | Signed 64-bit integer |
| System.UInt64 | ulong | Unsigned 64-bit integer |
| System.Char | char | 16-bit Unicode character |
| System.Boolean | bool | Boolean value (true/false) |
| System.Single | float | 32-bit floating-point number |
| System.Double | double | 64-bit floating-point number |
| System.Decimal | decimal | 128-bit decimal number, suited to financial calculations |
Reference type allocations work differently. For instance, in the following assignment,
string s = "Nannu misses me... but I don't!";
memory is allocated from the heap instead of the stack. The assigned string is stored there, and its memory address (a reference) is stored in the reference type variable s. Thus, s doesn’t contain the string itself. Instead, it points, or refers, to the memory location that contains the assigned string.
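Copying a reference type variable copies the reference, not the data, so both variables end up referring to the same heap object. Strings are immutable, so this sharing cannot be observed by modifying one of them. The sketch below therefore also uses StringBuilder, a mutable class in the .Net class library. It assumes C# 9 or later for top-level statements.

```csharp
// Sketch: assumes C# 9+ (top-level statements).
using System;
using System.Text;

string s = "Nannu misses me... but I don't!";
string s2 = s;   // copies the reference, not the characters
bool sameString = object.ReferenceEquals(s, s2);   // both refer to one heap object

// A mutable class makes the shared object visible.
var sb1 = new StringBuilder("Nannu");
var sb2 = sb1;   // sb2 now refers to the same StringBuilder as sb1
sb2.Append(" misses me");

Console.WriteLine(sameString);       // prints "True"
Console.WriteLine(sb1.ToString());   // prints "Nannu misses me"
```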
Thus, even though all data types in the CTS are objects in nature and derive from the common System.Object class, the CLR handles the two categories differently. It’s these different ways of creation that make the CTS efficient, even though everything’s an object. The table Kinds of data types gives a brief introduction to the built-in data types and the names by which C# knows them.
Note that the table includes types wider than 32 bits, such as the 64-bit long and ulong and the 128-bit decimal. The best part is that these sizes, and hence the ranges, remain fixed irrespective of the system on which the application runs. In traditional development environments, the range of a type such as int depended on the underlying microprocessor. In .Net, the CTS defines the sizes and the CLR guarantees them. So it is the CLR’s job to make sure the sizes shown in the table hold, whatever the underlying microprocessor.
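You can check these fixed sizes from code. In C#, sizeof applied to a built-in type is a compile-time constant, and each type exposes its range through MinValue and MaxValue. The sketch below prints a few of them. It assumes C# 9 or later for top-level statements.

```csharp
// Sketch: assumes C# 9+ (top-level statements).
using System;

// sizeof on built-in types is a compile-time constant in C#,
// so these values do not depend on the processor the CLR runs on.
int intBytes = sizeof(int);          // 4
int longBytes = sizeof(long);        // 8
int charBytes = sizeof(char);        // 2 (one UTF-16 code unit)
int decimalBytes = sizeof(decimal);  // 16

Console.WriteLine($"int:     {intBytes} bytes, {int.MinValue} to {int.MaxValue}");
Console.WriteLine($"long:    {longBytes} bytes, {long.MinValue} to {long.MaxValue}");
Console.WriteLine($"char:    {charBytes} bytes");
Console.WriteLine($"decimal: {decimalBytes} bytes");
```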
So, when we write a statement such as the following in C#,
int a = 2;
what we are actually doing is telling the compiler that a is a variable of the System.Int32 type, which is a structure rather than a class. int is just the C# alias for System.Int32.
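The sketch below confirms that int and System.Int32 are one and the same type, and that the type is a value type (a structure). It assumes C# 9 or later for top-level statements.

```csharp
// Sketch: assumes C# 9+ (top-level statements).
using System;

int a = 2;
Int32 b = a;   // int and System.Int32 name the same type, so no conversion is involved

bool sameType = typeof(int) == typeof(Int32);   // true
string name = a.GetType().FullName;              // "System.Int32"
bool isStruct = typeof(int).IsValueType;         // true: Int32 is a structure

Console.WriteLine($"{sameType} {name} {isStruct} {b}");   // prints "True System.Int32 True 2"
```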
Now that we’ve gone through the basics of data types, let’s move on to boxing.
CTS conversions
In .Net, and consequently in C#, the process of converting a value type to a reference type is called boxing. Conversely, the process of converting a reference type back to a value type is called unboxing. Let’s take an example.
int a = 2;
object oa = a;
Here, the first line creates a value type variable a on the stack and stores the value 2 there. The second line performs a boxing operation automatically. It allocates an object on the heap, copies the value of the value type variable a into it, and stores a reference to that object in oa. The important point here is that the two values are independent of each other. The following lines of code illustrate the concept.
int joke = 2;
object ojoke = joke;
ojoke = 3;
Console.WriteLine("Joke={0}, oJoke={1}", joke, ojoke);   // prints Joke=2, oJoke=3
When ojoke is assigned 3, a new integer is boxed and ojoke is made to refer to it. joke holds its own copy of the data, so it still contains 2. This is a consequence of the way the CLR creates value and reference types.
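The sketch below extends the joke example to show that the independence runs in both directions. Changing joke after boxing does not alter the boxed copy, and assigning to ojoke does not alter joke. It assumes C# 9 or later for top-level statements.

```csharp
// Sketch: assumes C# 9+ (top-level statements).
using System;

int joke = 2;
object ojoke = joke;   // boxing: a copy of 2 is placed in a new heap object

joke = 5;              // changes only joke; the box still holds 2
int stillInBox = (int)ojoke;

ojoke = 3;             // boxes a new integer 3; joke is unaffected
Console.WriteLine($"Joke={joke}, oJoke={ojoke}, was in box={stillInBox}");
// prints "Joke=5, oJoke=3, was in box=2"
```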
Moving on to unboxing, one notable difference from boxing is that during unboxing we have to specify the type being unboxed to, so it’s an explicit operation. Consequently, the CLR first verifies at run time that the value stored in the reference type is actually of the requested type, as in the following example.
int joke1 = 2;
object ojoke = joke1;
int joke2 = (int)ojoke;
In this case, the boxing operation is performed in line 2. When unboxing is attempted in line 3, the CLR first ensures that the requested type (in this case int) is actually what is stored in the reference type. Since an integer was boxed into ojoke, the unboxing operation succeeds. Had some other type been requested, like decimal, the unboxing operation would have failed, and an InvalidCastException would have been raised.
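The sketch below shows the failing case. Unboxing an int straight to decimal throws InvalidCastException, even though an int value can normally be converted to decimal. The working alternative is to unbox to the exact type first and then convert. It assumes C# 9 or later for top-level statements.

```csharp
// Sketch: assumes C# 9+ (top-level statements).
using System;

int joke1 = 2;
object ojoke = joke1;          // box an int

int joke2 = (int)ojoke;        // unbox to the boxed type: succeeds

string outcome;
try
{
    decimal d = (decimal)ojoke;    // unbox to a different type: the CLR rejects it
    outcome = $"unexpected success: {d}";
}
catch (InvalidCastException)
{
    outcome = "InvalidCastException";
}

// To get a decimal, unbox to int first, then convert the value.
decimal converted = (decimal)(int)ojoke;

Console.WriteLine($"{joke2} {outcome} {converted}");   // prints "2 InvalidCastException 2"
```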
Finally
By now, you should have a fair idea of the data types used by C#, and in fact by all .Net languages, because they are all built on the CTS. The CTS ensures easy interoperability between .Net languages. This minimizes the learning curve, and operations on data types behave the same way across all of them. A single-rooted class hierarchy helps data types interoperate. The CTS also makes sure that every reference to an object is typed, that is, its data type is known, and that the type referenced is valid in the context of the operation, as shown during the explanation of unboxing.
Kumar Gaurav Khanna runs www.wintools.f2s.com