Most websites today want to capture and deliver customised information to clients. You might need to store information specific to thousands or even millions of unique customers who visit your site regularly in order to work out their likes and dislikes. This could translate into as many code segments, each unique to a customer.
To give you an example, we once visited the premises of one of India's largest software companies in Bangalore as part of the audit process for our annual Best IT Implementations awards. That company uses Microsoft technologies to deploy and maintain ecommerce applications for customers spread across the globe. A challenge it regularly faced was updating code whenever a bug affected the performance of its applications. Since most of the code was duplicated across customers, the team had the tedious task of going through individual code segments to correct each anomaly. The team was using tools to reduce the duplication of effort but was still on the lookout for better technologies. Visual Studio 2012, the latest edition of Microsoft's development environment, comes with a Code Clone Analysis feature that attempts to address this.
Why do a 'Code Clone Analysis'
Large applications are spread across various projects and divisions within an organisation. If some of these experience performance issues or require urgent updates or code refactoring, a developer needs quick access to the code segments where changes are required. Also, as part of software architecture cleanup, it's good to do away with code clones so that the source tree becomes easier for other developers to understand. After completing a new code segment, it's a good habit to look for similar code that already exists and to refactor it into shared classes or methods.
Categorising code clones
This is how MSDN defines code clones:
“Code clones are separate fragments of code that are very similar. They are a common phenomenon in an application that has been under development for some time. Clones make it hard to change your application because you have to find and update more than one fragment. Visual Studio can help you find code clones so that you can refactor them.”
Taking a closer look at this definition, code clones could be found in methods that have been developed using the same logic, in different instances of the same object, or even in conditional and loop constructs such as 'if-else' and 'do-while'. First select the code segment for which you need to find clones. The tool suggests clones across the entire code, in varying degrees of similarity, and categorises them accordingly. For instance, code segments that are replicas of other code segments are categorised as exact matches. Code segments that are similar in logic but differ in the name of the method used are labelled as strong matches. Those that look similar in logic and syntax but represent different functionality are referred to as medium matches. Others that are only distantly related are categorised as weak matches. You get a count of the various clones for the code segment you selected.
The code clones are grouped based on their similarity to the original code. You can click the code segment you selected, or any group of code clones, to find its location in the source code. Apart from finding clones for a selected code segment, you also have the option to analyse the complete source code and get a consolidated count of the instances of different code clones.
To compare two clones, open the Code Clone Results window, from where you can open the two code clones in adjacent windows. The tool should automatically highlight the portions of similar code. This feature uses the same comparison tool that is used for comparing versions under source control. To change its settings, open Options from the Tools menu, expand Source Control and select Visual Studio Team Foundation Server. Under Configure User Tools you can make the desired changes.
What's excluded under Code Clone Analysis
While the logic behind finding code clones is sound, it makes sense to be aware of the obvious exclusions. First, type declarations of any kind are excluded. For instance, two classes with similar sets of field declarations will not be reported. Likewise, statements containing the same variables with different names are not compared. Only the statements in methods or functions, conditional statements or definitions are compared. Also, while analysing the entire source code with the Analyze Solution for Code Clones option, small code segments (fewer than 10 statements long) are not compared. This limit can, however, be circumvented by applying 'Find matching clones in solution' to shorter code segments. Within a project, you can also define a .codeclonesettings file; code elements listed under its Exclusions section are not searched. You can also exclude generated code, such as text templates, by naming it in the .codeclonesettings file.
Setting up exclusions
An XML file with a .codeclonesettings extension is kept at the project level. Within this file you can add the elements to be excluded from code clone analysis. The base elements consist of a CodeCloneSettings element with an Exclusions child:
<CodeCloneSettings>
  <Exclusions>
    ...
  </Exclusions>
</CodeCloneSettings>
The elements within this template can consist of, for instance, a directory of text templates called text. So, the code gets modified as under:
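As a minimal sketch of such a file, assuming the File exclusion element with a wildcard path as described in MSDN's code clone documentation, and assuming the templates carry the usual .tt extension (the path shown is illustrative, not taken from a real project):

```xml
<CodeCloneSettings>
  <Exclusions>
    <!-- Assumed path: every .tt text template in the project's "text" directory -->
    <File>text\*.tt</File>
  </Exclusions>
</CodeCloneSettings>
```

Here the single File line is the only exclusion. Adjust the path to match wherever your own templates live.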
You can also exclude namespaces, types and functions. Simply remove the line in the above code starting with