My last 2 posts were about problems with using Roslyn. Nonetheless, even if I sometimes hate it, I'm still using it, so the time has come to show a practical example of using Roslyn. Recently, I've been working on a task that can be summed up as: take this ugly code and do something with it, i.e. more or less a refactoring task.
Now I'll give you some intuition of what I had to deal with. The code that I had to refactor was generated automatically from an XML schema. These were actually DTO classes used to communicate with an external service. Here are some statistics:
- 28.7 thousand lines of code in 23 files.
- 2200 classes and 920 enums.
- Many classes and enums seemed to me identical or very similar.
I started with a simple syntax walker:

    public class MyWalker : CSharpSyntaxWalker
    {
        public override void VisitClassDeclaration(ClassDeclarationSyntax node) { /* ... */ }

        public override void VisitEnumDeclaration(EnumDeclarationSyntax node) { /* ... */ }
    }

My walker simply visited all classes and enums and saved them for later processing. Additionally, classes and enums were grouped by their names. At this stage I found that there were actually only 620 uniquely named classes and 340 uniquely named enums. The next step was to verify whether the grouped classes/enums were the same. Two classes are the same if they:
- Have the same name.
- Have the same attributes and these attributes have the same arguments. Here, I decided to ignore some meaningless attributes like GeneratedCodeAttribute.
- Have the same number of properties.
- And these properties have the same names and return types.
- And these properties have the same attributes and these attributes have the same arguments. Here again I ignored some attributes.
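To make the criteria above more concrete, here is a minimal sketch of how the grouping and the comparison could be done with Roslyn. It is not the original code: `ClassCollector`, `Shape` and the content of the `Ignored` set are names made up for this illustration. The sketch assumes that two classes can be compared by reducing each of them to a string built from the class name, the non-ignored attributes with their arguments, and the properties (type, name, attributes) in declaration order. It compares type names purely as text, so it does not yet handle the return type problem described next.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative only: groups class declarations by their simple name.
public class ClassCollector : CSharpSyntaxWalker
{
    public Dictionary<string, List<ClassDeclarationSyntax>> ClassesByName { get; } =
        new Dictionary<string, List<ClassDeclarationSyntax>>();

    public override void VisitClassDeclaration(ClassDeclarationSyntax node)
    {
        var name = node.Identifier.ValueText;
        if (!ClassesByName.TryGetValue(name, out var group))
            ClassesByName[name] = group = new List<ClassDeclarationSyntax>();
        group.Add(node);
        base.VisitClassDeclaration(node); // also visit nested classes
    }
}

// Illustrative only: reduces a class to a string that captures the criteria above.
public static class Shape
{
    // Assumption: attributes to ignore, compared without the "Attribute" suffix.
    static readonly HashSet<string> Ignored = new HashSet<string> { "GeneratedCode" };

    // "System.CodeDom.Compiler.GeneratedCodeAttribute" -> "GeneratedCode"
    static string ShortName(AttributeSyntax attribute)
    {
        var name = attribute.Name.ToString().Split('.').Last();
        const string suffix = "Attribute";
        return name.Length > suffix.Length && name.EndsWith(suffix, StringComparison.Ordinal)
            ? name.Substring(0, name.Length - suffix.Length)
            : name;
    }

    // Non-ignored attributes with their arguments, sorted so that order does not matter.
    static string Attributes(SyntaxList<AttributeListSyntax> lists)
    {
        var parts = lists.SelectMany(l => l.Attributes)
            .Where(a => !Ignored.Contains(ShortName(a)))
            .Select(a => ShortName(a) + (a.ArgumentList?.NormalizeWhitespace().ToString() ?? ""))
            .OrderBy(s => s, StringComparer.Ordinal);
        return "[" + string.Join(",", parts) + "]";
    }

    // Class name + attributes + properties (type, name, attributes) in declaration order.
    public static string Of(ClassDeclarationSyntax c)
    {
        var properties = c.Members.OfType<PropertyDeclarationSyntax>()
            .Select(p => p.Type.NormalizeWhitespace() + " " + p.Identifier.ValueText
                         + " " + Attributes(p.AttributeLists));
        return c.Identifier.ValueText + " " + Attributes(c.AttributeLists)
               + " {" + string.Join(";", properties) + "}";
    }
}
```

With this in place, a group from `ClassesByName` is a merge candidate when `Shape.Of` returns the same string for all of its members. Comparing strings is a shortcut chosen to keep the sketch short; a real implementation would probably compare the syntax nodes or symbols directly.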
If all classes (enums) within a group are the same, they can be replaced by only 1 class. However, there is one tricky thing about return types. Let's assume that we have 2 properties, i.e. Namespace1.Class.Prop and Namespace2.Class.Prop. Both properties return a type with the same name, e.g. SomeClass. Now it's important to check whether all classes named SomeClass are the same according to the above definition. Only then can we merge Namespace1.Class.Prop and Namespace2.Class.Prop. In other words, I had a graph of dependencies to analyse, potentially with cycles.
I modelled this graph in a simple way. I used a dictionary to store groups of classes (enums), where the key was the name of a class or an enum. Additionally, each class/enum had a list of related classes/enums, i.e. a base class, return types of properties... After this analysis I was able to merge 950 classes and 490 enums.
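One possible way to analyse such a graph, including cycles, is sketched below. It is not the original implementation, only an illustration under a few assumptions: every declaration is represented by a hypothetical `Decl` object holding its name, a `Shape` string that summarises its attributes and properties, and the names of the types it refers to. The sketch starts optimistically with all groups whose members have an identical shape, and then repeatedly discards groups that depend on a discarded group until nothing changes. Thanks to the optimistic start, groups that only refer to each other in a cycle are not rejected just because of the cycle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a single class or enum declaration.
public sealed class Decl
{
    public string Name { get; }
    public string Shape { get; }                      // summary of attributes and properties
    public IReadOnlyList<string> References { get; }  // names of base/property types

    public Decl(string name, string shape, params string[] references)
    {
        Name = name;
        Shape = shape;
        References = references;
    }
}

public static class MergeAnalysis
{
    // Returns the names of the groups that can be replaced by a single declaration.
    public static HashSet<string> MergeableGroups(IEnumerable<Decl> decls)
    {
        // The dictionary of groups, keyed by the class/enum name.
        var groups = decls.GroupBy(d => d.Name)
                          .ToDictionary(g => g.Key, g => g.ToList());

        // Optimistic start: every group whose members all have the same shape.
        var mergeable = new HashSet<string>(
            groups.Where(g => g.Value.Select(d => d.Shape).Distinct().Count() == 1)
                  .Select(g => g.Key));

        bool changed;
        do
        {
            changed = false;
            foreach (var name in mergeable.ToList())
            {
                // References to names outside the dictionary (string, int, ...) are ignored.
                bool dependsOnRejected = groups[name]
                    .SelectMany(d => d.References)
                    .Any(r => groups.ContainsKey(r) && !mergeable.Contains(r));
                if (dependsOnRejected)
                {
                    mergeable.Remove(name);
                    changed = true;
                }
            }
        } while (changed);

        return mergeable;
    }
}
```

For example, if classes A and B refer to each other and all copies of A and all copies of B have identical shapes, both groups stay mergeable. If class C refers to D and the copies of D differ, both C and D are rejected.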
However, that is not everything. In the next step I used the CSharpSyntaxRewriter class to perform the following additional refactorings:
- Remove meaningless attributes.
- Add Enum suffix to enums.
- Remove the backing fields for properties and introduce auto-properties.
- Add using directives and remove fully qualified names of types.
- ...
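As an illustration of the first item on this list, here is a minimal sketch of a rewriter that removes an attribute. It is not the original code: `AttributeCleaner`, `IsNoise` and `Clean` are hypothetical names, and the sketch assumes that GeneratedCodeAttribute is the only attribute to remove and that it can be recognised by its name alone, without a semantic model.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative only: removes attributes that are considered noise.
public class AttributeCleaner : CSharpSyntaxRewriter
{
    // Assumption: the attribute is recognised purely by the last part of its name.
    static bool IsNoise(AttributeSyntax attribute)
    {
        var name = attribute.Name.ToString().Split('.').Last();
        return name == "GeneratedCode" || name == "GeneratedCodeAttribute";
    }

    public override SyntaxNode VisitAttributeList(AttributeListSyntax node)
    {
        var kept = node.Attributes.Where(a => !IsNoise(a)).ToList();
        if (kept.Count == node.Attributes.Count) return node; // nothing to remove
        if (kept.Count == 0) return null;                      // drop the whole [...] list
        return node.WithAttributes(SyntaxFactory.SeparatedList(kept));
    }

    // Parses the source, applies the rewriter and returns the new text.
    public static string Clean(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        return new AttributeCleaner().Visit(root).ToFullString();
    }
}
```

Note that removing a whole attribute list this way also removes the trivia attached to it, for example a comment placed directly above the attribute, so a real implementation may want to move that trivia to the next token.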
Here are the final statistics:
- 13.5 thousand lines (-53%) of code in 1670 files, i.e. 1 class/enum per file.
- 1250 classes (-43%) and 430 enums (-53%).
If you have any questions about the details of my approach, just let me know.
*The picture at the beginning of the post comes from my own resources and shows Azulejo tiles somewhere in Portugal.
2 comments:
Could you show the code for MyWalker?
@Andrei Ignat - The full code of MyWalker is not very short; besides, it uses some other classes etc. In other words, I think that it's too long to include here, and unfortunately I cannot publish it on GitHub (or anywhere else). However, if you have any specific questions, just let me know and I'll try to help.