Duplication » Managed Chaos by Naresh Jain

Duplicate Code and Ceremony in Java

Thursday, July 21st, 2011

How would you kill this duplication in a strongly typed, static language like Java?

private int calculateAveragePreviousPercentageComplete() {
    int result = 0;
    for (StudentActivityByAlbum activity : activities)
        result += activity.getPreviousPercentageCompleted();
    return result / activities.size();
}
 
private int calculateAverageCurrentPercentageComplete() {
    int result = 0;
    for (StudentActivityByAlbum activity : activities)
        result += activity.getPercentageCompleted();
    return result / activities.size();
}
 
private int calculateAverageProgressPercentage() {
    int result = 0;
    for (StudentActivityByAlbum activity : activities)
        result += activity.getProgressPercentage();
    return result / activities.size();
}

Here is my horrible solution:

private int calculateAveragePreviousPercentageComplete() {
    return new Average(activities) {
        public int value(StudentActivityByAlbum activity) {
            return activity.getPreviousPercentageCompleted();
        }
    }.result;
}
 
private int calculateAverageCurrentPercentageComplete() {
    return new Average(activities) {
        public int value(StudentActivityByAlbum activity) {
            return activity.getPercentageCompleted();
        }
    }.result;
}
 
private int calculateAverageProgressPercentage() {
    return new Average(activities) {
        public int value(StudentActivityByAlbum activity) {
            return activity.getProgressPercentage();
        }
    }.result;
}
 
private static abstract class Average {
    public int result;
 
    public Average(List<StudentActivityByAlbum> activities) {
        int total = 0;
        for (StudentActivityByAlbum activity : activities)
            total += value(activity);
        result = total / activities.size();
    }
 
    protected abstract int value(StudentActivityByAlbum activity);
}

If this were Ruby:

@activities.inject(0.0) { |total, activity| total + activity.previous_percentage_completed } / @activities.size
@activities.inject(0.0) { |total, activity| total + activity.percentage_completed } / @activities.size
@activities.inject(0.0) { |total, activity| total + activity.progress_percentage } / @activities.size

Or, even kewler:

average_of :previous_percentage_completed
average_of :percentage_completed
average_of :progress_percentage

def average_of(message)
  @activities.inject(0.0) { |total, activity| total + activity.send(message) } / @activities.size
end
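Since Java 8 (which arrived a few years after this post), lambdas and method references let Java get fairly close to the Ruby average_of. The sketch below is one way to do it, not the original code: it assumes a minimal stand-in for StudentActivityByAlbum with the three int getters from the post, and ActivityAverages and averageOf are hypothetical names. It keeps the original integer division, so an empty list still throws ArithmeticException.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Minimal stand-in for the post's class (assumption: plain int getters).
class StudentActivityByAlbum {
    private final int previous, current, progress;

    StudentActivityByAlbum(int previous, int current, int progress) {
        this.previous = previous;
        this.current = current;
        this.progress = progress;
    }

    int getPreviousPercentageCompleted() { return previous; }
    int getPercentageCompleted() { return current; }
    int getProgressPercentage() { return progress; }
}

// Hypothetical holder for the three calculations.
class ActivityAverages {
    private final List<StudentActivityByAlbum> activities;

    ActivityAverages(List<StudentActivityByAlbum> activities) {
        this.activities = activities;
    }

    // The only place that knows how to average; callers pass in *what* to average.
    private int averageOf(ToIntFunction<StudentActivityByAlbum> value) {
        int total = activities.stream().mapToInt(value).sum();
        return total / activities.size(); // integer division, as in the original
    }

    int calculateAveragePreviousPercentageComplete() {
        return averageOf(StudentActivityByAlbum::getPreviousPercentageCompleted);
    }

    int calculateAverageCurrentPercentageComplete() {
        return averageOf(StudentActivityByAlbum::getPercentageCompleted);
    }

    int calculateAverageProgressPercentage() {
        return averageOf(StudentActivityByAlbum::getProgressPercentage);
    }
}
```

Returning IntStream.average() (an OptionalDouble) instead would make the empty-list case explicit; the sketch sticks to int to match the original method signatures.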

Posted in Agile, Code Smells, Java, Programming, Programming Languages

Levels of Duplication

Wednesday, October 21st, 2009

Duplication comes in levels, starting with obvious forms like the Ctrl+C & Ctrl+V pattern and moving on to more subtle ones:

Literal Duplication: Ex: The same for loop in 2 places
Semantic Duplication: The code essentially does the same thing but is syntactically different. Again, there are sub-levels:
- 1st Level: Ex: for and foreach loops
  for(int i : someList) stack.push(i);
  for(int i : someList) stack.push(i);
  v/s
  for(int i=0; i < someList.size(); i++) stack.push(someList.get(i));
  for(int i=0; i < someList.size(); i++) stack.push(someList.get(i));
- 2nd Level: Ex: Pushing each element in a separate statement v/s looping over a list of the elements
  stack.push(1); stack.push(3); stack.push(5); stack.push(10); stack.push(15);
  stack.push(1); stack.push(3); stack.push(5); stack.push(10); stack.push(15);
  v/s
  for(int i : asList(1,3,5,10,15)) stack.push(i);
  for(int i : asList(1,3,5,10,15)) stack.push(i);
- 3rd Level: Ex: Loop v/s Recursion
Data Duplication: Ex: The same constant declared in 2 classes (test and production)
Structural Duplication: Ex: Parallel Inheritance Hierarchy
Conceptual Duplication: Ex: 2 algorithms to sort elements (Bubble sort and Quick sort)
Representational Knowledge Duplication: Commonly known as WET (a violation of DRY – Don’t Repeat Yourself)
Duplication of logical steps: The same set of steps repeats in different scenarios. Ex: The same set of validations at various points in your application
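The 3rd level of semantic duplication (loop v/s recursion) is the hardest to spot, because the two versions look nothing alike. As a small hypothetical illustration, both methods below push every element of a list onto a stack, the same job as the for loops in the 1st and 2nd level examples. The class and method names are made up, and java.util.Deque stands in for whatever stack type the original examples used.

```java
import java.util.Deque;
import java.util.List;

class SemanticDuplication {
    // Iterative version: push each element in list order.
    static void pushAllWithLoop(List<Integer> items, Deque<Integer> stack) {
        for (int i : items) stack.push(i);
    }

    // Recursive version: identical behaviour, completely different shape.
    static void pushAllRecursively(List<Integer> items, Deque<Integer> stack) {
        if (items.isEmpty()) return;
        stack.push(items.get(0));
        pushAllRecursively(items.subList(1, items.size()), stack);
    }
}
```

Because they produce the same stack for any input, keeping both in one codebase is duplication even though no line of one appears in the other.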

Duplication of statement fragments: The same fragments of a statement repeat. Ex:

Assert.IsTrue(response.HasHeader);
Assert.IsTrue(response.HasMessageId);
Assert.IsTrue(response.Has("X-SenderIP: " + senderIp));
Assert.IsTrue(response.Has("X-SenderDomain: " + senderDomain));
Assert.IsTrue(response.Has("X-recipientDomain: " + recipientDomain));
Assert.IsTrue(response.Has("X-SPF: " + spfValue));
Assert.IsTrue(response.Has("X-1stClassification: " + firstClassificationResult));
Assert.IsTrue(response.Has("X-2ndClassification: " + secondClassificationResult));
Assert.IsTrue(response.Has("X-3rdClassification: " + thirdClassificationResult));
Assert.IsTrue(response.Has("X-MANUALLY-CLASSIFIED: " + manuallyClassified));

Once we clean this up, it might look like:

lets.checkThat(response).HasHeader.HasMessageId.Has + "X-SenderIP" = senderIp + "X-SenderDomain" = senderDomain
        + "X-recipientDomain" = recipientDomain + "X-SPF" = spfValue + "X-1stClassification" = firstClassificationResult
        + "X-2ndClassification" = secondClassificationResult + "X-3rdClassification" = thirdClassificationResult + "X-MANUALLY-CLASSIFIED" = manuallyClassified;

Thanks to Corey Haines and the folks who participated in the Biggest Stinkers session at the Simple Design and Testing Conference 2009. Most of this information was discussed during that session.

Posted in Design

An Example of Primitive Obsession

Tuesday, October 20th, 2009

A couple of years ago, I was working with a team that was building an application for handling user profiles. It had the following functionality:

User Signup
User Profile
Login/Logout API
Authentication and Authorization API
and so on…

As you can see, this is pretty common to most applications.

The central entity in this application is User. And we had a UserService to expose its API. So far, so good.

The UserService had 2 main methods that I want to focus on. The first one is:

    public boolean authenticate(String userId, String password) {
        ...
    }

Even before the authenticate() method gets called, the calling class does basic validations on the password parameter. Stuff like:

the password cannot be null,
it needs to be more than 6 and less than 30 characters long,
it should contain at least one special character or upper case letter,
it should contain at least one letter
and so on …

Some of these checks happen to reside as separate methods on a PasswordUtil class and some on the StringUtil class.

Once the authenticate method is called, we retrieve the respective User from the database, fetch the password stored in the database and match the new password against it. Wait a sec, we don’t store plain passwords in the DB any more; we hash them before we store ’em. And as you might already know, we use a one-way hash, which means that given a hash, we cannot get back the original string. So we hash the newly entered password (for which we use a HashUtil class) and then compare the 2 hashes.

The second method is:

    public User create(final UserDTO userDTO) {
       ...
    }

Before the create() method is called, we validate all the fields inside the UserDTO. During this validation, we do the exact same validations on the password as we do before calling the authenticate method. If all the fields are valid, then inside the create method, we make sure no one else has the same userid. Then we take the raw text password and hash it, so that we can store it in our DB. Once we save the user data in the DB, we send out an activation email and off we go.

Sorry, that was long. What is the point? Exactly my point: what is the point? Why do I need to know all this stuff? I can’t really explain the pain I go through when I see:

All these hops & jumps around these large meaningless classes (UserDTO, PasswordUtil, StringUtil, HashUtil)
Conceptual and Data duplication in multiple places
Difficulty in knowing where I can find some logic (password logic seems to be sprayed all over the place)
And so on …

This is an example of Primitive Obsession.

If we create a Password class, we can:

reduce a huge amount of complexity,
increase clarity and
avoid duplication in this code.

Come to think of it, Password is really an entity like User in this domain.

The Password class’s constructor can do the validations for you.
You can give it another password and ask if they match. This hides all the hashing and rehashing logic from you.
You can kill all those 3 Util classes (PasswordUtil, StringUtil, HashUtil) & move the logic into the Password class, where it belongs.
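A minimal sketch of what such a Password class might look like, assuming the validation rules listed earlier in this post. The names fromStoredHash, matches and storedHash are hypothetical, not the team's actual API. Unsalted SHA-256 is used only to keep the sketch short; real password storage should use a salted, deliberately slow algorithm such as PBKDF2 (for example via javax.crypto's SecretKeyFactory with "PBKDF2WithHmacSHA256") or bcrypt.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical value object that absorbs PasswordUtil, StringUtil and HashUtil.
final class Password {
    private final byte[] hash; // only the one-way hash is kept, never the raw text

    // Validates raw user input (rules from the list above), then hashes it.
    Password(String rawText) {
        if (rawText == null)
            throw new IllegalArgumentException("password cannot be null");
        if (rawText.length() <= 6 || rawText.length() >= 30)
            throw new IllegalArgumentException("password must be 7 to 29 characters long");
        if (rawText.chars().noneMatch(Character::isLetter))
            throw new IllegalArgumentException("password needs at least one letter");
        if (rawText.chars().noneMatch(c -> Character.isUpperCase(c) || !Character.isLetterOrDigit(c)))
            throw new IllegalArgumentException("password needs an upper case letter or a special character");
        this.hash = sha256(rawText);
    }

    private Password(byte[] hash) {
        this.hash = hash;
    }

    // Rebuilds a Password from a hash that was loaded from the DB.
    static Password fromStoredHash(byte[] storedHash) {
        return new Password(storedHash.clone());
    }

    // Callers just ask "do these match?" and never see any hashing.
    boolean matches(Password other) {
        return MessageDigest.isEqual(hash, other.hash);
    }

    // What create() would persist.
    byte[] storedHash() {
        return hash.clone();
    }

    private static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}
```

With this in place, an invalid password simply cannot be constructed, so neither authenticate nor create needs to repeat the checks.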

So once we are done, we have the following method signatures:

    public User userWithMatching(UserId id, Password userEnteredPwd) {
        ...
    }

    public User create(final User newUser) {
       ...
    }

Posted in Design

Everything else is just Noise

Tuesday, September 22nd, 2009

Recently I was working on some code. The code was trying to tell me many things, but I was not sure I understood what it was trying to communicate. Most of it just felt irrelevant, like noise. Somehow the right level of abstraction was missing.

When I started I had:

private final UserService userService = createMock(UserService.class);
private final DomainNameService dns = createMock(DomainNameService.class);
private final RandomNumberGenerator randomNumberGenerator = new RandomNumberGenerator() {
    @Override
    public long next() {
        return 9876543210L;
    }
};
private final IdentityGenerator identityGenerator = new IdentityGenerator(randomNumberGenerator, dns, userService);
private final User naresh_from_mumbai = new User("naresh", "jain", "mumbai", "india", "indian");

@Test
public void avoidRestrictedWordsInIds() {
    expect(dns.isCelebrityName("naresh", "jain")).andStubReturn(false);
    expect(dns.validateFirstPartAndReturnRestrictedWordIfAny("naresh")).andStubReturn("naresh");
 
    expect(dns.isCelebrityName("nares", "jain")).andStubReturn(false);
    expect(dns.validateFirstPartAndReturnRestrictedWordIfAny("nares")).andStubReturn(null);
    expect(dns.validateSecondPartAndReturnRestrictedWordIfAny("jain")).andStubReturn(null);
    expect(userService.isIdentityAvailable("nares@jain.com")).andStubReturn(true);
 
    expect(dns.isCelebrityName("nares", "india")).andStubReturn(false);
    expect(dns.isCelebrityName("naresh", "india")).andStubReturn(false);
    expect(dns.validateFirstPartAndReturnRestrictedWordIfAny("nares")).andStubReturn(null);
    expect(dns.validateSecondPartAndReturnRestrictedWordIfAny("india")).andStubReturn(null);
    expect(userService.isIdentityAvailable("nares@india.com")).andStubReturn(true);
 
    expect(dns.isCelebrityName("nares", "indian")).andStubReturn(false);
    expect(dns.isCelebrityName("naresh", "indian")).andStubReturn(false);
    expect(dns.validateFirstPartAndReturnRestrictedWordIfAny("nares")).andStubReturn(null);
    expect(dns.validateSecondPartAndReturnRestrictedWordIfAny("indian")).andStubReturn(null);
    expect(userService.isIdentityAvailable("nares@indian.com")).andStubReturn(true);
 
    expect(dns.isCelebrityName("nares", "mumbai")).andStubReturn(false);
    expect(dns.isCelebrityName("naresh", "mumbai")).andStubReturn(false);
    expect(dns.validateFirstPartAndReturnRestrictedWordIfAny("nares")).andStubReturn(null);
    expect(dns.validateSecondPartAndReturnRestrictedWordIfAny("mumbai")).andStubReturn(null);
    expect(userService.isIdentityAvailable("nares@mumbai.com")).andStubReturn(true);
 
    replay(userService, dns);
 
    List<String> generatedIDs = identityGenerator.getGeneratedIDs(naresh_from_mumbai);
    List<String> expectedIds = ids("nares@jain.com", "nares@india.com", "nares@indian.com", "nares@mumbai.com");
 
    assertEquals(expectedIds, generatedIDs);
 
    verify(userService, dns);
}


My first reaction after looking at this code was that there was too much going on, most of it duplicated. So I cleaned it up a bit and made it more expressive:

private final Context lets = new Context(userService, dns);
 
@Test
public void avoidRestrictedWordsInIds() {
    lets.assume("naresh").plus("jain").isNotACelebrityName();
    lets.assume("naresh").isARestrictedUserName();
 
    lets.assume("nares").plus("jain").isNotACelebrityName();
    lets.assume("nares").isNotARestrictedUserName();
    lets.assume("jain").isNotARestrictedDomainName();
    lets.assume().identity("nares@jain.com").isAvailable();
 
    lets.assume("nares").plus("india").isNotACelebrityName();
    lets.assume("nares").isNotARestrictedUserName();
    lets.assume("india").isNotARestrictedDomainName();
    lets.assume().identity("nares@india.com").isAvailable();
 
    lets.assume("nares").plus("indian").isNotACelebrityName();
    lets.assume("nares").isNotARestrictedUserName();
    lets.assume("indian").isNotARestrictedDomainName();
    lets.assume().identity("nares@indian.com").isAvailable();
 
    lets.assume("nares").plus("mumbai").isNotACelebrityName();
    lets.assume("nares").isNotARestrictedUserName();
    lets.assume("mumbai").isNotARestrictedDomainName();
    lets.assume().identity("nares@mumbai.com").isAvailable();
 
    List<String> generatedIds = suggester.generateIdsFor(naresh_from_mumbai);
 
    lets.assertThat(generatedIds).are("nares@jain.com", "nares@india.com", "nares@indian.com", "nares@mumbai.com");
}

By introducing a new class called Context and moving all the mocking code into it, I made my test look a lot clearer. I was also able to create an abstraction that communicates intent much more easily.

Next I reduced the clutter further by creating another level of abstraction as follows

@Test
public void avoidRestrictedWordsInIds() {
    lets.assume("naresh", "jain").isNotACelebrityName();
    lets.assume("naresh").isARestrictedUserName();
 
    for (final String[] identityTokens : list(pair("nares", "jain"), pair("nares", "india"), pair("nares", "indian"), pair("nares", "mumbai"))) {
        lets.assume(identityTokens[0], identityTokens[1]).isNotACelebrityName();
        lets.assume(identityTokens[0]).isNotARestrictedUserName();
        lets.assume(identityTokens[1]).isNotARestrictedDomainName();
        lets.assume().identity(identityTokens[0] + "@" + identityTokens[1] + ".com").isAvailable();
    }
 
    List<String> generatedIds = suggester.generateIdsFor(naresh_from_mumbai);
 
    lets.assertThat(generatedIds).are("nares@jain.com", "nares@india.com", "nares@indian.com", "nares@mumbai.com");
}

But at this point, even though the code had become very dense, it was very difficult to understand what was going on and why. In a desperate search for simplicity and better communication, I ended up with:

@Test
public void avoidRestrictedWordsInIds() {
    lets.assume("naresh").isARestrictedUserName();
    List<String> generatedIds = suggester.suggestIdsFor(naresh_from_mumbai);
    lets.assertThat(generatedIds).are("nares@jain.com", "nares@india.com", "nares@indian.com", "nares@mumbai.com");
}

What is interesting about this is that I made a few simple assumptions:

no name is a celebrity name unless specified
every user name is a valid (non-restricted) user name unless specified
every domain name is a valid (non-restricted) domain name unless specified
every identity is available unless specified

All these assumptions are now captured in my Context object, and the rest of my tests can happily focus on what really matters. I really liked the way this reduced the clutter in my tests without compromising on communication.
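The post does not show the Context class itself, so here is one possible sketch of how those default assumptions could live in it. Instead of EasyMock it uses hand-rolled stubs whose answers default to "not a celebrity", "not restricted" and "available"; a test only records the exceptions. The interfaces reuse the method names from the test above, but their exact signatures, the String.join key scheme, and the methods isACelebrityName, isARestrictedDomainName and isATakenIdentity are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical collaborator interfaces, using the method names from the test above.
interface DomainNameService {
    boolean isCelebrityName(String firstPart, String secondPart);
    String validateFirstPartAndReturnRestrictedWordIfAny(String firstPart);
    String validateSecondPartAndReturnRestrictedWordIfAny(String secondPart);
}

interface UserService {
    boolean isIdentityAvailable(String identity);
}

// Test Context whose stubs give the "happy" answer unless a test assumes otherwise.
class Context {
    private final Set<String> celebrityNames = new HashSet<>();
    private final Set<String> restrictedUserNames = new HashSet<>();
    private final Set<String> restrictedDomainNames = new HashSet<>();
    private final Set<String> takenIdentities = new HashSet<>();

    final DomainNameService dns = new DomainNameService() {
        @Override
        public boolean isCelebrityName(String firstPart, String secondPart) {
            return celebrityNames.contains(firstPart + " " + secondPart);
        }

        @Override
        public String validateFirstPartAndReturnRestrictedWordIfAny(String firstPart) {
            return restrictedUserNames.contains(firstPart) ? firstPart : null;
        }

        @Override
        public String validateSecondPartAndReturnRestrictedWordIfAny(String secondPart) {
            return restrictedDomainNames.contains(secondPart) ? secondPart : null;
        }
    };

    final UserService userService = identity -> !takenIdentities.contains(identity);

    // assume("naresh") or assume("naresh", "jain"); parts are joined with a space.
    Assumption assume(String... parts) {
        return new Assumption(String.join(" ", parts));
    }

    // Each method records one exception to the defaults.
    class Assumption {
        private final String key;

        Assumption(String key) { this.key = key; }

        void isACelebrityName() { celebrityNames.add(key); }
        void isARestrictedUserName() { restrictedUserNames.add(key); }
        void isARestrictedDomainName() { restrictedDomainNames.add(key); }
        void isATakenIdentity() { takenIdentities.add(key); }
    }
}
```

The design choice is the one the post describes: defaults encode the common case once, so each test states only what is special about it.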

Posted in Design, Programming, Testing

Eradicate Duplication; Embrace Communication

Friday, March 13th, 2009

Yesterday, I spent some time cleaning up Acceptance Tests on a project which exposes some REST APIs.

Following is a snippet of one of the tests:

Response response = REST_API_call_Using_Wrapper_Which_wraps_xml_response_in_a_response_helper_object;

Assert.IsTrue(response.HasHeader);
Assert.IsTrue(response.HasMessageId);
Assert.IsTrue(response.Has("X-SenderIP: " + senderIp));
Assert.IsTrue(response.Has("X-SenderDomain: " + senderDomain));
Assert.IsTrue(response.Has("X-recipientDomain: " + recipientDomain));
Assert.IsTrue(response.Has("X-SPF: " + spfValue));
Assert.IsTrue(response.Has("X-1stClassification: " + firstClassificationResult));
Assert.IsTrue(response.Has("X-2ndClassification: " + secondClassificationResult));
Assert.IsTrue(response.Has("X-3rdClassification: " + thirdClassificationResult));
Assert.IsTrue(response.Has("X-MANUALLY-CLASSIFIED: " + manuallyClassified));

As you can see, there is a lot of duplication (Assert.IsTrue is basically noise). It’s also not very clear what the intent of those asserts is.

Since Response is a test helper class, we thought moving the asserts onto the response made sense. But we also wanted to make sure the person reading this test understands that we are verifying a bunch of things on the response object.

Since we are using C#, we could do the following using a Delegate.

public delegate void ThingsToBeVerified();

public void AssertThat(ThingsToBeVerified codeBlock)
{
  codeBlock();
}

response.AssertThat(
  delegate{
    response.HasHeader;
    response.HasMessageId;
    response.Has("X-SenderIP: " + senderIp);
    response.Has("X-SenderDomain: " + senderDomain);
    response.Has("X-recipientDomain: " + recipientDomain);
    response.Has("X-SPF: " + spfValue);
    response.Has("X-1stClassification: " + firstClassificationResult);
    response.Has("X-2ndClassification: " + secondClassificationResult);
    response.Has("X-3rdClassification: " + thirdClassificationResult);
    response.Has("X-MANUALLY-CLASSIFIED: " + manuallyClassified);
  }
);

Now that we have got the asserts out of the way, the following things stand out as redundant:

The repeating response word
The semicolon at the end of each line
The ': " +' in each Has call

So we got rid of the delegate and used Method Chaining (fluent interfaces) instead.

response.AssertThat.It
                      .HasHeader
                      .HasMessageId
                      .Has("X-SenderIP",senderIp)
                      .Has("X-SenderDomain",senderDomain)
                      .Has("X-recipientDomain", recipientDomain)
                      .Has("X-SPF", spfValue)
                      .Has("X-1stClassification", firstClassificationResult)
                      .Has("X-2ndClassification", secondClassificationResult)
                      .Has("X-3rdClassification", thirdClassificationResult)
                      .Has("X-MANUALLY-CLASSIFIED", manuallyClassified);
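The chain above is C#. For comparison, here is a minimal Java sketch of the same method-chaining idea, assuming a simplified Response that holds a message id and a map of headers. The withHeader builder, the nested ResponseAssertions class, and the "at least one header" meaning of hasHeader are all hypothetical; the point is only that each check asserts and then returns this, so the calls chain without repeating Assert or response.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the test helper Response class in the post.
class Response {
    private final String messageId;
    private final Map<String, String> headers = new HashMap<>();

    Response(String messageId) { this.messageId = messageId; }

    Response withHeader(String name, String value) {
        headers.put(name, value);
        return this;
    }

    // Entry point of the fluent chain: response.assertThat().hasHeader()...
    ResponseAssertions assertThat() { return new ResponseAssertions(); }

    // Each check fails fast with a readable message and returns this, so checks chain.
    class ResponseAssertions {
        ResponseAssertions hasHeader() {
            if (headers.isEmpty()) throw new AssertionError("expected at least one header");
            return this;
        }

        ResponseAssertions hasMessageId() {
            if (messageId == null) throw new AssertionError("expected a message id");
            return this;
        }

        ResponseAssertions has(String name, Object expected) {
            String actual = headers.get(name);
            if (!String.valueOf(expected).equals(actual))
                throw new AssertionError(name + ": expected <" + expected + "> but was <" + actual + ">");
            return this;
        }
    }
}
```

Usage then reads much like the C# version: response.assertThat().hasHeader().hasMessageId().has("X-SenderIP", senderIp).has("X-SPF", spfValue);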

Now the Has calls and the parentheses look redundant. One way to eliminate them is operator overloading, something like:

lets.checkThat(response).HasHeader.HasMessageId.Has + "X-SenderIP" = senderIp + "X-SenderDomain" = senderDomain
        + "X-recipientDomain" = recipientDomain + "X-SPF" = spfValue + "X-1stClassification" = firstClassificationResult
        + "X-2ndClassification" = secondClassificationResult + "X-3rdClassification" = thirdClassificationResult + "X-MANUALLY-CLASSIFIED" = manuallyClassified;

We have not implemented this, but technically it’s possible.

Posted in Agile, Testing