I’m OK with some DRY violations. Here’s why

If you haven’t heard of DRY, it stands for “Don’t Repeat Yourself,” and it states that “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”

I first learned about this principle in The Pragmatic Programmer and immediately connected with it. I think that’s because it was so much easier to apply than other software engineering principles. Most principles are very abstract, but this one is easy to follow: when you see two similar methods, combine them into one and have both call sites use that single method.

The benefit of fixing DRY violations is this: if code is copied and pasted into two places and you discover a bug in one of them, you may forget to fix it in the other. You spent time fixing it the first time, and now you have to spend that time again to fix it the second time. Bugs are very expensive, and the second one could have been prevented if you had DRY’d up the code. Even if you realize up front that the bug needs fixing in two methods, you still have to put in the effort to fix it twice.

Also, we spend much more time reading code than writing it. If there is duplicate code, there is even more code to read to understand the code base.

But I rarely find two identical methods. Usually two methods are a close match rather than an exact match. To merge them into one, I have to add a flag parameter that callers pass in to conditionally skip the part that differs. Or I have to extract part of one method so that the extracted piece now matches the other method.
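
To make the flag compromise concrete, here is a hypothetical Java sketch (the formatting methods and the `includeTotal` flag are invented for illustration): two near-duplicate methods, then the single merged method that needs a boolean flag to switch off the part they don’t share.

```java
import java.util.List;

// Hypothetical near-duplicates: both format line items, but only the
// invoice version appends a total.
class Duplicated {
    static String formatInvoice(List<Integer> amounts) {
        StringBuilder sb = new StringBuilder();
        for (int amount : amounts) {
            sb.append("- ").append(amount).append('\n');
        }
        int total = amounts.stream().mapToInt(Integer::intValue).sum();
        sb.append("Total: ").append(total).append('\n');
        return sb.toString();
    }

    static String formatPackingSlip(List<Integer> amounts) {
        StringBuilder sb = new StringBuilder();
        for (int amount : amounts) {
            sb.append("- ").append(amount).append('\n');
        }
        return sb.toString();
    }
}

// The DRY'd version: one method, plus a flag every caller must pass
// to skip the part that differs.
class Merged {
    static String formatItems(List<Integer> amounts, boolean includeTotal) {
        StringBuilder sb = new StringBuilder();
        for (int amount : amounts) {
            sb.append("- ").append(amount).append('\n');
        }
        if (includeTotal) {
            int total = amounts.stream().mapToInt(Integer::intValue).sum();
            sb.append("Total: ").append(total).append('\n');
        }
        return sb.toString();
    }
}
```

Each further way the two callers diverge tends to add another flag like `includeTotal` to the merged method.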

Don’t get me wrong: this often has a lot of benefits. But these little compromises make the code slightly more complicated. It’s even worse when the DRY’d method needs to diverge in different ways depending on who’s calling it. Sometimes this works out, but other times you wish you were working with two different methods instead of one shared one. In my experience, it takes a long time to realize that a merged method should really be two separate methods. Until you figure this out, the code tends to get much more complex than it would have been if you had left the methods as duplicates.

Let’s summarize so far: DRY violations are bad because they increase the size of your code base and can cause bugs. But when you fix them, your code can also become more complicated and therefore harder to read.

When I’m practicing TDD (Test Driven Development), DRY violations are low on my list of things to deal with. When you practice TDD by the book, a bug duplicated across copies is usually caught by a failing acceptance test, which tells me whether the feature works. That gives me the luxury of keeping DRY violations around longer than I’d be comfortable with in an untested code base, even if a bug is duplicated in two methods.

TDD also tends to improve code quality much more than fixing DRY violations does. It’s a false dichotomy, but if I had to choose between TDD and fixing DRY violations, I’d choose TDD. This is why I’m comfortable leaving DRY violations in place unless a test is driving me toward fixing them.

That’s why we follow a guideline where I work: we avoid fixing questionable DRY violations until the same code appears in three places. With a well-tested code base, the advantages of fixing DRY violations don’t necessarily outweigh the disadvantages, so we often wait and see.

Use Mocks Sparingly

Whether I write my tests in the classical style or the mockist style, I find that my tests are higher quality when I avoid mocks. Some of you may be thinking, “How can you use the mockist style without mocks?” It turns out there’s a very strict definition of what a mock is, and many developers use the word incorrectly. The best definitions I’ve found are in Martin Fowler’s article “Mocks Aren’t Stubs.” To quote the article:

  • Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
  • Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
  • Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what’s programmed in for the test. Stubs may also record information about calls, such as an email gateway stub that remembers the messages it ‘sent’, or maybe only how many messages it ‘sent’.
  • Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
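
To make these four definitions concrete, here is a hand-rolled Java sketch that uses no mocking library. Every name in it (`EmailGateway`, `UserStore`, `AuditLog`, `WelcomeMailer`, and so on) is hypothetical; only the idea of an email gateway stub that remembers what it “sent” comes from the quote.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names here are hypothetical, chosen to illustrate the four kinds of double.
interface EmailGateway {
    void send(String to, String body);
}

interface UserStore {
    void save(String id, String email);
    String emailFor(String id);
}

interface AuditLog {
    void record(String event);
}

// Dummy: only fills a parameter slot; actually calling it is a bug in the test.
class DummyAuditLog implements AuditLog {
    public void record(String event) {
        throw new UnsupportedOperationException("dummy should never be used");
    }
}

// Fake: a working in-memory implementation, unsuitable for production.
class InMemoryUserStore implements UserStore {
    private final Map<String, String> emails = new HashMap<>();
    public void save(String id, String email) { emails.put(id, email); }
    public String emailFor(String id) { return emails.get(id); }
}

// Stub: does the minimum the test needs and remembers who it "sent" to.
class RecordingEmailGatewayStub implements EmailGateway {
    final List<String> sentTo = new ArrayList<>();
    public void send(String to, String body) { sentTo.add(to); }
}

// Mock: told up front which calls to expect; verify() fails if they differ.
class EmailGatewayMock implements EmailGateway {
    private final List<String> expected = new ArrayList<>();
    private final List<String> actual = new ArrayList<>();
    void expectSendTo(String to) { expected.add(to); }
    public void send(String to, String body) { actual.add(to); }
    void verify() {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected sends to " + expected + " but got " + actual);
        }
    }
}

// Hypothetical class under test: it accepts an AuditLog but never calls it.
class WelcomeMailer {
    private final UserStore users;
    private final EmailGateway gateway;

    WelcomeMailer(UserStore users, EmailGateway gateway, AuditLog auditLog) {
        this.users = users;
        this.gateway = gateway;
    }

    void welcome(String userId) {
        gateway.send(users.emailFor(userId), "Welcome!");
    }
}
```

A test using the stub asserts on what it recorded after the fact; a test using the mock carries its expectations into the test and checks them with `verify()`.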

I’m using these definitions when I say that you should avoid mocks in your tests. When I mock objects in Java, I often use a library named Mockito. If you write a test like this, you know you’re using a mock:


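import static org.mockito.Mockito.*;
import java.util.List;

// test that subject.doStuffAndReturnList() clears the list
List<String> mockedList = mock(List.class);
Subject subject = new Subject(mockedList);

subject.doStuffAndReturnList();

// passes only if clear() was called on the mock
verify(mockedList).clear();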

The clear giveaway here is the call to verify. I avoid it whenever I can and consider it a smell. In this example, verify checks that clear() was called on mockedList when doStuffAndReturnList() is called on subject.

Here’s why I don’t like this: I don’t get the confidence I want from this test. Let’s say I want mockedList to be cleared after I call doStuffAndReturnList(). How do I know that something isn’t added to the list after clear() is called? Well, you can use mocks to verify this, too:

verify(mockedList, never()).add(any());

All good? Well, the problem is that there are many ways to add elements to a list. Maybe addAll() was used instead, and your test isn’t catching that. In other words, the way the subject uses mockedList should be an implementation detail. But once you use a mock (as opposed to a stub), those details are exposed.

Here’s what I consider to be a superior test:


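import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.empty;
import java.util.ArrayList;
import java.util.List;

// test that subject.doStuffAndReturnList() clears the list
List<String> realList = new ArrayList<>();
realList.add("foo");
Subject subject = new Subject(realList);

List<String> result = subject.doStuffAndReturnList();

// assert on the resulting state, not on which List methods were called
assertThat(result, empty());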

Rather than testing how the subject uses its dependencies, I check that the subject works the way I expect. This gives the subject the freedom to manipulate the list however it likes, while the test still confirms that the list is empty after doStuffAndReturnList() is called.
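
The same state-based idea as a self-contained sketch, with plain checks standing in for Hamcrest. `Subject`’s body is never shown in the post, so the implementation here is an assumption: it deliberately empties the list with `removeIf` instead of `clear()`. The state-based test still passes, whereas `verify(mockedList).clear()` would fail against this implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical subject: the real implementation isn't shown in the post.
class Subject {
    private final List<String> list;

    Subject(List<String> list) {
        this.list = list;
    }

    List<String> doStuffAndReturnList() {
        // Any emptying strategy satisfies the test below: clear(),
        // removeIf(...), or an iterator loop.
        list.removeIf(item -> true);
        return list;
    }
}

class SubjectStateTest {
    public static void main(String[] args) {
        List<String> realList = new ArrayList<>();
        realList.add("foo");
        Subject subject = new Subject(realList);

        List<String> result = subject.doStuffAndReturnList();

        // Check the outcome, not the sequence of calls on the list.
        if (!result.isEmpty()) {
            throw new AssertionError("expected an empty list but got " + result);
        }
        System.out.println("list is empty");
    }
}
```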

There are some exceptions to this rule that I practice. Once in a while I write a method that returns void, but I really want to make sure it’s called in a test. An example is checking that a validate method, which throws an exception on invalid input, is actually called. This is a rare occurrence.
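
For that exception, here is a hedged sketch of the interaction check written by hand; with Mockito it would be roughly `verify(validator).validate("order-1")`. `OrderValidator`, `OrderService`, and `SpyValidator` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator: its only observable effect is throwing on bad input.
interface OrderValidator {
    void validate(String order);
}

// Hypothetical class under test.
class OrderService {
    private final OrderValidator validator;
    private final List<String> saved = new ArrayList<>();

    OrderService(OrderValidator validator) {
        this.validator = validator;
    }

    void place(String order) {
        validator.validate(order); // throws before anything is saved
        saved.add(order);
    }

    List<String> saved() {
        return saved;
    }
}

// Hand-rolled spy: records each order it was asked to validate.
class SpyValidator implements OrderValidator {
    final List<String> validated = new ArrayList<>();

    @Override
    public void validate(String order) {
        validated.add(order);
    }
}
```

A lambda that throws, such as `order -> { throw new IllegalArgumentException(); }`, works as a stub for the complementary check that nothing is saved when validation fails.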

What I like about avoiding mocks is that my code ends up being more “functional” in style: I pass parameters to a function and assert that the results look the way I expect. The implementation details stay hidden from the test, even when I’m practicing mockist style TDD.

Testing On Rails: The prescriptive way I test a feature from start to finish

In this article I’m going to give a prescriptive way of practicing TDD. This is the protocol I currently use, and it’s helped me a lot precisely because it’s prescriptive.

Through most of my career, I didn’t have a pair who was more experienced with TDD than me, so I had to pick up a lot of things by doing them the wrong way first. That’s not so bad, but what was really crippling was stalling whenever I reached a fork in the road.

My goal for this article is to tell you whether to go left or to go right when you reach those forks. I want to remove the ambiguity that’s so common when you try to learn TDD one blog article at a time.

Most of these practices are not unique. In fact, Justin Searls gave a good talk that espouses this practice (https://www.youtube.com/watch?v=VD51AkG8EZw). It is so aligned with the way I test that you can basically consider this article a summary of it. But there are some places where I go into more depth, and some people may prefer an article to a video.

Here’s the example we’ll use for the article: When you visit the user list page, you should see a list of usernames. This list comes from a 3rd party service that is not a part of your application. Some other team builds and maintains it.

Step 1: Write an end to end test

The first thing we’re going to do is write a test that connects to a running version of our app, goes to the user list page, and checks that there is a list of usernames on the page. This is the first test we write, but for this feature it will be the last test to pass.
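
A minimal sketch of such a happy-path end to end test using the JDK’s `java.net.http.HttpClient`. The `/users` path and the usernames `alice` and `bob` are assumptions. In a real suite the app would already be running; here a tiny stand-in server built on the JDK’s `com.sun.net.httpserver.HttpServer` plays that role so the sketch runs on its own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class UserListEndToEndTest {
    // Happy path only: the page loads and shows the usernames that the
    // mocked user service always returns ("alice" and "bob" are assumed).
    static void userListPageShowsUsernames(String appBaseUrl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(appBaseUrl + "/users")).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new AssertionError("unexpected status " + response.statusCode());
        }
        for (String username : new String[] {"alice", "bob"}) {
            if (!response.body().contains(username)) {
                throw new AssertionError("page is missing " + username);
            }
        }
    }

    // Stand-in for "a running version of our app" so this sketch runs alone.
    static HttpServer startStandInApp() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/users", exchange -> {
            byte[] body = "<ul><li>alice</li><li>bob</li></ul>".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer app = startStandInApp();
        try {
            userListPageShowsUsernames("http://127.0.0.1:" + app.getAddress().getPort());
            System.out.println("end to end happy path passed");
        } finally {
            app.stop(0);
        }
    }
}
```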

It’s not going to test edge cases like “What if there are no users?” or “What if a username has newlines in it?” Rather, this test covers what we call “the happy path.” All of those edge cases will be tested in other kinds of tests, because end to end tests are relatively slow and their point is to check that the parts of your code are integrated correctly.

The user service your code calls out to is not part of your code. You don’t want your test to be affected by that service going down or its data changing between runs. For this reason, the app under test will call a mocked out user service, not the real one, and it will always return the same response when you visit the list page. Outages and changing data are concerns you should care about, just not in this test, because this test is relatively slow and you want your test suite as a whole to be very fast.

Step 2: Write mockist style tests all the way down

If you’re unfamiliar with mockist style testing, here’s how it works: You write a test for a class that depends on other code. Instead of trying to implement all the code at once, you write some high level code that delegates to lower level implementations. These lower level implementations are mocked out instead of implemented. Here’s an example test that I would write first:
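
A sketch of what such a first mockist style test might look like, in plain Java with a hand-written stub rather than a verify-style mock. `UserListController`, `UserRepository`, and the `subject` field come from the text; `findAllUsernames()`, `showUsers()`, and the sample usernames are assumptions.

```java
import java.util.List;

// Collaborator discovered by the test: only an interface exists so far.
// The method name is hypothetical.
interface UserRepository {
    List<String> findAllUsernames();
}

// Class under test; showUsers() is a hypothetical name.
class UserListController {
    private final UserRepository userRepository;

    UserListController(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    List<String> showUsers() {
        return userRepository.findAllUsernames();
    }
}

class UserListControllerTest {
    // Field named "subject" to flag which class this test is about.
    private UserListController subject;

    void showsTheUsernamesFromTheRepository() {
        // Stub: a canned answer, with no expectations about how it is called.
        UserRepository stubbedRepository = () -> List.of("alice", "bob");
        subject = new UserListController(stubbedRepository);

        List<String> usernames = subject.showUsers();

        if (!usernames.equals(List.of("alice", "bob"))) {
            throw new AssertionError("unexpected usernames: " + usernames);
        }
    }

    public static void main(String[] args) {
        new UserListControllerTest().showsTheUsernamesFromTheRepository();
        System.out.println("UserListControllerTest passed");
    }
}
```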

This is what I would write soon after my end to end test. Actually, if I were using a single page app framework like Angular, I’d probably test the JavaScript first. But if I had no JavaScript relevant to the end to end test, I’d write the above test next.

What’s worth noting is that I’d write this out and it wouldn’t compile. When I hit my first compile error, I’d fix it and go back to writing the test. For example, UserListController wouldn’t exist, so the test would fail to compile; then I’d create it. The same goes for UserRepository.

In both of these cases, I’d do only as little as necessary to get the test to compile, then go back to writing the test. For UserListController and UserRepository, that means typing out only the method signatures and returning null where necessary. I write the body of my test before I create the fields, because I don’t really know which fields I’ll need until I try to use them.
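
A minimal sketch of those compile-only skeletons. The method names `findAllUsernames()` and `showUsers()` are hypothetical; the point is that there are signatures and a `return null`, and nothing more.

```java
import java.util.List;

// Just enough to make the test compile: signatures only, no behavior yet.
interface UserRepository {
    List<String> findAllUsernames();
}

class UserListController {
    UserListController(UserRepository userRepository) {
        // No field yet: add it once the test body shows it is needed.
    }

    List<String> showUsers() {
        return null; // placeholder so the code compiles; the test will fail
    }
}
```

The test now compiles and fails, which is the red step before writing the real method body.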

The reason I have a field named subject is that it tells me which class this test is testing. When I lose track of where I am, it’s a quick reminder and more convenient than scrolling to the top of the file.

I write this test, then think of all the edge cases I deliberately skipped in the end to end test and cover them here. After I’ve done that, I follow the same process with a new test named UserRepositoryTest. I always test outside-in: I test the class that has dependencies before the class that is a dependency. This ensures I write the smallest amount of code necessary to get the tests to pass and that I use every class I create. When you test inside-out, your classes often fail to fit together, and you end up creating things you didn’t really need.

Here’s a practice I often follow that is non-idiomatic in a lot of languages but has worked out very well for me so far: I try to keep one method per class. I think of this as the Single Responsibility Principle taken to the extreme. The result is a proliferation of little classes and a lot less mental overhead. It’s straightforward to recognize when a class is doing more than it’s supposed to. If you have some unrelated code, you don’t have to look around for the class it might belong in, because it will almost always belong in a brand new class. And it takes a lot of mutable state out of your system.

I know this sounds weird, but so far my code has been very maintainable from following this practice. I suggest you give it a try and I suspect you’ll find the same.
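
Here is a hypothetical sketch of one method per class applied to the user list page. The class names and the sorting and HTML formatting steps are invented for illustration; each class does one thing, and even the composition gets its own single-method class with only final fields.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical single-method classes for the user list page.
class SortUsernames {
    List<String> sort(List<String> usernames) {
        return usernames.stream().sorted().collect(Collectors.toList());
    }
}

class FormatUsernameList {
    String format(List<String> usernames) {
        return usernames.stream()
                .map(name -> "<li>" + name + "</li>")
                .collect(Collectors.joining("", "<ul>", "</ul>"));
    }
}

// Wiring the pieces together is itself a single method.
class RenderUserList {
    private final SortUsernames sortUsernames;
    private final FormatUsernameList formatUsernameList;

    RenderUserList(SortUsernames sortUsernames, FormatUsernameList formatUsernameList) {
        this.sortUsernames = sortUsernames;
        this.formatUsernameList = formatUsernameList;
    }

    String render(List<String> usernames) {
        return formatUsernameList.format(sortUsernames.sort(usernames));
    }
}
```

New, unrelated behavior (say, filtering out deactivated users) would become another small class rather than a second method on one of these.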

You’re going to go outside-in all the way down to the 3rd party system and stop there. When you reach that point, all your tests from Step 2 should pass, but your end to end test should still be failing because it’s connecting to the real 3rd party service, which returns data you can’t control. The last step is to mock that service out, but before that, we need to test the way the 3rd party service works.

Step 3: Write tests for the real 3rd party service

Depending on how much uncertainty or risk is involved, this step may be swapped with Step 2. For example, if you have no idea what parameters the API needs, you may want to write these tests before you write your unit tests all the way down to the API. Otherwise, you could end up refactoring all the way back up when you realize you need to pass in a parameter you didn’t know about. But if I can get away with it, I prefer to do this as Step 3.

The API you need to call probably isn’t exactly what you would like. Even if it is today, it may change tomorrow. So I like to wrap it with a thin layer of my own code that separates all of the 3rd party code from mine. Then I write my tests against this thin layer. I don’t test the 3rd party code directly; I test it through my thin layer.
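
A minimal sketch of such a thin layer, assuming the 3rd party exposes an HTTP endpoint at `/usernames` that returns one username per line (both are assumptions). `UserService` and `HttpUserService` are hypothetical names. Taking the base URL as a constructor argument is one way to make the layer configurable later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Our own interface: the rest of the code depends on this, never on the
// 3rd party's client or wire format.
interface UserService {
    List<String> fetchUsernames() throws IOException, InterruptedException;
}

// Thin layer over the 3rd party HTTP API. The "/usernames" path and the
// one-username-per-line format are assumptions for this sketch.
class HttpUserService implements UserService {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI baseUrl;

    HttpUserService(URI baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public List<String> fetchUsernames() throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(baseUrl.resolve("/usernames")).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("user service returned " + response.statusCode());
        }
        return parse(response.body());
    }

    // Translation from their format to ours lives in exactly one place.
    static List<String> parse(String body) {
        return Arrays.stream(body.split("\n"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }
}
```

The slow Step 3 tests would call `fetchUsernames()` against the real service; if its format changes, only `parse` and those tests should need to change.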

These tests are slow and unstable because I can’t control these 3rd party services. They could be down, or their schema could change without warning. For that reason, I run these 3rd party tests in a different phase from my end to end tests and my mockist style tests. A failing 3rd party test doesn’t necessarily mean there’s an issue with my code.

When steps 1-3 are complete the end to end test could still be failing. There is one last step:

Step 4: Mock out the 3rd party service

Near the bottom of your tests from Step 2, you should reach a point where you need to call the 3rd party service. In Step 3, you are testing the real 3rd party service. In this step, you’re going to mock out the 3rd party service so that your end to end tests are stable when and if the 3rd party service goes down.

The thin layer of code we wrote around the 3rd party service needs to be configurable so that it can use either the real service or the mock service we set up in this step.

I like to configure this with a URL that points to a fake service when running my end to end tests. A tool that helps with this (if you’re using Java) is Mock Server.
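
A hedged sketch of the URL-based setup. To stay dependency-free it swaps Mock Server for the JDK’s built-in `com.sun.net.httpserver.HttpServer`. The `/usernames` path, the one-username-per-line response, the `userService.url` system property, and the `users.example.com` placeholder are all assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Fake 3rd party user service for end to end runs, built on the JDK's
// HttpServer instead of Mock Server so the sketch has no dependencies.
class FakeUserService implements AutoCloseable {
    // Always the same response, so the end to end test is stable.
    static final String CANNED_USERNAMES = "alice\nbob\n";

    private final HttpServer server;

    FakeUserService() throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/usernames", exchange -> {
            byte[] body = CANNED_USERNAMES.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    URI baseUrl() {
        return URI.create("http://127.0.0.1:" + server.getAddress().getPort());
    }

    @Override
    public void close() {
        server.stop(0);
    }
}

class UserServiceConfig {
    // Hypothetical setting name; when it is absent, the real URL is used.
    static URI userServiceUrl(URI realServiceUrl) {
        String override = System.getProperty("userService.url");
        return override != null ? URI.create(override) : realServiceUrl;
    }
}

class FakeUserServiceDemo {
    // Fetches the username list from whichever service the config points at.
    static String fetchUsernamesBody() throws IOException, InterruptedException {
        // Placeholder standing in for the real service's address.
        URI url = UserServiceConfig.userServiceUrl(URI.create("https://users.example.com"));
        HttpRequest request = HttpRequest.newBuilder(url.resolve("/usernames")).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        try (FakeUserService fake = new FakeUserService()) {
            // End to end runs point the app at the fake instead of the real service.
            System.setProperty("userService.url", fake.baseUrl().toString());
            System.out.print(fetchUsernamesBody());
        }
    }
}
```

In a real suite, the thin wrapper would be constructed with the URL that `userServiceUrl` returns, so the same code path, including the network trip, runs against the fake.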

Alternatively, you could create a stub class that implements the interface of your thin wrapper from Step 3, but I prefer a URL because it can be valuable to test the network trip to the fake server.
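
For comparison, a sketch of the in-process alternative. `UserService` and `StubUserService` are hypothetical names, with the interface assumed to mirror the thin wrapper’s.

```java
import java.io.IOException;
import java.util.List;

// Mirrors the thin wrapper's interface; the name and method are hypothetical.
interface UserService {
    List<String> fetchUsernames() throws IOException, InterruptedException;
}

// In-process stand-in for the 3rd party service. No network hop happens,
// so HTTP and parsing mistakes in the real wrapper go untested.
class StubUserService implements UserService {
    private final List<String> cannedUsernames;

    StubUserService(List<String> cannedUsernames) {
        this.cannedUsernames = List.copyOf(cannedUsernames);
    }

    @Override
    public List<String> fetchUsernames() {
        return cannedUsernames;
    }
}
```

This is simpler and faster, but it bypasses exactly the code (request building, status handling, parsing) that the fake-server approach still exercises.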

Now, there is a situation we have to deal with if it hasn’t already been covered: if the 3rd party schema changes, we have to update both our 3rd party tests and our mock server requests/responses to reflect the change. Changing only one or the other is very dangerous, because your tests could pass while the real code isn’t integrating correctly.

But once you’ve figured out how to do that, your end to end test should be consistently passing, your feature should be complete, and you’re ready to repeat Steps 1-4 on your next feature.

11 things I learned about TDD by looking at the tests in the JUnit source

JUnit is a unit testing framework for the Java programming language. JUnit has been important in the development of test-driven development and is one of a family of unit testing frameworks, collectively known as xUnit, that originated with SUnit.

JUnit was originally written by Erich Gamma and Kent Beck. The latter literally wrote the book on Test Driven Development. I thought it would be worthwhile to look at some of the tests in the library and draw attention to some interesting things about them, because it may change people’s ideas of how to practice TDD.
