This is a simple koans for Java and Groovy. The intent is to create a method that consumes a string of arbitrary length and returns a map where the key is the lower case letter and the value is the number of instances of that letter found in the input string.
Example Input | Expected output |
---|---|
"" |
[:] - Empty Map |
"a" |
['a':1] |
"aabcccc" |
[a:2, b:1, c:4] |
The first implementation that popped into my head was a pretty trivial for-loop based
algorithm that loops through the string and stores the result in a HashMap
named
result
: it looks like this:
public class LetterCounterBase {
public Map<Character, Integer> countUsage(String inputString) {
HashMap<Character, Integer> result = new HashMap<>();
for (int i = 0; i < inputString.length(); i++) {
char c = inputString.charAt(i);
if (result.containsKey(c)) {
result.put(c, result.get(c) + 1);
} else {
result.put(c, 1);
}
}
return result;
}
}
Note at this point nulls are not handled, the code is a little more verbose than I would like, but for the most part it works.
At this point I decided to impement a second method that refines the first one and hopefully
makes the code a bit more succinct. To do so I keep the loop, but realize that the if is
taking up more vertical space than it deserves for its small space in the algorithm and
the call to result.put
is duplicated in the code. So I extract an interface, since now
I have two implementations that I’ll want to test and implement the following class:
package com.aceprogramming.koans.letter.counter;
import java.util.HashMap;
import java.util.Map;
final class LetterCounterRefinement1 implements LetterCounter {
LetterCounterRefinement1() {
}
public Map<Character, Integer> countUsage(String inputString) {
HashMap<Character, Integer> result = new HashMap<>();
for (int i = 0; i < inputString.length(); i++) {
char c = inputString.charAt(i);
int count = result.containsKey(c) ? result.get(c) + 1 : 1;
result.put(c, count);
}
return result;
}
}
In this implementation I realized that it was not the put that changed in the if, but
the count that was being put into the Map
so I changed the branch to only deal
with that count
and factored out the result.put
onto its own line, which removed
the duplication of the two calls to result.put
. The readability of the code also
improves in that I can now clearly see that if the map already contains the letter
then the count becomes 1 added to the existing value otherwise it just becomes 1. In
both cases, the Map
is updated.
Because I now have a nice interface over the class I also implement the interface in my original class:
final class LetterCounterBase implements LetterCounter {
public Map<Character, Integer> countUsage(String inputString) {
HashMap<Character, Integer> result = new HashMap<>();
for (int i = 0; i < inputString.length(); i++) {
char c = inputString.charAt(i);
if (result.containsKey(c)) {
result.put(c, result.get(c) + 1);
} else {
result.put(c, 1);
}
}
return result;
}
}
Note at this point I also make both classes final and package-protected. Final because they will not be intentionally designed for extension, package-protected to keep them nicely hidden behind the interface.
For completeness, here is what the interface looks like:
package com.aceprogramming.koans.letter.counter;
import java.util.Map;
public interface LetterCounter {
Map<Character, Integer> countUsage(String inputString);
}
Just after doing this, I realized I missed a really simple test case in my spock test. I’m not checking for a null input. I considered the empty string in my input, but never nulls.
Noticing that the null case is not covered leads me to update my spock specification to include it:
package com.aceprogramming.koans.letter.counter
import spock.lang.Specification
import spock.lang.Unroll
class LetterCounterSpec extends Specification {
def letterCounters = [new LetterCounterBase(), new LetterCounterRefinement1()]
def "A LetterCounter can be created"() {
expect:
letterCounters.each {assert(it != null)}
}
@Unroll
def "Check counted letters for #inputString"(String inputString, Map expectedResult) {
expect:
letterCounters.each {assert(it.countUsage(inputString) == expectedResult)}
where:
inputString | expectedResult
"" | [:]
"a" | [((char)'a'):1]
"aa" | [((char)'a'):2]
"aab" | [((char)'a'):2, ((char)'b'):1]
"aabccc" | [((char)'a'):2, ((char)'b'):1, ((char)'c'):3]
"cabcac" | [((char)'a'):2, ((char)'b'):1, ((char)'c'):3]
null | [:]
}
}
As can be seen above I decided that null should be handled the same way that the empty String is an return back an empty map. It could be argued that throwing (or continue to let be thrown) an exception would be better, since null could be considered invalid input. I have found that doing this when the result can be an empty container often leads to more bugs and confusion than just handling the null and returning the empty container. The argument against this is that it doesn’t catch what could be a bug in the callers code, but I would rather have my code be more robust and trust that the caller does his or her testing as well.
Adding in the test for null causes my test to fail with a NullPointerException
because
neither of my initial implementations handle null so its time to fix both of them.
To fix the NullPointerException
problem in the initial LetterCounterBase
implementation
I wanted to stick with a solution that was similar in concept to what I had already done.
Something that was just simple and effective even if it were verbose. Here is the solution
to the LetterCounterBase
:
public Map<Character, Integer> countUsage(String inputString) {
if(inputString == null) {
inputString = "";
}
HashMap<Character, Integer> result = new HashMap<>();
for (int i = 0; i < inputString.length(); i++) {
char c = inputString.charAt(i);
if (result.containsKey(c)) {
result.put(c, result.get(c) + 1);
} else {
result.put(c, 1);
}
}
return result;
}
I just placed a simple if
check at the beginning that flips the null to the empty string, since
I decided to handle nulls the same way as the empty string. The solution is straight
forward, but it also begins to show some of the problems with this overall approach in
that the method is already getting long and verbose. The cognitive load of the method
is also starting to creep up to the point where I’m starting to feel the need to comment
the code. The overall solution while quick is clearly not the greatest.
For the LetterCounterRefinement1
implementation, the somewhat cleaner style of my
initial approach led me to a cleaner and better way to handle the input null:
public Map<Character, Integer> countUsage(String inputString) {
return inputString == null ? countLetters("") :
countLetters(inputString);
}
private static Map<Character, Integer> countLetters(String inputString) {
HashMap<Character, Integer> result = new HashMap<>();
for (int i = 0; i < inputString.length(); i++) {
char c = inputString.charAt(i);
int count = result.containsKey(c) ? result.get(c) + 1 : 1;
result.put(c, count);
}
return result;
}
Here we follow the Clean Code rule:
FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL. THEY SHOULD DO IT ONLY.
Clean Code
The first method just handles the null situation. It delegates everything else to a private method that does the real work of counting the letters. This simple rule makes it transparently clear what the intent of the code is with no desire to add comments.
With the nulls handled and the unit tests passing, I now want to explore how this could be implemented in other ways perhaps making better use of modern java features…
The first thing I noticed when I started surveying more modern Java techniques is that
Java 8 added a compute method to Map. (Javadoc).
This method allows me to simplify the LetterCoutnerRefinement1
implementation cutting two
lines into one and makeing the code a tad bit cleaner:
private static Map<Character, Integer> countLetters(String inputString) {
HashMap<Character, Integer> result = new HashMap<>();
for (int i = 0; i < inputString.length(); i++) {
char c = inputString.charAt(i);
result.compute(c, (k, v) -> (v == null) ? 1 : ++v);
}
return result;
}
While I don’t care for compute
as the name of the method much (since it is really mapping
the value) I can live with it since it is the built in standard. This certainly feels like
a good step in the right direction. The result of this change is that I moved the branch
check for the value being null into the compute call which now executes that function and
automatically puts the result. Because compute passes a null value into the value fuction
when the key is not in the map, everything works the same and we are golden.
My next thought was to do this using the even more modern Java 8 Streams API. The streams API gives us a great example of internal iteration that enables a much more functional style of programming. By "internal iteration" I mean that the actual iteration is hidden from the developer and instead a function is passed in that operates on each element of a stream as those elements pass by. This can have some nice benefits among them is very concise code and the potential for parallelism. Here is my first cut at using streams:
package com.aceprogramming.koans.letter.counter;
import java.util.Map;
import java.util.stream.Collectors;
final class LetterCounterRefinement3 implements LetterCounter{
LetterCounterRefinement3() {
}
@Override
public Map<Character, Integer> countUsage(String inputString) {
return inputString.chars()
.mapToObj(c -> (char)c)
.collect(Collectors.groupingBy(c -> c, Collectors.summingInt(c -> 1)));
}
}
There are a few of interesting things to note about this implementation. First
I had to use Collectors.summingInt(c → 1)
instead of Collectors.counting()
because
counting
returns a Long
where my interface requires an Integer
. I could have
refactored the interface but I chose to keep it the same and find a method that does
what I wanted it to even though it increases complexity a tiny bit.
The second thing I find interesting is that even though the method length is much smaller, the solution is not as clearly readable as any of the previous ones. Usually less code means cleaner code, all other things being equal. That (at least for me) is not true in this case. Here are the things in my new stream based solution that I think are harmful to its readablity:
-
calling
chars
on a string does not create a stream of Character or char. It creates an IntStream which is a stream of int. This makes it necessary to do an odd cast in themapToObj
method. -
Even though I want the
mapToObj
method to map the ints in the streams to theCharacter
object I still cast to a base type of (char). There appears to be no clean way out of this -
The
Collectors.groupingBy
method does most of the work, but its not readably clear how to anyone who has not done functional group bys before. It is also hard to reason about the types of the resultant map.
The last thing I find interesting about this implemantion is that my unit tests don’t
pass. In particular I get a NullPointerException
when I pass in a null string.
So I’ll need to fix that before commiting to git.
To fix the null pointer exception, I could follow the same pattern I did in
LetterCounterRefinement2
(and kind of in 1) but that really feels ugly. I
would have the same duplicate code in 3 places. There is no excuse for that!!
So to fix, I factor out the null check into its own class that then defers
the actual letter counting to the real class. I wrap the null check around
the implementation allowing me to reuse it wherever I want. Doing things this
way also has the huge advantage of allowing a client to use the null safe version
or the non-null safe version of an implementation easily. This is an example of
the Decorator Pattern.
To accomplish this I create a NullSafeLetterCounter
that does nothing except the null
check and then defers to the class that it wraps. All I have done here is taken the
first method out of LetterCounterRefinement2, parameterized it with the actual algorithm
to call and put it in a class. Here is the result:
package com.aceprogramming.koans.letter.counter;
import java.util.Map;
import java.util.Objects;
final class NullSafeLetterCounter implements LetterCounter {
private final LetterCounter letterCounter;
NullSafeLetterCounter(final LetterCounter letterCounter) {
this.letterCounter = Objects.requireNonNull(letterCounter, "The input letter counter cannot be null.");
}
@Override
public Map<Character, Integer> countUsage(String inputString) {
return inputString == null ? letterCounter.countUsage("")
: letterCounter.countUsage(inputString);
}
}
Then I remove that functionality from the other LetterCounter
implementations and
change my unit test to use the null safe wrapper:
package com.aceprogramming.koans.letter.counter
import spock.lang.Specification
import spock.lang.Unroll
class LetterCounterSpec extends Specification {
def letterCounters = [new NullSafeLetterCounter(new LetterCounterBase()),
new NullSafeLetterCounter(new LetterCounterRefinement1()),
new NullSafeLetterCounter(new LetterCounterRefinement2()),
new NullSafeLetterCounter(new LetterCounterRefinement3())]
def "A LetterCounter can be created"() {
expect:
letterCounters.each {assert(it != null)}
}
@Unroll
def "Check counted letters for #inputString"(String inputString, Map expectedResult) {
expect:
letterCounters.each {assert(it.countUsage(inputString) == expectedResult)}
where:
inputString | expectedResult
"" | [:]
"a" | [((char)'a'):1]
"aa" | [((char)'a'):2]
"aab" | [((char)'a'):2, ((char)'b'):1]
"aabccc" | [((char)'a'):2, ((char)'b'):1, ((char)'c'):3]
"cabcac" | [((char)'a'):2, ((char)'b'):1, ((char)'c'):3]
null | [:]
}
}
Note at this point I could factor out the test for just the null check and not have
the repeaded call to new the NullSafeLetterCounter
. In a produciton system I would
also provide a factory of some type to be able to create the various letter counters.
That may be a good starting point for a different exercise.
I started with a really simple problem to count the occurrences of letters in an
arbitrary String
. I then created unit tests for the known requirements and implemented
a quick solution to those unit tests as a baseline. I then used that to iterate and refine
the solution in order to explore some of the newer features of Java. In the end I ended
up with 4 pretty reasonable solutions to the problem and a bunch of new knowledge.
As a note, my favorite implementation is LetterCounterRefinement2
as I stated above
I find the Java Streams API version to be less readable. But that may be just a matter
of taste and my age. :)