- Chapter 1 - Clean Code
- Chapter 2 - Meaningful Names
- Chapter 3 - Functions
- Chapter 4 - Comments
- Chapter 5 - Formatting
- Chapter 6 - Objects and Data Structures
- Chapter 7 - Error Handling
- Chapter 8 - Boundaries
- Chapter 9 - Unit Tests
- Chapter 10 - Classes
- Chapter 11 - Systems
- Chapter 12 - Emergence
- Chapter 13 - Concurrency
- Chapter 14 - Successive Refinement
- Chapter 15 - JUnit Internals
- Chapter 16 - Refactoring SerialDate
- Chapter 17 - Smells and Heuristics
The code represents the detail of the requirements and the details cannot be ignored or abstracted. We may create languages that are closer to the requirements. We can create tools that help us parse and assemble those requirements into formal structures. But we will never eliminate necessary precision.
- Are you in a rush?
- Do you try to go "fast"?
- Do not you have time to do a good job?
- Are you tired of work in the same program/module?
- Does your Boss push you to finish soon?
The previous arguments could create a swamp of senseless code.
If you say "I will back to fix it later" you could fall in the LeBlanc's law "Later equals never"
You are a professional and the code is your responsibility. Let's analyze the following anecdote:
What if you were a doctor and had a patient who demanded that you stop all the silly hand-washing in preparation for surgery because it was taking too much time? Clearly the patient is the boss; and yet the doctor should absolutely refuse to comply. Why? Because the doctor knows more than the patient about the risks of disease and infection. It would be unprofessional (never mind criminal) for the doctor to comply with the patient.
So too it is unprofessional for programmers to bend to the will of managers who don’t understand the risks of making messes.
Maybe sometime you think in go fast to make the deadline. The only way to go fast is to keep the code as clean as possible at all times.
Each experienced programmer has his/her own definition of clean code, but something is clear, a clean code is a code that you can read easily. The clean code is code that has been taken care of.
In his book Uncle Bob says the next:
Consider this book a description of the Object Mentor School of Clean Code. The techniques and teachings within are the way that we practice our art. We are willing to claim that if you follow these teachings, you will enjoy the benefits that we have enjoyed, and you will learn to write code that is clean and professional. But don’t make the mistake of thinking that we are somehow “right” in any absolute sense. There are other schools and other masters that have just as much claim to professionalism as we. It would behoove you to learn from them as well.
It’s not enough to write the code well. The code has to be kept clean over time. We have all seen code rot and degrade as time passes. So we must take an active role in preventing this degradation.
It's a good practice apply the Boy Scout Rule
Always leave the campground cleaner than you found it.
Names are everywhere in software. Files, directories, variables functions, etc. Because we do so much of it. We have better do it well.
It is easy to say that names reveal intent. Choosing good names takes time, but saves more than it takes. So take care with your names and change them when you find better ones.
The name of a variable, function or class, should answer all the big questions. It should tell you why it exists, what it does, and how is used. If a name requires a comment, then the name does not reveals its intent.
Does not reveals intention | Reveals intention |
---|---|
int d; // elapsed time in days |
int elapsedTimeInDays |
Choosing names that reveal intent can make much easier to understand and change code. Example:
public List<int[]> getThem() {
List<int[]> list1 = new ArrayList<int[]>();
for (int[] x : theList)
if (x[0] == 4)
list1.add(x);
return list1;
}
This code is simple, but create many questions:
- What is the content of
theList
? - What is the significance of the item
x[0]
in the list?. - Why we compare
x[0]
vs4
? - How would i use the returned list?
The answers to these questions are not present in the code sample, but they could have been. Say that we’re working in a mine sweeper game. We can refactor the previous code as follows:
public List<int[]> getFlaggedCells() {
List<int[]> flaggedCells = new ArrayList<int[]>();
for (int[] cell : gameBoard)
if (cell[STATUS_VALUE] == FLAGGED)
flaggedCells.add(cell);
return flaggedCells;
}
Now we know the next information:
theList
represents thegameBoard
x[0]
represents a cell in the board and4
represents a flagged cell- The returned list represents the
flaggedCells
Notice that the simplicity of the code has not changed. It still has exactly the same number of operators and constants, with exactly the same number of nesting levels. But the code has become much more explicit.
We can improve the code writing a simple class for cells instead of using an array of ints
. It can include an intention-revealing function (called it isFlagged
) to hide the magic numbers. It results in a new function of the function.
public List<Cell> getFlaggedCells() {
List<Cell> flaggedCells = new ArrayList<Cell>();
for (Cell cell : gameBoard)
if (cell.isFlagged())
flaggedCells.add(cell);
return flaggedCells;
}
Programmers must avoid leaving false clues that obscure the meaning of code. We should avoid words whose entrenched meaning vary from our intended meaning.
Do not refer to a grouping of accounts as an accountList
unless it's actually a List
. The word List
means something specific to programmers. If the container holding the accounts is not actually a List, it may lead to false conclusions. So accountGroup
or bunchOfAccounts
or just plain accounts
would be better.
Beware of using names which vary in small ways. How long does it take to spot the subtle difference between a XYZControllerForEfficientHandlingOfStrings
in one module and, somewhere a little more distant, XYZControllerForEfficientStorageOfStrings
? The words have frightfully similar shapes
Programmers create problems for themselves when they write code solely to satisfy a compiler or interpreter. For example because you can't use the same name to refer two different things in the same scope, you might be tempted to change one name in an arbitrary way. Sometimes this is done by misspelling one, leading to the surprising situation where correcting spelling errors leads to an inability to compile. Example, you create the variable klass
because the name class
was used for something else.
In the next function, the arguments are noninformative, a1
and a2
doesn't provide clues to the author intention.
public static void copyChars(char a1[], char a2[]) {
for (int i = 0; i < a1.length; i++) {
a2[i] = a1[i];
}
}
We can improve the code selecting more explicit argument names:
public static void copyChars(char source[], char destination[]) {
for (int i = 0; i < source.length; i++) {
destination[i] = source[i];
}
}
Noise words are another meaningless distinction. Imagine that you have a Product class. If you have another called ProductInfo
or ProductData
, you have made the names different without making them mean anything different. Info and Data are indistinct noise words like a, an, and the.
Noise words are redundant. The word variable should never appear in a variable name. The word table should never appear in a table name.
Imagine you have the variable genymdhms
(Generation date, year, month, day, hour, minute and second) and imagine a conversation where you need talk about this variable calling it "gen why emm dee aich emm ess". You can consider convert a class like this:
class DtaRcrd102 {
private Date genymdhms;
private Date modymdhms;
private final String pszqint = "102";
/* ... */
};
To
class Customer {
private Date generationTimestamp;
private Date modificationTimestamp;;
private final String recordId = "102";
/* ... */
};
Single-letter names and numeric constants have a particular problem in that they are not easy to locate across a body of text.
We have enough encodings to deal with without adding more to our burden. Encoding type or scope information into names simply adds an extra burden of deciphering. Encoded names are seldom pronounceable and are easy to mis-type. An example of this, is the use of the Hungarian Notation or the use of member prefixes.
These are sometimes a special case for encodings. For example, say you are building an ABSTRACT FACTORY for the creation of shapes. This factory will be an interface and will be implemented by a concrete class. What should you name them? IShapeFactory
and ShapeFactory
? Is preferable to leave interfaces unadorned.I don’t want my users knowing that I’m handing them an interface. I just want them to know that it’s a ShapeFactory
. So if I must encode either the interface or the implementation, I choose the implementation. Calling it ShapeFactoryImp
, or even the hideous CShapeFactory
, is preferable to encoding the interface.
Readers shouldn't have to mentally translate your names into other names they already know.
One difference between a smart programmer and a professional programmer is that the professional understands that clarity is king. Professionals use their powers for good and write code that others can understand.
Classes and objects should have noun or noun phrase names like Customer
, WikiPage
, Account
, and AddressParser
. Avoid words like Manager
,Processor
, Data
, or Info
in the name of a class. A class name should not be a verb.
Methods should have verb or verb phrase names like postPayment
, deletePage
or save
. Accessors, mutators, and predicates should be named for their value and prefixed with get, set, and is according to the javabean standard.
When constructors are overloaded, use static factory methods with names that describe the arguments. For example:
Complex fulcrumPoint = Complex.FromRealNumber(23.0);
Is generally better than
Complex fulcrumPoint = new Complex(23.0);
Consider enforcing their use by making the corresponding constructors private.
Cute name | Clean name |
---|---|
holyHandGranade |
deleteItems |
whack |
kill |
eatMyShorts |
abort |
Pick one word for one abstract concept and stick with it. For instance, it’s confusing to have fetch, retrieve, and get as equivalent methods of different classes.
Avoid using the same word for two purposes. Using the same term for two different ideas is essentially a pun.
Example: in a class use add
for create a new value by adding or concatenating two existing values and in another class use add
for put a simple parameter in a collection, it's a better options use a name like insert
or append
instead.
Remember that the people who read your code will be programmers. So go ahead and use computer science (CS) terms, algorithm names, pattern names, math terms, and so forth.
When there is no “programmer-eese” for what you’re doing, use the name from the problem domain. At least the programmer who maintains your code can ask a domain expert what it means.
There are a few names which are meaningful in and of themselves—most are not. Instead, you need to place names in context for your reader by enclosing them in well-named classes, functions, or namespaces. When all else fails, then prefixing the name may be necessary as a last resort
Variables like: firstName
, lastName
, street
, city
, state
. Taken together it's pretty clear that they form an address, but, what if you saw the variable state being used alone in a method?, you could add context using prefixes like: addrState
at least readers will understand that the variable is part of a large structure. Of course, a better solution is to create a class named Address
then even the compiler knows that the variables belong to a bigger concept
In an imaginary application called “Gas Station Deluxe,” it is a bad idea to prefix every class with GSD. Example: GSDAccountAddress
Shorter names are generally better than longer ones, so long as they are clear. Add no more context to a name than is necessary.
Functions are the first line of organization in any topic.
The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.
This implies that the blocks within if
statements, else
statements, while
statements, and so on should be one line long. Probably that line should be a function call. Not only does this keep the enclosing function small, but also adds documentary value because the function called within the block can have a nicely descriptive name.
This also implies that functions should not be large enough to hold nested structures. Therefore, the indent level of a function should not be greater than one or two. This, of course, makes the functions easy to read and understand.
FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL. THEY SHOULD DO IT ONLY.
If you have a function divided in sections like declarations, initialization etc, it's a obvious symptom of the function is doing more than one thing. Functions that do one thing cannot be reasonably divided into sections.
In order to make sure our functions are doing "one thing", we need to make sure that the statements within our function are all at the same level of abstraction.
We want the code to read like a top-down narrative. 5 We want every function to be followed by those at the next level of abstraction so that we can read the program, descending one level of abstraction at a time as we read down the list of functions.
To say this differently, we want to be able to read the program as though it were a set of TO paragraphs, each of which is describing the current level of abstraction and referencing subsequent TO paragraphs at the next level down.
- To include the setups and teardowns, we include setups, then we include the test page content, and then we include the teardowns.
- To include the setups, we include the suite setup if this is a suite, then we include the regular setup.
- To include the suite setup, we search the parent hierarchy for the “SuiteSetUp” page and add an include statement with the path of that page.
- To search the parent...
It turns out to be very difficult for programmers to learn to follow this rule and write functions that stay at a single level of abstraction. But learning this trick is also very important. It is the key to keeping functions short and making sure they do “one thing.” Making the code read like a top-down set of TO paragraphs is an effective technique for keeping the abstraction level consistent.
It’s hard to make a small switch statement. 6 Even a switch statement with only two cases is larger than I’d like a single block or function to be. It’s also hard to make a switch statement that does one thing. By their nature, switch statements always do N things. Unfortunately we can’t always avoid switch statements, but we can make sure that each switch statement is buried in a low-level class and is never repeated. We do this, of course, with polymorphism.
You know you are working on clean code when each routine turns out to be pretty much what you expected
Half the battle to achieving that principle is choosing good names for small functions that do one thing. The smaller and more focused a function is, the easier it is to choose a descriptive name.
Don’t be afraid to make a name long. A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment. Use a naming convention that allows multiple words to be easily read in the function names, and then make use of those multiple words to give the function a name that says what it does.
Choosing descriptive names will clarify the design of the module in your mind and help you to improve it. It is not at all uncommon that hunting for a good name results in a favorable restructuring of the code.
The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification—and then shouldn’t be used anyway.
Arguments are even harder from a testing point of view. Imagine the difficulty of writing all the test cases to ensure that all the various combinations of arguments work properly. If there are no arguments, this is trivial. If there’s one argument, it’s not too hard. With two arguments the problem gets a bit more challenging. With more than two arguments, testing every combination of appropriate values can be daunting.
Output arguments are harder to understand than input arguments. When we read a function, we are used to the idea of information going in to the function through arguments and out through the return value. We don’t usually expect information to be going out through the arguments. So output arguments often cause us to do a double-take.
There are two very common reasons to pass a single argument into a function. You may be asking a question about that argument, as in boolean fileExists(“MyFile”)
. Or you may be operating on that argument, transforming it into something else and returning it. For example, InputStream fileOpen(“MyFile”)
transforms a file name String
into an InputStream
return value. These two uses are what readers expect when they see a function. You should choose names that make the distinction clear, and always use the two forms in a consistent context.
Flag arguments are ugly. Passing a boolean into a function is a truly terrible practice. It immediately complicates the signature of the method, loudly proclaiming that this function does more than one thing. It does one thing if the flag is true
and another if the flag is false
!
A function with two arguments is harder to understand than a monadic function. For example, writeField(name)
is easier to understand than writeField(output-Stream, name)
There are times, of course, where two arguments are appropriate. For example, Point p = new Point(0,0);
is perfectly reasonable. Cartesian points naturally take two arguments.
Even obvious dyadic functions like assertEquals(expected, actual) are problematic. How many times have you put the actual where the expected should be? The two arguments have no natural ordering. The expected, actual ordering is a convention that requires practice to learn.
Dyads aren’t evil, and you will certainly have to write them. However, you should be aware that they come at a cost and should take advantage of what mechanims may be available to you to convert them into monads. For example, you might make the writeField method a member of outputStream so that you can say outputStream. writeField(name) . Or you might make the outputStream a member variable of the current class so that you don’t have to pass it. Or you might extract a new class like FieldWriter that takes the outputStream in its constructor and has a write method.
Functions that take three arguments are significantly harder to understand than dyads. The issues of ordering, pausing, and ignoring are more than doubled. I suggest you think very carefully before creating a triad.
Compare:
Circle makeCircle(double x, double y, double radius);
vs
Circle makeCircle(Point center, double radius);
Choosing good names for a function can go a long way toward explaining the intent of the function and the order and intent of the arguments. In the case of a monad, the function and argument should form a very nice verb/noun pair. For example, write(name)
is very evocative. Whatever this “name” thing is, it is being “written.” An even better name might be writeField(name)
, which tells us that the "name" thing is a "field".
This last is an example of the keyword form of a function name. Using this form we encode the names of the arguments into the function name. For example, assertEquals
might be better written as assertExpectedEqualsActual(expected, actual)
. This strongly mitigates the problem of having to remember the ordering of the arguments.
In general output arguments should be avoided. If your function must change the state of something, have it change the state of its owning object.
Functions should either do something or answer something, but not both. Either your function should change the state of an object, or it should return some information about that object. Doing both often leads to confusion.
Returning error codes from command functions is a subtle violation of command query separation.
Duplication may be the root of all evil in software. Many principles and practices have been created for the purpose of controlling or eliminating it.
Some programmers follow Edsger Dijkstra’s rules of structured programming. Dijkstra said that every function, and every block within a function, should have one entry and one exit. Following these rules means that there should only be one return statement in a function, no break
or continue
statements in a loop, and never, ever, any goto
statements.
While we are sympathetic to the goals and disciplines of structured programming, those rules serve little benefit when functions are very small. It is only in larger functions that such rules provide significant benefit.
So if you keep your functions small, then the occasional multiple return
, break
, or continue
statement does no harm and can sometimes even be more expressive than the single-entry, single-exit rule. On the other hand, goto
only makes sense in large functions, so it should be avoided
Nothing can be quite so helpful as a well-placed comment. Nothing can clutter up a module more than frivolous dogmatic comments. Nothing can be quite so damaging as an old comment that propagates lies and misinformation.
If our programming languages were expressive enough, or if we had the talent to subtly wield those languages to express our intent, we would not need comments very much—perhaps not at all.
Clear and expressive code with few comments is far superior to cluttered and complex code with lots of comments. Rather than spend your time writing the comments that explain the mess you’ve made, spend it cleaning that mess.
// Check to see if the employee is eligible for full benefits
if ((employee.flags & HOURLY_FLAG) && (employee.age > 65))
vs
if (employee.isEligibleForFullBenefits())
Some comments are necessary or beneficial. However the only truly good comment is the comment you found a way not to write.
Sometimes our corporate coding standards force us to write certain comments for legal reasons. For example, copyright and authorship statements are necessary and reasonable things to put into a comment at the start of each source file.
It is sometimes useful to provide basic information with a comment. For example, consider this comment that explains the return value of an abstract method:
// Returns an instance of the Responder being tested.
protected abstract Responder responderInstance();
A comment like this can sometimes be useful, but it is better to use the name of the function to convey the information where possible. For example, in this case the comment could be made redundant by renaming the function: responderBeingTested
.
Sometimes a comment goes beyond just useful information about the implementation and provides the intent behind a decision. Example:
public int compareTo(Object o)
{
if(o instanceof WikiPagePath)
{
WikiPagePath p = (WikiPagePath) o;
String compressedName = StringUtil.join(names, "");
String compressedArgumentName = StringUtil.join(p.names, "");
return compressedName.compareTo(compressedArgumentName);
}
return 1; // we are greater because we are the right type.
}
Sometimes it is just helpful to translate the meaning of some obscure argument or return value into something that's readable. In general it is better to find a way to make that argument or return value clear in its own right; but when its part of the standard library, or in code that you cannot alter, then a helpful clarifying comment can be useful.
Sometimes it is useful to warn other programmers about certain consequences.
// Don't run unless you
// have some time to kill.
public void _testWithReallyBigFile() {
writeLinesToFile(10000000);
response.setBody(testFile);
response.readyToSend(this);
String responseString = output.toString();
assertSubString("Content-Length: 1000000000", responseString);
assertTrue(bytesSent > 1000000000);
}
It is sometimes reasonable to leave “To do” notes in the form of //TODO comments. In the following case, the TODO comment explains why the function has a degenerate implementation and what that function's future should be.
//TODO-MdM these are not needed
// We expect this to go away when we do the checkout model
protected VersionInfo makeVersion() throws Exception {
return null;
}
TODOs are jobs that the programmer thinks should be done, but for some reason can’t do at the moment. It might be a reminder to delete a deprecated feature or a plea for someone else to look at a problem. It might be a request for someone else to think of a better name or a reminder to make a change that is dependent on a planned event. Whatever else a TODO might be, it is not an excuse to leave bad code in the system.
A comment may be used to amplify the importance of something that may otherwise seem inconsequential.
String listItemContent = match.group(3).trim();
// the trim is real important. It removes the starting
// spaces that could cause the item to be recognized
// as another list.
new ListItemWidget(this, listItemContent, this.level + 1);
return buildList(text.substring(match.end()));
There is nothing quite so helpful and satisfying as a well-described public API. The javadocs for the standard Java library are a case in point. It would be difficult, at best, to write Java programs without them.
Most comments fall into this category. Usually they are crutches or excuses for poor code or justifications for insufficient decisions, amounting to little more than the programmer talking to himself.
Plopping in a comment just because you feel you should or because the process requires it, is a hack. If you decide to write a comment, then spend the time necessary to make sure it is the best comment you can write. Example:
public void loadProperties() {
try {
String propertiesPath = propertiesLocation + "/" + PROPERTIES_FILE;
FileInputStream propertiesStream = new FileInputStream(propertiesPath);
loadedProperties.load(propertiesStream);
}
catch(IOException e) {
// No properties files means all defaults are loaded
}
}
What does that comment in the catch block mean? Clearly meant something to the author, but the meaning not come though all that well. Apparently, if we get an IOException
, it means that there was no properties file; and in that case all the defaults are loaded. But who loads all the defaults?
// Utility method that returns when this.closed is true. Throws an exception
// if the timeout is reached.
public synchronized void waitForClose(final long timeoutMillis) throws Exception {
if(!closed) {
wait(timeoutMillis);
if(!closed)
throw new Exception("MockResponseSender could not be closed");
}
}
What purpose does this comment serve? It’s certainly not more informative than the code. It does not justify the code, or provide intent or rationale. It is not easier to read than the code. Indeed, it is less precise than the code and entices the reader to accept that lack of precision in lieu of true understanding.
Sometimes, with all the best intentions, a programmer makes a statement in his comments that isn't precise enough to be accurate. Consider for another moment the example of the previous section. The method does not return when this.closed
becomes true
. It returns if this.closed
is true
; otherwise, it waits for a blind time-out and then throws an exception if this.closed
is still not true.
It is just plain silly to have a rule that says that every function must have a javadoc, or every variable must have a comment. Comments like this just clutter up the code, propagate lies, and lend to general confusion and disorganization.
/**
*
* @param title The title of the CD
* @param author The author of the CD
* @param tracks The number of tracks on the CD
* @param durationInMinutes The duration of the CD in minutes
*/
public void addCD(String title, String author, int tracks, int durationInMinutes) {
CD cd = new CD();
cd.title = title;
cd.author = author;
cd.tracks = tracks;
cd.duration = duration;
cdList.add(cd);
}
Sometimes people add a comment to the start of a module every time they edit it. Example:
* Changes (from 11-Oct-2001)
* --------------------------
* 11-Oct-2001 : Re-organised the class and moved it to new package com.jrefinery.date (DG);
* 05-Nov-2001 : Added a getDescription() method, and eliminated NotableDate class (DG);
* 12-Nov-2001 : IBD requires setDescription() method, now that NotableDate class is gone (DG); Changed getPreviousDayOfWeek(),
getFollowingDayOfWeek() and getNearestDayOfWeek() to correct bugs (DG);
* 05-Dec-2001 : Fixed bug in SpreadsheetDate class (DG);
Today we have source code control systems, we don't need this type of logs.
The comments in the follow examples doesn't provides new information.
/**
* Default constructor.
*/
protected AnnualDateRule() {
}
/** The day of the month. */
private int dayOfMonth;
Javadocs comments could enter in this category. Many times they are just redundant noisy comments written out of some misplaced desire to provide documentation.
Example:
// does the module from the global list <mod> depend on the
// subsystem we are part of?
if (smodule.getDependSubsystems().contains(subSysMod.getSubSystem()))
vs
ArrayList moduleDependees = smodule.getDependSubsystems();
String ourSubSystem = subSysMod.getSubSystem();
if (moduleDependees.contains(ourSubSystem))
This type of comments are noising
// Actions //////////////////////////////////
Example:
public class wc {
public static void main(String[] args) {
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String line;
int lineCount = 0;
int charCount = 0;
int wordCount = 0;
try {
while ((line = in.readLine()) != null) {
lineCount++;
charCount += line.length();
String words[] = line.split("\\W");
wordCount += words.length;
} //while
System.out.println("wordCount = " + wordCount);
System.out.println("lineCount = " + lineCount);
System.out.println("charCount = " + charCount);
} // try
catch (IOException e) {
System.err.println("Error:" + e.getMessage());
} //catch
} //main
You could break the code in small functions instead to use this type of comments.
Example:
/* Added by Rick */
The VCS can manage this information instead.
InputStreamResponse response = new InputStreamResponse();
response.setBody(formatter.getResultStream(), formatter.getByteCount());
// InputStream resultsStream = formatter.getResultStream();
// StreamReader reader = new StreamReader(resultsStream);
// response.setContent(reader.read(formatter.getByteCount()));
If you don't need anymore, please delete it, you can back later with your VCS if you need it again.
HTML in source code comments is an abomination, as you can tell by reading the code below.
/**
* Task to run fit tests.
* This task runs fitnesse tests and publishes the results.
* <p/>
* <pre>
* Usage:
* <taskdef name="execute-fitnesse-tests"
* classname="fitnesse.ant.ExecuteFitnesseTestsTask"
* classpathref="classpath" />
* OR
* <taskdef classpathref="classpath"
* resource="tasks.properties" />
* <p/>
* <execute-fitnesse-tests
* suitepage="FitNesse.SuiteAcceptanceTests"
* fitnesseport="8082"
* resultsdir="${results.dir}"
* resultshtmlpage="fit-results.html"
* classpathref="classpath" />
* </pre>
*/
If you must write a comment, then make sure it describes the code it appears near. Don't offer systemwide information in the context of a local comment.
Don't put interesting historical discussions or irrelevant descriptions of details into your comments.
The connection between a comment and the code it describes should be obvious. If you are going to the trouble to write a comment, then at least you'd like the reader to be able to look at the comment and the code and understand what the comment is talking about
Short functions don’t need much description. A well-chosen name for a small function that does one thing is usually better than a comment header.
Javadocs are for public APIs, in nonpublic code could be a distraction more than a help.
Code formatting is important. It is too important to ignore and it is too important to treat religiously. Code formatting is about communication, and communication is the professional developer’s first order of business.
This concept consist in how to you separate concepts in your code, In the next example we can appreciate it.
package fitnesse.wikitext.widgets;
import java.util.regex.*;
public class BoldWidget extends ParentWidget {
public static final String REGEXP = "'''.+?'''";
private static final Pattern pattern = Pattern.compile("'''(.+?)'''",
Pattern.MULTILINE + Pattern.DOTALL
);
public BoldWidget(ParentWidget parent, String text) throws Exception {
super(parent);
Matcher match = pattern.matcher(text);
match.find();
addChildWidgets(match.group(1));
}
public String render() throws Exception {
StringBuffer html = new StringBuffer("<b>");
html.append(childHtml()).append("</b>");
return html.toString();
}
}
package fitnesse.wikitext.widgets;
import java.util.regex.*;
public class BoldWidget extends ParentWidget {
public static final String REGEXP = "'''.+?'''";
private static final Pattern pattern = Pattern.compile("'''(.+?)'''",
Pattern.MULTILINE + Pattern.DOTALL);
public BoldWidget(ParentWidget parent, String text) throws Exception {
super(parent);
Matcher match = pattern.matcher(text); match.find(); addChildWidgets(match.group(1));
}
public String render() throws Exception { StringBuffer html = new StringBuffer("<b>"); html.append(childHtml()).append("</b>"); return html.toString();
}
}
As you can see, the readability of the first example is greater than that of the second.
The vertical density implies close association. So lines of code that are tightly related should appear vertically dense. Check the follow example:
public class ReporterConfig {
/**
* The class name of the reporter listener */
private String m_className;
/**
* The properties of the reporter listener */
private List<Property> m_properties = new ArrayList<Property>();
public void addProperty(Property property) { m_properties.add(property);
}
public class ReporterConfig {
private String m_className;
private List<Property> m_properties = new ArrayList<Property>();
public void addProperty(Property property) {
m_properties.add(property);
}
}
The second code it's much easier to read. It fits in an "eye-full".
Variable Declarations. Variables should be declared as close to their usage as possible. Because our functions are very short, local variables should appear at the top of each function,
Instance variables, on the other hand, should be declared at the top of the class. This should not increase the vertical distance of these variables, because in a well-designed class, they are used by many, if not all, of the methods of the class.
There have been many debates over where instance variables should go. In C++ we commonly practiced the so-called scissors rule, which put all the instance variables at the bottom. The common convention in Java, however, is to put them all at the top of the class. I see no reason to follow any other convention. The important thing is for the instance variables to be declared in one well-known place. Everybody should know where to go to see the declarations.
Dependent Functions. If one function calls another, they should be vertically close, and the caller should be above the callee, if at all possible. This gives the program a natural flow. If the convention is followed reliably, readers will be able to trust that function definitions will follow shortly after their use.
Conceptual Affinity. Certain bits of code want to be near other bits. They have a certain conceptual affinity. The stronger that affinity, the less vertical distance there should be between them.
In general we want function call dependencies to point in the downward direction. That is, a function that is called should be below a function that does the calling. This creates a nice flow down the source code module from high level to low level. (This is the exact opposite of languages like Pascal, C, and C++ that enforce functions to be defined, or at least declared, before they are used)
We use horizontal white space to associate things that are strongly related and disassociate things that are more weakly related. Example:
private void measureLine(String line) {
lineCount++;
int lineSize = line.length();
totalChars += lineSize;
lineWidthHistogram.addLine(lineSize, lineCount);
recordWidestLine(lineSize);
}
Assignment statements have two distinct and major elements: the left side and the right side. The spaces make that separation obvious.
public class Example implements Base
{
private Socket socket;
private inputStream input;
protected long requestProgress;
public Expediter(Socket s,
inputStream input) {
this.socket = s;
this.input = input;
}
}
In modern languages this type of alignment is not useful. The alignment seems to emphasize the wrong things and leads my eye away from the true intent.
public class Example implements Base
{
private Socket socket;
private inputStream input;
protected longrequestProgress;
public Expediter(Socket s, inputStream input) {
this.socket = s;
this.input = input;
}
}
This is a better approach.
The indentation it's important because help us to have a visible hierarchy and well defined blocks.
Every programmer has his own favorite formatting rules, but if he works in a team, then the team rules.
A team of developers should agree upon a single formatting style, and then every member of that team should use that style. We want the software to have a consistent style. We don't want it to appear to have been written by a bunch of disagreeing individuals.
Hiding implementation is not just a matter of putting a layer of functions between the variables. Hiding implementation is about abstractions! A class does not simply push its variables out through getters and setters. Rather it exposes abstract interfaces that allow its users to manipulate the essence of the data, without having to know its implementation.
These two examples show the difference between objects and data structures. Objects hide their data behind abstractions and expose functions that operate on that data. Data structure expose their data and have no meaningful functions.
Procedural Shape
public class Square {
public Point topLeft;
public double side;
}
public class Rectangle {
public Point topLeft;
public double height;
public double width;
}
public class Circle {
public Point center;
public double radius;
}
public class Geometry {
public final double PI = 3.141592653589793;
public double area(Object shape) throws NoSuchShapeException {
if (shape instanceof Square) {
Square s = (Square)shape;
return s.side * s.side;
}
else if (shape instanceof Rectangle) { Rectangle r = (Rectangle)shape; return r.height * r.width;
}
else if (shape instanceof Circle) {
Circle c = (Circle)shape;
return PI * c.radius * c.radius;
}
throw new NoSuchShapeException();
}
}
Polymorphic Shape
public class Square implements Shape {
private Point topLeft;
private double side;
public double area() {
return side*side;
}
}
public class Rectangle implements Shape {
private Point topLeft;
private double height;
private double width;
public double area() {
return height * width;
}
}
public class Circle implements Shape {
private Point center;
private double radius;
public final double PI = 3.141592653589793;
public double area() {
return PI * radius * radius;
}
}
Again, we see the complimentary nature of these two definitions; they are virtual opposites! This exposes the fundamental dichotomy between objects and data structures:
Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.
The complement is also true:
Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change.
Mature programmers know that the idea that everything is an object is a myth. Sometimes you really do want simple data structures with procedures operating on them.
The Law of Demeter
There is a well-known heuristic called the Law of Demeter that says a module should not know about the innards of the objects it manipulates.
More precisely, the Law of Demeter says that a method f
of a class C
should only call the methods of these:
C
- An object created by
f
- An object passed as an argument to
f
- An object held in an instance variable of
C
The method should not invoke methods on objects that are returned by any of the allowed functions. In other words, talk to friends, not to strangers.
The quintessential form of a data structure is a class with public variables and no functions. This is sometimes called a data transfer object, or DTO. DTOs are very useful structures, especially when communicating with databases or parsing messages from sockets, and so on. They often become the first in a series of translation stages that convert raw data in a database into objects in the application code.
Many code bases are completely dominated by error handling. When I say dominated, I don't mean that error handling is all that they do. I mean that it is nearly impossible to see what the code does because of all of the scattered error handling. Error handling is important, but if it obscures logic, it's wrong.
Back in the distant past there were many languages that didn't have exceptions. In those languages the techniques for handling and reporting errors were limited. You either set an error flag or returned an error code that the caller could check
In a way, try blocks are like transactions. Your catch has to leave your program in a consistent state, no matter what happens in the try. For this reason it is good practice to start with a try-catch-finally statement when you are writing code that could throw exceptions. This helps you define what the user of that code should expect, no matter what goes wrong with the code that is executed in the try.
Each exception that you throw should provide enough context to determine the source and location of an error.
Create informative error messages and pass them along with your exceptions. Mention the operation that failed and the type of failure. If you are logging in your application, pass along enough information to be able to log the error in your catch.
If you are tempted to return null from a method, consider throwing an exception or returning a SPECIAL CASE object instead. If you are calling a null-returning method from a third-party API, consider wrapping that method with a method that either throws an exception or returns a special case object.
Returning null from methods is bad, but passing null into methods is worse. Unless you are working with an API which expects you to pass null, you should avoid passing null in your code whenever possible.
We seldom control all the software in our systems. Sometimes we buy third-party pack- ages or use open source. Other times we depend on teams in our own company to produce components or subsystems for us. Somehow we must cleanly integrate this foreign code with our own.
There is a natural tension between the provider of an interface and the user of an interface. Providers of third-party packages and frameworks strive for broad applicability so they can work in many environments and appeal to a wide audience. Users, on the other hand, want an interface that is focused on their particular needs. This tension can cause problems at the boundaries of our systems. Example:
Map sensors = new HashMap();
Sensor s = (Sensor)sensors.get(sensorId);
VS
public class Sensors {
private Map sensors = new HashMap();
public Sensor getById(String id) {
return (Sensor) sensors.get(id);
}
//snip
}
The first code exposes the casting in the Map, while the second is able to evolve with very little impact on the rest of the application. The casting and type management is handled inside the Sensors class.
The interface is also tailored and constrained to meet the needs of the application. It results in code that is easier to understand and harder to misuse. The Sensors class can enforce design and business rules.
Third-party code helps us get more functionality delivered in less time. Where do we start when we want to utilize some third-party package? It’s not our job to test the third-party code, but it may be in our best interest to write tests for the third-party code we use.
It's a good idea write some test for learn and understand how to use a third-party code. Newkirk calls such tests learning tests.
The learning tests end up costing nothing. We had to learn the API anyway, and writing those tests was an easy and isolated way to get that knowledge. The learning tests were precise experiments that helped increase our understanding.
Not only are learning tests free, they have a positive return on investment. When there are new releases of the third-party package, we run the learning tests to see whether there are behavioral differences.
Some times it's necessary work in a module that will be connected to another module under develop, and we have no idea about how to send the information, because the API had not been designed yet. In this cases it's recommendable create an interface for encapsulate the communication with the pending module. In this way we maintain the control over our module and we can test although the second module isn't available yet.
Interesting things happen at boundaries. Change is one of those things. Good software designs accommodate change without huge investments and rework. When we use code that is out of our control, special care must be taken to protect our investment and make sure future change is not too costly.
Test Driven Development
- First Law You may not write production code until you have written a failing unit test.
- Second Law You may not write more of a unit tests than is sufficient to fail, and not comipling is failing.
- Third Law You may not write more production code than is sufficient to pass the currently failing tests.
If you don't keep your tests clean, you will lose them.
The readability it's very important to keep clean your tests.
It's recomendable maintain only one asserts per tests, because this helps to maintain each tests easy to understand.
This rule will help you to keep short functions.
- Write one test per each concept that you need to verify
- Fast Test should be fast.
- Independient Test should not depend on each other.
- Repeatable Test Should be repeatable in any environment.
- Self-Validating Test should have a boolean output. either they pass or fail.
- Timely Unit tests should be written just before the production code that makes them pass. If you write tests after the production code, then you may find the production code to be hard to test.
We like to keep our variables and utility functions small, but we're not fanatic about it. Sometimes we need to make a variable or utility function protected so that it can be accessed by a test.
- First Rule: Classes should be small
- Second Rule: Classes should be smaller than the first rule
Classes should have one responsibility - one reason to change
SRP is one of the more important concept in OO design. It's also one of the simple concepts to understand and adhere to.
Classes Should have a small number of instance variables. Each of the methods of a class should manipulate one or more of those variables. In general the more variables a method manipulates the more cohesive that method is to its class. A class in which each variable is used by each method is maximally cohesive.
Just the act of breaking large functions into smaller functions causes a proliferation of classes.
For most systems, change is continual. Every change subjects us to the risk that the remainder of the system no longer works as intended. In a clean system we organize our classes so as to reduce the risk of change.
Needs will change, therefore code will change. We learned in OO 101 that there are concrete classes, which contain implementation details (code), and abstract classes, which represent concepts only. A client class depending upon concrete details is at risk when those details change. We can introduce intefaces and abstract classes to help isolate the impact of those details.
Software Systems should separate the startup process, when the application objects are constructed and the dependencies are "wired" together, from the runtime logic that takes over after startup
One way to separate construction from use is simply to move all aspects of construction to main
, or modules called by main
, and to design the rest of the system assuming that all objects have been created constructed and wired up appropriately.
The Abstract Factory Pattern is an option for this kind of approach.
A powerful mechanism for separating construction from use is Dependency Injection (DI), the application of Inversion of control (IoC) to dependency management. Inversion of control moves secondary responsibilities from an object to other objects that are dedicated to the purpose, thereby supporting the Single Responsibility Principle. In context of dependency management, an object should not take responsibility for instantiating dependencies itself. Instead, it, should pass this responsibility to another "authoritative" mechanism, thereby inverting the control. Because setup is a global concern, this authoritative mechanism will usually be either the "main" routine or a special-purpose container.
According to Kent Beck, a design is "simple" if it follows these rules
- Run all tests
- Contains no duplication
- Expresses the intent of programmers
- Minimizes the number of classes and methods
Concurrence is a decoupling strategy. It helps us decouple what gets fone from when it gets done. In single-threaded applications what and when are so strongly coupled that the state of the entire application can often be determined by looking at the stack backtrace. A programmer who debugs such a system can set a breakpoint, or a sequence of breakpoints, and know the state of the system by which breakpoints are hit.
Decoupling what from when can dramatically improve both the throughput and structures of an application. From a structural point of view the application looks like many lit- tle collaborating computers rather than one big main loop. This can make the system easier to understand and offers some powerful ways to separate concerns.
- Concurrency always improves performance. Concurrency can sometimes improve performance, but only when there is a lot of wait time that can be shared between multiple threads or multiple processors. Neither situ- ation is trivial.
- Design does not change when writing concurrent programs. In fact, the design of a concurrent algorithm can be remarkably different from the design of a single-threaded system. The decoupling of what from when usually has a huge effect on the structure of the system.
- Understanding concurrency issues is not important when working with a container such as a Web or EJB container. In fact, you’d better know just what your container is doing and how to guard against the issues of concurrent update and deadlock described later in this chapter. Here are a few more balanced sound bites regarding writing concurrent software:
- Concurrency incurs some overhead, both in performance as well as writing additional code.
- Correct concurrency is complex, even for simple problems.
- Concurrency bugs aren’t usually repeatable, so they are often ignored as one-offs instead of the true defects they are.
- Concurrency often requires a fundamental change in design strategy.
A reference of code smells from Martin Fowler's Refactoring and Robert C Martin's Clean Code.
While clean code comes from discipline and not a list or value system, here is a starting point.
Reserve comments for technical notes referring to code and design.
Update or delete obsolete comments.
A redundant comment describes something able to sufficiently describe itself.
Comments should be brief, concise, correctly spelled.
Ghost code. Delete it.
Builds should require one command to check out and one command to run.
Tests should be run with one button click through an IDE, or else with one command.
Functions should have no arguments, then one, then two, then three. No more than three.
Arguments are inputs, not outputs. If somethings state must be changed, let it be the state of the called object.
Eliminate boolean arguments.
Discard uncalled methods. This is dead code.
Minimize the number of languages in a source file. Ideally, only one.
The result of a function or class should not be a surprise.
Write tests for every boundary condition.
Overriding safeties and exerting manual control leads to code melt down.
Practice abstraction on duplicate code. Replace repetitive functions with polymorphism.
Make sure abstracted code is separated into different containers.
Practice modularity.
Do a lot with a little. Limit the amount of things going on in a class or functions.
Delete unexecuted code.
Define variables and functions close to where they are called.
Choose a convention, and follow it. Remember no surprises.
Dead code.
Favor code that is clear, rather than convenient. Do not group code that favors mental mapping over clearness.
Methods of one class should not be interested with the methods of another class.
Do not flaunt false arguments at the end of functions.
Code should not be magic or obscure.
Use clear function name as waypoints for where to place your code.
Make your functions nonstatic.
Make explanatory variables, and lots of them.
...
Understand how a function works. Passing tests is not enough. Refactoring a function can lead to a better understanding of it.
Understand what your code is doing.
Avoid the brute force of switch/case.
It doesn't matter what your teams convention is. Just that you have on and everyone follows it.
Stop spelling out numbers.
Don't be lazy. Think of possible results, then cover and test them.
Design decisions should have a structure rather than a dogma.
Make your conditionals more precise.
Negative conditionals take more brain power to understand than a positive.
G31: Hidden Temporal Couplings
Use arguments that make temporal coupling explicit.
Your code's structure should communicate the reason for its structure.
Avoid leaking +1's and -1's into your code.
The toughest heuristic to follow. One level of abstraction below the function's described operation can help clarify your code.
High level constants are easy to change.
Write shy code. Modules should only know about their neighbors, not their neighbor's neighbors.
Choose names that are descriptive and relevant.
Think of names that are still clear to the user when used in different programs.
Use names that express their task.
Favor clearness over curtness. A long, expressive name is better than a short, dull one.
A name's length should relate to its scope.
Do not encode names with type or scope information.
Consider the side-effects of your function, and include that in its name.
Test everything that can possibly break
Use your IDE as a coverage tool.
...
If your test is ignored, the code is brought into question.
The middle is usually covered. Remember the boundaries.
Bugs are rarely alone. When you find one, look nearby for another.
Test cases ordered well will reveal patterns of failure.
Similarly, look at the code that is or is not passed in a failure.
Slow tests won't get run.