April 15, 2015

Always Valid

(One from the archives. Not sure when I wrote this. I think it still stands.)

These days, with the rise in programmer testing (also known as unit testing to its friends), much testing effort is put into ensuring that program logic is correct. This is good and to be encouraged at all times. A program that does the right thing is always more likely to be correct than one that does not. Interestingly, with all of this new-found enthusiasm for testing, there seem to be a few gaps in our approach. The one I'd like to discuss here is the principle of "Always Valid".

This principle is easy to arrive at from first principles. You see, programs do stuff. Every programmer knows this. We know it because we're the ones that tell them how to do it. But pause a moment and consider what this "stuff" is most of the time. The vast majority of the time we're processing data; and it's the data that is the star of the show. Our programs are just descriptions of what we want to do to that data. The data is the important thing.

Given that we care deeply about the data, we should be careful to ensure its quality at all times. We need to care about it not only while we're processing it, but before and after as well. At any time that data is in the gentle "hands" of one of our programs, we need to treat it with the respect that it deserves.

If you ever think that data isn't that important, ask some of your business users if they would rather lose their invoice file or the program that processes it. I bet they'd take the approach that something could be hacked together to process the file if they ever lost the program, but there's no way to recreate invoice data without a backup. So, data is always more important than mere code.

How do we treat data with respect? If the stuff is so vitally important, how does this knowledge affect our behaviour? We respect data by ensuring that it's always valid. Period. It's as simple as that. Well, it's as simple as that to say. Naturally there are some technical details that we need to take care of on the way to fulfilling the "always valid" principle, but these details are not that complicated, and they bring about a massive improvement in data quality for quite a small amount of effort.

At All Times

The zeroth step, more of a guiding principle really, is the determination that your value objects will never be in an invalid state. This directive guides everything else that you'll read here. An object may be in only two states. It can be null, i.e. it doesn't exist and therefore has no data to worry about, or it can be instantiated and be in a valid internal state (however that is defined for the value object you're working with). Nothing else is acceptable if we are to achieve the level of data quality we've already outlined.

This may seem like an overly strict constraint, but consider that in these days of multi-threaded applications, it's possible that another thread is accessing your object when you don't suspect it. So, if you initialize an object in several intermediate steps, that other thread may access it before you have finished. This will almost certainly cause data quality issues, perhaps even data corruption.

Understand your data

The first step to data quality is to understand the data. It is especially important to understand the relationships between the different attributes. For example, a rectangle object is likely to have two attributes, length and width, and both are vital to understanding that rectangle.

Another example might be a person object. Now, I know that person objects can be complex, but let's say that we have a simple one and that we're just describing people we know. Attributes like gender and name are going to be important, but under most circumstances, it is not necessary to know someone's birthdate.

I can offer you a humorous personal example about birthdates. Ever since I was old enough to fill out forms, I have been listing my mother's age as "OVER21" because she never told anyone how old she was. Apparently, her age was unimportant for all of my interactions with government agencies, even when applying for clearance under the U.K. Official Secrets Act so that I could have an internship at a British Royal Navy establishment! (This used to be legal in England. I have no idea whether it still is, or whether listing your date of birth as "OVER21" is legal anywhere else in the world. And I am absolutely not a lawyer.)

Understanding our data, we can now implement that understanding in our value objects. If I'm creating a rectangle object, I'm going to want both the length and the width for my object to be valid. If I have one without the other, or even neither, then I do not have a valid rectangle.

Building our value objects

There are two ways to fulfil this data relationship requirement. The first, and my preferred approach, is to fully construct and initialize an object in its constructor. Our rectangle object needs two fields, so it has a constructor that takes values for those two fields and initializes the new instance, making it valid right from the start. Naturally, this would mean that there was no default constructor, unless the value object really did have no required fields, but I consider that unlikely.
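As a rough illustration of the constructor approach, here is a minimal Python sketch. The class name Rectangle, the area method, and the read-only properties are my own choices for the example, not anything prescribed above. The point is only that both fields are required arguments, so there is no way to ask for a half-built rectangle.

```python
class Rectangle:
    """A rectangle that cannot exist without both of its dimensions."""

    def __init__(self, length: float, width: float) -> None:
        # Both arguments are required; there is deliberately no
        # no-argument ("default") way to build a Rectangle.
        self._length = length
        self._width = width

    @property
    def length(self) -> float:
        return self._length

    @property
    def width(self) -> float:
        return self._width

    def area(self) -> float:
        return self._length * self._width


rect = Rectangle(3.0, 2.0)

# Leaving out a required field fails before any object reaches the caller.
try:
    Rectangle(3.0)
    built_partial = True
except TypeError:
    built_partial = False
```

Here built_partial ends up False: Python refuses the call with a TypeError because width is missing, so the caller never holds a rectangle with only one dimension.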

An alternative method of creating valid objects is to use a factory. The reason that I don't recommend the factory approach is that the factory still needs a way to create the object and ensure that it's correct from the start, so it'll most likely need to use the appropriate constructor behind the scenes anyway. Cut out the middle-man and just create the object yourself!
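To make that concrete, here is a small Python sketch of a factory. The function make_square is hypothetical, invented for this example. Notice that the only way it can hand back a valid object is to call the very constructor the caller could have used directly.

```python
class Rectangle:
    """Minimal rectangle; both fields are required by the constructor."""

    def __init__(self, length: float, width: float) -> None:
        self.length = length
        self.width = width


def make_square(side: float) -> Rectangle:
    """Hypothetical factory function.

    It adds a convenient name, but the real work of producing a
    fully initialized object is still done by Rectangle.__init__.
    """
    return Rectangle(side, side)


via_factory = make_square(4.0)
via_constructor = Rectangle(4.0, 4.0)
```

Both routes produce an equivalent object, which is why I'd usually skip the extra layer unless the factory earns its keep in some other way.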

Updating value objects

Updating a value object is an opportunity to leave it containing invalid data. Consider implementing value objects as immutable objects. This eliminates the opportunity for them to become invalid during updates. New versions of the objects can always be created with the new data, so it's not a huge roadblock.
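One way to sketch the immutable approach in Python is with a frozen dataclass from the standard library. This is an illustration under my own assumptions (the Rectangle fields are the ones from the earlier example), not the only way to do it. An "update" becomes the creation of a new object, and the original is never touched.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Rectangle:
    length: float
    width: float


original = Rectangle(3.0, 2.0)

# "Updating" means building a new instance from the old one.
wider = replace(original, width=5.0)

# Direct assignment to a field of a frozen dataclass is refused.
try:
    original.width = 5.0
    mutated = True
except FrozenInstanceError:
    mutated = False
```

After this runs, original still has a width of 2.0, wider is a separate rectangle with a width of 5.0, and mutated is False.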

If you do decide to allow updates to your value objects, please consider an update method that changes everything that needs changing in a single call. In this way, you reduce the update to something approaching an atomic activity and reduce the danger of the object being accessed while it is internally inconsistent.
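Here is a hedged Python sketch of such an update method. The names MutableRectangle, resize, and dimensions are hypothetical, and the lock is my own addition: the advice above only asks for a single update method, and the lock is one possible way of getting closer to "atomic" when several threads are involved.

```python
import threading


class MutableRectangle:
    """Rectangle whose dimensions can only change together."""

    def __init__(self, length: float, width: float) -> None:
        self._lock = threading.Lock()
        self._length = length
        self._width = width

    def resize(self, length: float, width: float) -> None:
        """Replace both dimensions in one call."""
        # Check the new values before touching any state, so a
        # rejected update leaves the object exactly as it was.
        if length <= 0 or width <= 0:
            raise ValueError("length and width must both be positive")
        with self._lock:
            self._length = length
            self._width = width

    def dimensions(self) -> tuple:
        # Readers take the same lock, so they never see a new length
        # paired with an old width.
        with self._lock:
            return (self._length, self._width)


rect = MutableRectangle(3.0, 2.0)
rect.resize(4.0, 5.0)
```

There are no separate setters for length and width here, which is the design choice that matters: callers have no way to change one without also supplying the other.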

Validation

Lastly, it is vitally important to remember to validate data on the way into an object and throw appropriate exceptions if the data isn't correct. Validate everything that you get in the arguments to the constructor. It's valid to throw exceptions in constructors if you receive bad data. The exception will prevent the instance being created and so protect the caller from receiving an invalid object.
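A minimal Python sketch of constructor validation follows. The rule that a dimension must be a positive number is an assumption I'm making for the rectangle example; your own value objects will have their own rules. With a dataclass, the check goes in __post_init__, which runs as part of construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rectangle:
    length: float
    width: float

    def __post_init__(self) -> None:
        for name in ("length", "width"):
            value = getattr(self, name)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(
                    f"{name} must be a number, got {type(value).__name__}"
                )
            # Written as "not value > 0" so that NaN is rejected as well.
            if not value > 0:
                raise ValueError(f"{name} must be positive, got {value!r}")


good = Rectangle(3.0, 2.0)

# Bad data raises, so no name is ever bound to an invalid rectangle.
try:
    bad = Rectangle(3.0, -1.0)
except ValueError as error:
    bad = None
    message = str(error)
```

Because the exception is raised during construction, the assignment to bad inside the try block never happens; the caller either gets a valid rectangle or gets an error that names the offending field.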

Conclusion

By following the advice here, you will be on the way to creating high-quality value objects. These value objects will play a major role in your quest for data quality. Now, your programs will do the right thing and do it with data that is right. What's not to like?

Tags: Software