Make Your Code Readable!

One of the most important qualities of code is, or at least should be, readability. The simple fact that it works is not enough. If it’s not readable, it definitely ain’t maintainable. Thus our mission, as programmers, is to write code that, beyond being correct, is also readable and understandable. When a new programmer joins a team, it takes time to become familiar with the application domain, the language, the code, or all of them. To reduce that time, companies use coding conventions that in theory should make the code readable but in reality, unfortunately, often do the opposite.

A long time ago, in a galaxy far, far away, there were no rich IDEs and no IntelliSense. Programmers had to know by heart all the types, functions, and variable names. Moreover, they had to know the types of variables, parameters, and return values. Thus, naming conventions helped in taming the beast. Probably one of the most famous (and infamous at the same time) conventions, at least for C and C++, is the so-called Hungarian notation, invented almost three decades ago by Charles Simonyi, a chief architect at Microsoft. It was named so because it made code look as if it were written in a foreign language, and Simonyi was of Hungarian origin. Nowadays there are many variations of Hungarian notation, but in the beginning there was one called Apps Hungarian, named after the group Simonyi was working in, called Applications, which was in charge of developing Word and Excel. Later, the group that developed Windows borrowed the convention without understanding it and developed a second version called Systems Hungarian. That was evangelized by Charles Petzold in his bestseller “Programming Windows”, probably the bible of Windows programmers.

Though Simonyi had some good intentions and ideas, they were misinterpreted, and that led to a bad convention style. All because he used the term “type” when he actually meant “kind”. In a paper, written in an academic style, he wrote:

“The basic idea is to name all quantities by their type.”

Though he put the word type in quotes and tried to explain that a type is defined by the set of operations that apply to a quantity, people did not correctly understand what he was saying. He used rgX to indicate an array of X (“range X”), cX for a count of instances of X, and dX for the difference between two instances of type X. Apps Hungarian was used for Excel and Word, where rowWidth could be an integer variable specifying the width of a row, and dX also an integer variable but specifying the number of selected columns. In his Apps notation it didn’t make any sense, for instance, to add dX to rowWidth, though both were integers, a type that has operator + defined. They have the same type, but they are not of the same kind. The original paper of Charles Simonyi was published in 1999 in MSDN: http://msdn2.microsoft.com/en-us/library/aa260976(VS.60).aspx
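
The difference between type and kind can be sketched in C++ by giving each kind its own small wrapper type, so that the compiler rejects the mix that the prefixes were only meant to make look wrong. This is just an illustration and not something taken from Simonyi's paper: RowWidth, SelectedColumns and combinedWidth are names made up for this sketch, and the non-negative check is an assumption.

```cpp
#include <cassert>

// Hypothetical wrapper types: both hold an int, but each stands for a different kind.
struct RowWidth {
    int pixels;
};

struct SelectedColumns {
    int count;
};

// Adding two widths is meaningful, so this is the only operator+ provided.
RowWidth operator+(RowWidth first, RowWidth second) {
    // Assumption for this sketch: widths are never negative.
    assert(first.pixels >= 0 && second.pixels >= 0);
    return RowWidth{first.pixels + second.pixels};
}

// Hypothetical helper showing a legal combination of two values of the same kind.
RowWidth combinedWidth(RowWidth first, RowWidth second) {
    return first + second;
}

// Mixing kinds does not compile, although both wrap an int:
// RowWidth broken = RowWidth{120} + SelectedColumns{3};  // error: no matching operator+
```

Whether wrappers like these are worth the extra typing depends on the code base; the point is only that "same type, different kind" can be expressed in names, in types, or in both.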

Apps Hungarian focused on the semantics of quantities (variables, functions). In Systems Hungarian, however, the focus moved to the type. With p indicating pointers, ch characters, dw double words, ar arrays, sz null-terminated strings, and lp long (far) pointers, you could see variables named like lpszName (long pointer to a null-terminated string specifying a name), dwI (double word index), arx (array of X), but also things like lpararszID (long pointer to an array of arrays of null-terminated strings). Now imagine hundreds of thousands of lines of code written like that. It simply makes my head ache. Such notations cannot make the code more readable. Only the opposite.
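
To make the contrast concrete, here is a small sketch of the same function written twice, once with Systems Hungarian prefixes and once with names that describe meaning. The function names and parameters (dwTotalLen, arszName, totalNameLength and so on) are invented for this example and do not come from any real code base.

```cpp
#include <cassert>
#include <cstring>

// Systems Hungarian: every prefix restates a type the compiler already knows.
unsigned long dwTotalLen(const char* const* arszName, unsigned long dwCnt) {
    unsigned long dwLen = 0;
    for (unsigned long dwI = 0; dwI < dwCnt; ++dwI) {
        dwLen += static_cast<unsigned long>(std::strlen(arszName[dwI]));
    }
    return dwLen;
}

// Same logic and same types, but the names say what the values mean.
unsigned long totalNameLength(const char* const* userNames, unsigned long countOfNames) {
    unsigned long totalLength = 0;
    for (unsigned long index = 0; index < countOfNames; ++index) {
        totalLength += static_cast<unsigned long>(std::strlen(userNames[index]));
    }
    return totalLength;
}

// Both versions compute the same result; only the readability differs.
void checkBothStylesAgree() {
    const char* const userNames[] = {"Ann", "Robert"};
    assert(dwTotalLen(userNames, 2) == totalNameLength(userNames, 2));
}
```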

If you search the web, you’ll see a lot of notation conventions for the same language. Even for Hungarian notation, there are plenty of variants. Here are just two that I picked at random with a Google search:
http://web.umr.edu/~cpp/common/hungarian.html
http://www.gregleg.com/oldHome/hungarian.html

I say enough with bad programming style! The days of such notations are long gone. Today IntelliSense helps you write the code. When you use an object and type . or -> it automatically shows you the list of functions, variables, properties, etc. that it has. When you write a function name it shows you the prototype, with all the overloads, parameter types and names, and even comments. Many other features also remove the need to prefix your variables with a long string of letters that you can barely read. Joel Spolsky wrote a very good article called Making Wrong Code Look Wrong, in which, among other things, he explains why Hungarian notation fell into disgrace and was replaced at Microsoft with a new convention with the release of the .NET Framework.

I’m not going to delve into proposing good convention styles; there are plenty available. I just want to stress that what you think is good for you often proves to be the opposite in reality. Using a bad naming convention is like dealing with a lot of bureaucracy: too much paperwork, no real benefit. Thus I will restrict myself to just a few points:

  • variables (whether local, global, parameters, etc.) should be named in such a way that the focus is on the kind (the meaning of the variable), not the type. dwCnt doesn’t tell me anything except that it’s a double word value, but countOfItems indicates that this variable is used to count items, regardless of whether it’s a single or double word. That’s much more valuable for someone reading and trying to understand your code. When you have two variables dwCnt and dwNr, in a given context you may wonder why they are not summed together, but if they were called countOfItems and numberOfBuyers, you would immediately understand why.
  • context should not be added artificially. Tim Ottinger makes a very good point on this matter in his article called Ottinger’s Rules for Variable and Class Naming. The company I work for uses a very bad naming convention that adds unnecessary context to variables. They prefix the names of types, functions, even variables with the name of the module. For instance, if a module is called “Cluster Display Framework” (a name invented by me), they prefix everything with CDF: variable names, functions, parameters, types, everything starts with the module name. Such context is totally unnecessary. Prefixing all the entities in a module with the module name does not add any value to the code of that module, because the reader simply ignores that context. When you write in languages such as C++, C#, Java, etc. you have the mechanism of namespaces available. You can put functions, classes, etc. that belong together inside a namespace, instead of prefixing everything.
  • unnecessary comments should be avoided. Don’t write comments just for the sake of writing. That doesn’t help. Polluting the code with redundant comments is unnecessary. If you have a variable of type string called nameOfUser, don’t add a comment to it saying that it specifies the user name. That is already known from the name of the variable. And that leads back to the first point I made above. Comments should be used to explain algorithms or decisions (why something is called in a particular context). As for the rest, the code should be self-commenting. That is a very good point made by Martin Fowler in his book “Refactoring”.
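
The three points above can be sketched together in a few lines of C++. This is only an illustration under my own assumptions: the cluster_display namespace stands in for the invented “Cluster Display Framework” module, and Node and averageItemsPerNode are hypothetical names, not code from any real project.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The namespace carries the module context once, instead of a CDF prefix on every name.
namespace cluster_display {

struct Node {
    std::string nameOfUser;  // the name says what it holds; no comment is needed for that
    int countOfItems = 0;
};

double averageItemsPerNode(const std::vector<Node>& nodes) {
    // An empty cluster has no meaningful average, so callers must pass at least one node.
    assert(!nodes.empty());

    int totalItems = 0;
    for (const Node& node : nodes) {
        totalItems += node.countOfItems;
    }
    // Divide as double: integer division would silently drop the fractional part.
    return static_cast<double>(totalItems) / static_cast<double>(nodes.size());
}

}  // namespace cluster_display
```

Inside the module the names stay short, and code outside it writes cluster_display::Node, so the context is there only where it is actually needed. The two comments in the function explain decisions, not what the variable names already say.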

With all that being said, I just want to add that perhaps the best naming convention I know is the one used in the .NET Framework (you can read about it here: http://www.akadia.com/services/naming_conventions.html). Lately I tend to use it even in C++, though C++ is not a language targeting the framework. It’s simple and meaningful.

To conclude, make sure that when you write code, you do it in such a way that understanding it doesn’t take a lot of time and headaches. Keep it simple. Keep it readable.
