Wednesday, September 24, 2008

Changing Good Programmers into Great

Not everybody is cut out to be a programmer. But for those who are, there is no reason you, as a manager or executive, can't help them move from just good to great, amazing and even awesome. Jeff Cogswell shows you how.

During the past two decades, I've worked with some really great programmers and software developers. And, unfortunately, I've worked with more than a few who probably should have chosen a different field. But the vast majority of the programmers fell somewhere in the middle. They were good. Not amazing, but definitely not bad either.

For managers and executives who have programmers and software developers reporting to them, the variation in skill can present quite a problem when you're trying to build a great product. How can you transform the good programmers into fantastic, amazing, awesome programmers?

Believe it or not, you can. Let's see how to do it.

First, you need to make sure your programmers have the essential skills, the fundamentals. Some do; some don't. (Just because they survived an undergrad program in computer science doesn't mean they do.)

Now this is going to sound obvious, but at the very least, every software developer must be a master of writing good lines of code. You've seen those who aren't: the programmers who sit there for hours, staring at 10 lines of code, trying and failing to figure out what's wrong. This kind of thing can happen to all of us programmers occasionally. But the problem is the programmer who does that on a regular basis.

I've worked with these programmers and you've probably had some working for you. They would come to me all the time, interrupting my work, and drag me to their cube to debug their code.

And this is going to sound rough, but the reality is some people just aren't cut out for programming. I'm talking about a very small percentage of people here, fortunately. But they're out there. If you have such a programmer on your staff, it might be time for a meeting with HR and a talk about other opportunities, perhaps in sales, customer support, testing (QA) or some other area of the company. He or she may excel in these areas. But you probably don't want him or her dragging the whole team down.

Fortunately, that's just a small percentage. Let's talk about the huge population that are in the middle, those who are good but not amazing. These are the ones you can help.
In fact, many of them are future experts but are, right now, just younger and less experienced. Such people don't always know about all the issues that can arise in software development. This isn't a problem with their ability; it's really just a problem of inexperience, something they'll overcome with time.

Probably the single biggest issue that younger programmers overlook is the hidden complexity in today's software systems. This is true especially for today's Web-based systems that can serve multiple Web users simultaneously.

In the old days, we would run what was called "stress testing" on our desktop applications. This involved running a program that would put our computer into a low-memory, low-disk-space state, allowing us to see whether our software could function. But with today's multiuser Web sites, the biggest problems aren't so much stress on memory and disk space, since typically the software will be running on large servers with a team of IT people making sure there's plenty of both. Instead, today the problems come more from multiple users trying to do the same thing simultaneously. And that's where the less experienced programmers might fall short in their coding.

Here's an example: Suppose your team is developing an ASP.NET application that will be storing data in an XML file. Ask your team what it takes to write data to the file. If they’re inexperienced, they might express the answer very simply, as in:
· You open the file.
· You write to it.
· You close the file.
Or, you ask them how to read a file:
· You open the file.
· You read the data you need.
· You close the file.

Seems simple and straightforward enough. But it's not. There are actually far more complex issues that can come up, issues that experienced programmers are well aware of but less experienced programmers might overlook, causing major problems when the software is running in a production environment. For example, what if two people are visiting the site simultaneously? Both are entering data into a Web form that needs to be saved. Your server is handling both people at the same time. Remember, the servers can run multiple "threads" at once (that is, the program is running the same parts of the code simultaneously). A separate thread is used to handle each user.

And that's where things get messy. The programmers might have written the code to open the file, read the whole thing into memory and close the file. Then the program would add on the user's new data to the data in memory, and write the whole thing back to the file, effectively replacing the entire file. This is common practice and it works well.
The problem is that if there are two users accessing the system, both threads might open the file, read the data in and close it at roughly the same time. Then simultaneously each thread might modify its own private version of the data. The first thread will write the data to the file and close it. Then the second thread will do the same, perhaps a tiny moment later, overwriting the first thread's version, losing the first user's data.
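To make the interleaving concrete, here is a minimal sketch in Python rather than ASP.NET. It replays the two "threads" step by step in a single thread so the outcome is deterministic. The file format (JSON instead of XML), the helper names and the sample data are all assumptions made for illustration.

```python
import json
import os
import tempfile


def read_all(path):
    """Open the file, read the whole thing into memory, close it."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def write_all(path, records):
    """Replace the entire file with the in-memory copy."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def simulate_lost_update():
    """Replay the bad interleaving: both users read before either writes."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        write_all(path, [])
        copy_a = read_all(path)          # thread A reads the file
        copy_b = read_all(path)          # thread B reads the same contents
        copy_a.append("alice's form data")
        copy_b.append("bob's form data")
        write_all(path, copy_a)          # thread A saves first
        write_all(path, copy_b)          # thread B overwrites A's version
        return read_all(path)
    finally:
        os.remove(path)


result = simulate_lost_update()
print(result)  # only bob's entry survives
```

Each step is correct on its own; the data is lost only because the read and the write are not treated as one indivisible operation.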

Or, one thread might open the file for writing, and then the second thread might try to do the same but not be able to (because the operating system locked the file when the first thread opened it), and this thread might not handle the situation appropriately and could crash the whole site, causing error messages to show up in the browsers of all the people visiting the site.
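One way to handle that situation appropriately is to treat a failed open as an expected event: retry a few times, and if it still fails, report an error for that one request instead of letting it take the site down. The following is a rough Python sketch of the idea, not ASP.NET code. The names `with_retries` and `FlakyFile` are hypothetical, and `FlakyFile` only imitates a file that is locked for the first couple of attempts.

```python
import time


def with_retries(operation, attempts=5, delay_seconds=0.05):
    """Call operation(); on OSError wait briefly and retry, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as error:
            last_error = error
            # Wait a little longer after each failure before trying again.
            time.sleep(delay_seconds * (attempt + 1))
    # Still failing: let the caller turn this into an error for one request.
    raise last_error


class FlakyFile:
    """Hypothetical stand-in for a file that another writer holds for a while."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def save(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise PermissionError("file is locked by another writer")
        return "saved"


flaky = FlakyFile(failures=2)
outcome = with_retries(flaky.save, delay_seconds=0)
print(outcome, flaky.calls)  # saved 3
```

Retrying only papers over contention, though. It does nothing about the lost-update problem described above, which needs real locking.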

I've seen this kind of thing happen many times. And that's when we programmers get a phone call at 3 in the morning because the operations team couldn't get the software up and running again. And then we have to either connect remotely or drag our butts into the office in the middle of the night, load up on caffeine and track down the problem.

And then we find exactly what the problem is and how to fix it. In our example in particular, it turns out the programmer would have been better off using a set of classes built into the .NET framework that allow for read and write locks on files. These classes are easy to use and take only a couple lines of code. Had the programmer used these, the problem wouldn't have occurred.
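The article does not name the classes it has in mind; in .NET, `System.Threading.ReaderWriterLock` and `ReaderWriterLockSlim` are plausible candidates, but that is a guess. As a language-neutral illustration, here is a minimal Python sketch that uses a plain mutual-exclusion lock (`threading.Lock`), which is simpler than a read/write lock, to make the read-modify-write one indivisible step. The function names and the JSON file format are assumptions.

```python
import json
import os
import tempfile
import threading

# One lock guarding the one shared file (in-process only).
_file_lock = threading.Lock()


def append_record(path, record):
    """Read, modify and rewrite the file while holding the lock."""
    with _file_lock:
        with open(path, "r", encoding="utf-8") as f:
            records = json.load(f)
        records.append(record)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(records, f)


def run_concurrent_appends(user_count):
    """Start one thread per simulated user and return what ended up on disk."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            json.dump([], f)
        threads = [
            threading.Thread(target=append_record, args=(path, n))
            for n in range(user_count)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    finally:
        os.remove(path)


saved = run_concurrent_appends(20)
print(sorted(saved) == list(range(20)))  # True: no user's data was lost
```

The locking itself is the two lines `_file_lock = threading.Lock()` and `with _file_lock:`. Note that a lock like this only coordinates threads inside one process; a site spread across several processes or servers would need a different mechanism.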

As a programmer, I remember seeing such mess-ups in code and complaining to others in the company about it. One tech writer friend of mine laughed and said, "Oh, you guys each have your own way of doing things, and neither is better than the other."

Oh, really? Well there's a good litmus test for determining if the code is right: Does it crash?
Good software doesn't crash. Good software doesn't cause phone calls in the middle of the night where panicked people have to try and figure out why the software crashed.

I've expressed this litmus test before to others, but was met with severe resistance from other programmers. People don't like criticism. But the fact is, perfect software doesn't crash. The reality is that with today's massive systems it's nearly impossible to get every single bug out. But it's certainly within reason to get as many bugs as possible out, minimizing crashes as much as possible and not using the excuse that "Bugs are inevitable and we should live with them."

And writing code for a Web server that crashes when two users connect to it simultaneously is unacceptable.

By handling things correctly, a manager can teach his or her team not to allow such bugs in the first place, and can oversee the process to prevent them. How can this be done?
First, the team (and the QA folks) must do their job in testing. It's easy to run through a test and see that the program works fine when only one user is accessing the software; it's also easy for you, as the manager, to see that it's working wonderfully and to feel good about it. But it's not so easy to run a real stress test where hundreds or even thousands of threads are running simultaneously, all trying to access and manipulate the data. That's when you'll discover the real problems, the kind that can bring a system to its knees.
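A real load test would use dedicated tools against the deployed site, but the core idea fits in a few lines. The Python sketch below hammers a piece of code from many threads at once, then checks that nothing raised and nothing went missing. `RecordStore` is a hypothetical in-memory stand-in for the code under test, and the worker and call counts are arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RecordStore:
    """Hypothetical in-memory stand-in for the code under test."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = []

    def add(self, record):
        with self._lock:
            self._records.append(record)

    def count(self):
        with self._lock:
            return len(self._records)


def stress(operation, workers, calls_per_worker):
    """Call operation(worker_id, call_index) from many threads.

    Returns the list of exceptions raised, so one failure does not hide others.
    """
    errors = []
    errors_lock = threading.Lock()

    def worker(worker_id):
        for call_index in range(calls_per_worker):
            try:
                operation(worker_id, call_index)
            except Exception as error:
                with errors_lock:
                    errors.append(error)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every worker to run to completion before we return.
        list(pool.map(worker, range(workers)))
    return errors


store = RecordStore()
errors = stress(lambda w, i: store.add((w, i)), workers=50, calls_per_worker=100)
print(len(errors), store.count())  # 0 5000
```

The two checks at the end are the point: no exceptions, and exactly as many records stored as were submitted. A single-user test would pass either way.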

To run these kinds of tests requires that you have a QA team of testers who know their tools and know how to simulate such conditions. And further, it's important that the coders are aware of the issues so that by the time their code gets to the QA team, it's already set up to handle high-load situations.

That brings me to the second point: The developers must be trained in how to write code that handles such situations correctly so the system doesn't crash. I said that some bugs will creep in, and as much as I don't want to live with that situation, I suppose I accept it as fact. (And your programmers, by the way, should have a similar attitude, rather than just shrugging and saying bugs are normal. Bugs are unacceptable, and we must stop as many as possible, but occasionally we have to accept that a couple might slip through.)

Thus, at a minimum the programmers must be aware of what can go wrong, and must know how to write code that handles those situations correctly. And that means writing code that is "thread-safe" and is scalable (meaning it can run not only on a single-user basis, but easily and efficiently when hundreds or thousands of people are using it simultaneously, and even when divided up onto multiple servers).

So how do you help the good programmers grow into superior programmers that can write such code?

Early on I somehow stumbled upon something that saved my career many times. I realized that I couldn't possibly know everything. Instead, I realized that a good programmer knows where to quickly find the answers.

Often programmers would come to me for help. And more times than not, I'd say, "Give me 10 minutes and I'll have the answer." Then I'd go back to my cube, quickly look up the answer, and then return. What was I doing? I was going through the same references (Web sites, books, online help) that I'd been through so many times before and finding the answer quickly. So rather than just give up and call someone else for help, I would find the answer myself. Of course, each time I learned the answer, I'd try to remember it, at least in general, so that if it came up again I would either know it or find the answer even more quickly.

Consider the earlier threading example. I mentioned it's on an ASP.NET platform. Off the top of my head, from experience, I know there's a class that allows file locking for reads and writes. I can't remember the exact name of the class, but I know that it involves locking and reads and writes. And I know where the standard docs are: the MSDN online documentation or, better yet, the local copy that ships with Visual Studio, the Combined Help Collection. Or, better still, if I remember when I wrote the code before, I could just look at how I did it before. And that means I can immediately locate the name of the class when I need it.

Of course, some really confident programmers want to "roll their own" and build their own locking mechanism, for example, and skip the built-in classes. This could happen for a couple of different reasons. First, the programmers might not even know that there's an alternative to rolling their own. How could they know that there's a handy class built right into the .NET framework that handles the read and write locks? The key is using what I learned so long ago, and knowing the resources and taking a few moments to look through them before rolling your own solution. And that's where you, the manager, can help: You can require that your programmers go through the online docs and find whether the solution already exists.

But the other reason a programmer might want to roll his or her own is because he or she might think the pre-built one isn't good enough. Now remember, I'm not talking about entire systems here that are already built. I'm talking about small, individual functions and classes, the nuts and bolts of your system, such as the file locking mechanism. Remember, programmers like to build things. It's their nature. And they feel especially good if they can build something that was better than the previous one.

But also remember: The class in this case is already built, and takes just a couple of lines of code to use. And it's already been through testing at Microsoft and has been used by thousands of other programmers successfully. You know it works.

Also, programmers have a tendency (myself included) to want to add all sorts of extra features to really make something cool. For example, a file locking mechanism would be even more useful if it included built-in file caching and a queue to manage the locks, and went far and beyond the little one in the library.

But that's overkill. And the last thing you want is for your programmers to spend two weeks, a week or two days writing code when all they need to do is write the one or two lines to make use of the class Microsoft gave us (or whoever built the library you're using for your particular platform). Besides, remember that even though the programmer might be able to roll out his or her own version in a day, your testers will have to now test that code in addition, and what was a day of work could turn into a week or two weeks. Compare that with using one or two lines of code that call a pre-existing, tested class. Which, then, I ask is better? Which is the right way to do it?

Of course, there may be times where the built-in class doesn't do everything you need. In that case, you need to carefully weigh your options and tradeoffs. Is there a way to make use of the class, just without all the extra features you were hoping for? Or is there a way to build a new class that expands on the existing class? (That's usually your best option.) Only if not should you consider having your team write their own class. But you'll want to make sure you've exhausted your options before going that route. The last thing you want is to find out six months down the road that the thousand lines of code somebody wrote are barely functioning right, and it turns out there was a pre-existing class that did exactly what you needed and would have required three lines of code on the programmer's part.
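As a small illustration of expanding on an existing class instead of replacing it, here is a hedged Python sketch. The hypothetical `TimedLock` adds one feature, giving up after a timeout, and leaves all the actual locking to the standard library's `threading.Lock`.

```python
import threading


class TimedLock:
    """Thin wrapper over threading.Lock that gives up instead of waiting forever."""

    def __init__(self, timeout_seconds):
        self._lock = threading.Lock()   # the tested, built-in class does the real work
        self._timeout = timeout_seconds

    def __enter__(self):
        if not self._lock.acquire(timeout=self._timeout):
            raise TimeoutError("could not acquire the lock in time")
        return self

    def __exit__(self, exc_type, exc, traceback):
        self._lock.release()
        return False  # do not swallow exceptions from the guarded block


lock = TimedLock(timeout_seconds=0.01)
timed_out = False
with lock:
    try:
        # The lock is already held, so this second attempt gives up quickly.
        with lock:
            pass
    except TimeoutError:
        timed_out = True
print(timed_out)  # True
```

The wrapper is about a dozen lines and contains no locking logic of its own, so there is very little new code for the testers to worry about.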

Conclusion
The moral here, then, is to make sure your programmers are familiar with the information resources, especially the online documents, as well as any existing libraries and frameworks they might have access to that have been tested many times over. Then you need to make sure that they're not rolling their own classes and components when one already exists that does the job. Finally, they need to be aware of the real issues that come up in a multiuser, high-performance system such as a Web server handling thousands or even millions of sessions a day.
