Random coding advice: Avoid Duplication

A lot of code I see from new coders shows a thought process like this:

— I’ve got a bunch of matches the player can burn
— after a match is burned, it is moved to a container for used matches; that way I can count how many matches have been burned by writing “the number of matches in the used matches depository”.
— after writing that enough times, I get tired of the verbosity and want to be able to write just “the number of burned matches”. Okay, I’ll also give the matches a “burned” property when I use them up!

Now the burning code looks something like this:

 Instead of burning something with a match:
     [other stuff];
     move the second noun to the used matches depository;
     now the second noun is burned.

and other places in the code use both “number of matches in the used matches depository” and “number of burned matches” to refer to (what the author hopes is) the same information.

This is asking for trouble. Sooner or later you’ll move the object but not set the flag, or change the way the multi-step destruction procedure happens in one place and not in the other. As soon as you do, there will be bugs.

So here’s the advice:

Do not put the same information into your world model twice.

In general, I recommend picking the *simplest* way of expressing the information that still conveys everything you need to model. In this case, I’d drop the used matches depository entirely and just use the “burned” property.
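Here is a minimal sketch of that property-only version. It assumes a hypothetical two-noun “burning it with” action (the Standard Rules’ burning action takes only one noun), plus a hypothetical Study where the player starts with three matches. There is no depository, and the carry-out rule is the only place the burned state is ever set.

```inform7
The Study is a room.

[Hypothetical setup: the player starts with a few matches.]
A match is a kind of thing. A match can be burned or unburned. A match is usually unburned.
The player carries three matches.

[Assumption: the standard burning action takes one noun, so a two-noun action is defined here.]
Burning it with is an action applying to two things.
Understand "burn [something] with [something]" as burning it with.

Check burning something with something when the second noun is not a match:
	say "You can't light anything with [the second noun]." instead.

Check burning something with a burned match:
	say "[The second noun] is already spent." instead.

[The only bookkeeping: one property, set in one place.]
Carry out burning something with a match:
	now the second noun is burned.

Report burning something with a match:
	say "You touch [the second noun] to [the noun]. Burned matches so far: [the number of burned matches]."
```

Nothing else records which matches are used, so “the number of burned matches” has nothing to disagree with.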

But say I do (for some reason) need to keep my used matches in some container, and I also want a less-verbose way of referring to what’s in that container. Then the safe solution is to make sure that everything is being calculated from the same information (is the match in the container or not?). For instance, in this example, instead of giving the matches a burned property, we might make a definition that *calculates* whether the match is considered “burned” based on whether it is in the depository:

 Definition: a match is burned if it is in the used matches depository.
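A fuller sketch of the container version makes the same assumptions: a hypothetical “burning it with” action and a hypothetical Study and starting matches. The carry-out rule only moves the match. “Burned” comes from the definition, so the check rule and the count in the report both read the same single fact: where the match is.

```inform7
The Study is a room.

A match is a kind of thing. The player carries three matches.

[Hypothetical container, named as in the text above.]
The used matches depository is a container in the Study.

[Derived, not stored: "burned" is computed from location alone.]
Definition: a match is burned if it is in the used matches depository.

[Assumption: a two-noun action, since the standard burning action takes one noun.]
Burning it with is an action applying to two things.
Understand "burn [something] with [something]" as burning it with.

Check burning something with something when the second noun is not a match:
	say "You can't light anything with [the second noun]." instead.

Check burning something with a burned match:
	say "[The second noun] is already spent." instead.

[Moving the match is the only state change; there is no flag to forget.]
Carry out burning something with a match:
	move the second noun to the used matches depository.

Report burning something with a match:
	say "Burned matches so far: [the number of burned matches]."
```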

This is all a special case of a more general rule: any time you’re coding along and think to yourself, “okay, from now on I’ve got to remember to do X, Y, and Z every time I do Foo,” you’re setting yourself up for misery. The computer isn’t nearly so error-prone as your memory, especially when you’re working at 1 AM before a deadline.
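Sometimes a procedure really does have several distinct steps that must always happen together. One way to stop relying on memory in that case is to name the whole procedure once, as a phrase, and call it from every place that needs it. This is a swapped-in technique rather than something the post prescribes, and the phrase “spend ... lighting ...”, the candle, and the “burning it with” action are all hypothetical. The two steps record different facts (where the match is, and whether the target is lit), so nothing is stored twice.

```inform7
The Study is a room.

A match is a kind of thing. The player carries three matches.
The used matches depository is a container in the Study.
Definition: a match is burned if it is in the used matches depository.

The candle is in the Study.

[Hypothetical phrase: every step of spending a match lives here, once.]
To spend (M - a match) lighting (T - a thing):
	move M to the used matches depository;
	now T is lit.

[Assumption: a two-noun action, since the standard burning action takes one noun.]
Burning it with is an action applying to two things.
Understand "burn [something] with [something]" as burning it with.

Check burning something with something when the second noun is not a match:
	say "You can't light anything with [the second noun]." instead.

Check burning something with a burned match:
	say "[The second noun] is already spent." instead.

[Any other rule that uses up a match would call the same phrase.]
Carry out burning something with a match:
	spend the second noun lighting the noun.

Report burning something with a match:
	say "[The noun] catches. Burned matches so far: [the number of burned matches]."
```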

(P.S.: this isn’t directed at anyone in particular. I’ve had this conversation with quite a few people lately.)

6 thoughts on “Random coding advice: Avoid Duplication”


Ian says:

May 31, 2009 at 2:40 pm

In regular computer science this would be the same advice: keep your data normalized (one and only one version of each item of data – every bit of derived data derived when needed). This goes for data in databases, data in memory, on disk, etc., etc.

It is essential advice, but not very efficient. Eventually most data models need to be denormalized for performance reasons.

Performance isn’t a huge issue in Inform 7, unless you’re doing something as complex as Alabaster, which presumably most of the folks you’re talking to aren’t.

Still, in big complex systems where you have to denormalize your data, keeping data integrity is a huge, painful and expensive job. If you can avoid it, do!

  Emily Short says:
  
  May 31, 2009 at 2:55 pm
  
  Sure. Though the situation in Alabaster is helped a bit by the fact that most of the data I need to cache at least doesn’t change during the course of play.
  
  I thought about adding a caveat about performance, but in practice what I find is that people often worry about performance too early, and it impedes them from getting a basic hold on how the system works. (And their intuitions about what will and won’t damage performance are often wrong anyway.)
  
Dave Chapeskie says:

May 31, 2009 at 5:34 pm

Also referred to as “don’t repeat yourself” or DRY, http://en.wikipedia.org/wiki/DRY
Basic coding 101.

  Emily Short says:
  
  May 31, 2009 at 10:14 pm
  
  Yah. I really, really was not thinking I’d discovered some new-to-computer-science discovery here. :)
  
Ben Carlsen says:

May 31, 2009 at 10:18 pm

I think coding just seems complicated to the new coder, so they figure extra things are necessary that aren’t. I’ve just started dabbling in I7, and I think I have the opposite problem of not putting in enough. :)

Conrad says:

June 3, 2009 at 2:31 pm

Never thought of that. It makes good sense out of some professorial programming advice I got one time.

“Logic is a maze. And your brain is a mouse, desperately looking for cheese.”

(That wasn’t the advice, just something someone said.)

Thanks, Emily — very helpful. I suppose part of the challenge is to dice up your information so your variables cover everything but don’t overlap.

Conrad.
