The State of the Machine

I get cranky when I see people use regular expressions or simple substring routines when extracting strings from, for example, e-mail addresses. The first method, while powerful, is memory hungry, the second method is plain childish. You should only use substring/copy methods if you’re hundred percent certain that the data is formatted and well-formed (that is, it comes through exactly as you expect it to. In the case of e-mail addresses, this is of course, not true. After all, e-mail addresses can come in any format. The following samples are all legal: “hey@you.com”, “hey@you.com (Hey You)”, “ Hey You”, “Hey, You “. Your simple substring copy function would most likely have troubles resolving all of these e-mail variants.

During my Roundabout tenure, we ran into issues where extraction of names/e-mail from e-mail headers didn’t work out as originally planned. I was not surprised to find those evil substring routines in the code and literally rewrote that into a state machine (look for HeaderAddressToStringList). Extremely elegant and very effective.

Why use a state machine then? Because with string operations like this, looping through a string is a lot faster than trying hundreds of “if conditions” to cover all these e-mail cases. Keep in mind that simplicity is the key though: the more states you define, the complexer the code.

This entry was posted in RoundAbout and tagged , , . Bookmark the permalink.