So you’re at a party, and there’s this programmer that you really want to impress. What do you do? OK, before you start hitting me with the question of why a programmer is even at a party (in case you haven’t heard, we don’t get out much), I’ll tell you the answer. All you have to say is, “Wow, I love RegEx,” then just keep nodding in agreement as they launch into a glorious tirade about how underutilized, underappreciated, and powerful regular expressions are.

Yes, I will explain regular expressions, but first, I will put out a disclaimer that Regular Expressions may, on occasion, prove to be challenging, frustrating, and sometimes as confusing as trying to translate into Latin an entire bowl of alphabet soup, with added punctuation. With that said, they are very cool. They are rather handy. And they are even accessible through DriveWorks, not only via programming.

Regular Expressions, known amongst the cool kids (and me, too) as “RegEx,” are a way of searching a string to see if it matches a given pattern. This can be used to validate a string (“Is this a valid email address?” or “Is this a phone number with dashes or dots and maybe parentheses?”) and can also be used for searching a string (“look in this string and find the number of feet and the number of inches, even though I don’t know if they used ft, in, ‘, “, -, …”). The Regular Expression is a way to represent a pattern to find in a string, or part of a string. You’ve no doubt searched for filenames using wildcards before, where (?) would represent a single character and (*) would represent any number of characters. You would search for (*.pdf) or (Door??-Knob*.slddrw). Well, if you can do that, then you can certainly understand why someone would take this 42 steps further to create regular expressions.

DriveWorks provides a function called =IsMatch() that will compare a string with a RegEx to see if there is a match. It returns a simple boolean (True/False). Taken to programming, RegEx can go further and can actually parse strings to collect information. So rather than asking, “does this match,” you can tell your program to “look in the string and find me all of the substrings that match this.” You can even specify which parts to capture. So, if you’re looking at a dimension, you can tell the RegEx to look for “ft”, “in” or “mm” or (‘) or (“) and take any number that’s before it. The RegEx can be used to find a match, but only return the part of the match that you want. And it can return a collection of all the matches that it finds.
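To make the "find me all of the substrings, but only return the parts I want" idea concrete, here is a minimal sketch in Python rather than DriveWorks. The pattern and the `find_dimensions` helper are my own illustrations, not anything DriveWorks provides. Each pair of parentheses in the pattern is a capture group, so every match hands back just the number and the unit.

```python
import re

# Hypothetical pattern for illustration:
#   group 1: a whole or decimal number
#   group 2: one of the unit markers ft, in, mm, ' or "
DIMENSION = re.compile(r"(\d+(?:\.\d+)?)\s*(ft|in|mm|'|\")")

def find_dimensions(text):
    """Return a list of (number, unit) tuples, one per match found in text."""
    return DIMENSION.findall(text)

print(find_dimensions("The door is 6 ft 8 in tall"))  # [('6', 'ft'), ('8', 'in')]
print(find_dimensions("6' 8\""))                      # [('6', "'"), ('8', '"')]
print(find_dimensions("Offset 25.4mm"))               # [('25.4', 'mm')]
```

The surrounding words never show up in the results. The RegEx matched the whole "6 ft", but only the captured pieces came back.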

Unlike our simple wildcards, with RegEx, you’re not limited to looking for a single character or multiple characters. You can specify what type of characters (numbers, lowercase letters, etc.) or even what specific letters or numbers or characters. For example, if you wanted to look at a string and check to see if it was a valid 10-digit US phone number, you could say, “Are there three numbers, followed by a dash, followed by three more numbers, followed by another dash, followed by four more numbers?” With DriveWorks, you could use the =IsMatch() function to check the string against a RegEx. So the rule =IsMatch(PhoneNumberReturn, "^[0-9]{3}-[0-9]{3}-[0-9]{4}$") would do the trick. Hey, I warned you that they don’t look easy.

But don’t worry. There is a method to the apparent madness. Every character in the RegEx has a meaning. The (^) says to start at the beginning of the string. Without that caret, we could get a match with “Froglips999-111-2222”. Likewise, ($) represents the end of the string. We want a phone number and nothing else, no office extensions allowed. We can represent ranges of values: ([0-9]) means any single digit from zero to nine. We can represent quantities: ([0-9]{3}) means three such digits in a row. And the (-) outside of any brackets and braces represents a literal dash.
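If you want to see the anchors earn their keep, here is a small Python sketch. The `is_match` helper is a hypothetical stand-in I wrote to mimic a simple True/False check; it is not the DriveWorks function itself, and the two pattern names are mine.

```python
import re

# The pattern from the rule above, with and without its anchors.
ANCHORED = r"^[0-9]{3}-[0-9]{3}-[0-9]{4}$"
UNANCHORED = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"

def is_match(text, pattern):
    """Hypothetical helper: True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

print(is_match("999-111-2222", ANCHORED))            # True
print(is_match("Froglips999-111-2222", ANCHORED))    # False
print(is_match("Froglips999-111-2222", UNANCHORED))  # True
print(is_match("999-111-2222 x15", ANCHORED))        # False
```

Same digits, same dashes. The only difference is whether the pattern is allowed to float around in the middle of a longer string.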

But you wouldn’t be able to impress a programmer with that. No. RegEx goes further, a LOT further. The reality is that a US phone number could have a (+1) at the front. They might be graphic designers, and use (.) instead of (-) between their digits. They may have parentheses around their area code (first three numbers). And they just might have an office extension. Well, RegEx can handle all of that. There is RegEx syntax for “there might or might not be.” There is RegEx syntax for “one or more of the following characters.” There is RegEx syntax for words, white spaces, logical operators (“ab” OR “ba”), and even special characters like tabs and linefeeds. And there are lots more.
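Here is one way that more forgiving phone number might be sketched in Python. This is my own illustrative pattern, not an official one, and it assumes a particular set of accepted spellings (an optional +1, dots or dashes, optional parentheses, and an "x" or "ext" extension). Real-world phone formats vary more than this. The `(?: ... )?` pieces are the "there might or might not be" syntax, `|` is the OR, and `\d+` is "one or more digits."

```python
import re

# Hypothetical pattern, written in verbose mode so each piece can be commented.
FLEXIBLE_PHONE = re.compile(r"""
    ^
    (?:\+1[\s.-]?)?              # optional +1, then an optional separator
    (?:\(\d{3}\)\s?|\d{3}[.-])   # area code: "(555)" or "555-" or "555."
    \d{3}[.-]                    # three digits, then a dash or a dot
    \d{4}                        # four digits
    (?:\s*(?:x|ext\.?)\s*\d+)?   # optional extension: "x12" or "ext. 12"
    $
""", re.VERBOSE)

def is_phone(text):
    """Return True if text matches the illustrative phone pattern above."""
    return FLEXIBLE_PHONE.match(text) is not None

print(is_phone("555-123-4567"))               # True
print(is_phone("+1 (555) 123.4567 ext. 89"))  # True
print(is_phone("555.123.4567 x12"))           # True
print(is_phone("Froglips555-123-4567"))       # False
print(is_phone("555-1234"))                   # False
```

Still not pretty, but every line maps to one of the "might or might not" cases above.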

So how can you possibly learn and remember all of this RegEx syntax? Simple. You don’t. As with most programmatic tools, RegEx is very well documented on the web with a wide variety of RegEx builders and testers. Sites like www.RegExR.com and www.regextester.com have full references that can help you build and test your regular expressions. But even more likely, if you populate your favorite search engine with a description of the RegEx you’re looking for, odds are that someone else has built it.

So as with almost everything programming, the important part is understanding the concept. A Regular Expression allows you to validate a string, to see if it matches a certain pattern. A Regular Expression can also be used to parse a string to find text that matches a certain pattern. And just like everything else in programming, you can work your way through it with a little searching and some cool tools available on the web. As one of my professors once told me, “A good engineer knows nothing, but knows where to find everything.”

The post Irregular Expressions appeared first on Razorleaf.
