Jun 11
Writing and Regular Expressions
Just write. That’s what the blog post told me to do to get better at writing, so that’s what I’m doing. I’m writing whatever comes into my head, and I’m giving myself permission to fail: permission to write junk that won’t be useful for anything but fodder for the delete button. I keep reminding myself that I don’t want to be that guy who lets his blog go stale. I started this site to have an outlet and to share my thoughts, musings, and technical know-how in a creative way.
So here goes. I recently worked on a regular expression to eliminate certain order lines in a label creation program, and I thought I’d share it with you. I wanted to eliminate labels whose description looked like this:
"DOOR ! SOME COMMENT HERE ! "
So I came up with this regular expression to match against these comments and get rid of them.
"/^DOOR[\s]+[!]{1}[^!]*[!]{1}[\s]+$/"
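To see the pattern at work, here is a minimal sketch in Python's “re” module rather than whatever language the label program actually uses. The slash delimiters suggest PHP's “preg_match”, but that is an assumption. Python takes the bare pattern without the “/” delimiters. The function names “is_door_comment” and “filter_descriptions” and the sample descriptions are hypothetical, made up for illustration.

```python
import re

# The pattern from the post, without the "/" delimiters: Python's re module
# takes the bare pattern string.
DOOR_COMMENT = re.compile(r"^DOOR[\s]+[!]{1}[^!]*[!]{1}[\s]+$")


def is_door_comment(description: str) -> bool:
    """Return True if the description has the 'DOOR ! ... ! ' shape."""
    return DOOR_COMMENT.search(description) is not None


def filter_descriptions(descriptions: list[str]) -> list[str]:
    """Drop door comments and keep every other description (hypothetical helper)."""
    return [d for d in descriptions if not is_door_comment(d)]


# Hypothetical sample descriptions, not real order data.
lines = [
    "DOOR ! SOME COMMENT HERE ! ",
    "DOOR PANEL 24X80",
    "HINGE SET",
]
print(filter_descriptions(lines))  # ['DOOR PANEL 24X80', 'HINGE SET']
```

Only the first line has the exclamation-point shape, so it is the only one filtered out. “DOOR PANEL 24X80” survives because a “P” follows the whitespace where the pattern expects “!”.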
I will break this one down so you can see what it does.
The “/” characters surrounding the pattern are known as pattern delimiters; they mark the beginning and end of the pattern. The “^” at the beginning and the “$” at the end are known as anchors: “^” requires the match to begin at the first character of the subject text, and “$” requires it to end at the last character.
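A quick way to see what the anchors buy you is to compare the pattern with and without them. This sketch assumes Python's “re.search”, which scans for a match anywhere in the string. The embedded description is a made-up example.

```python
import re

anchored = re.compile(r"^DOOR[\s]+[!]{1}[^!]*[!]{1}[\s]+$")
unanchored = re.compile(r"DOOR[\s]+[!]{1}[^!]*[!]{1}[\s]+")  # same pattern minus ^ and $

# Made-up description with a DOOR comment buried in the middle.
embedded = "BACK DOOR ! SOME COMMENT HERE ! PLUS MORE"

print(anchored.search(embedded) is not None)    # False: "^" needs DOOR at the very start
print(unanchored.search(embedded) is not None)  # True: search finds DOOR mid-string
```

Without “^” and “$”, any description that merely contains a DOOR comment would also be dropped, which is probably not what a label filter wants.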
The beginning of the pattern, “DOOR[\s]+”, says I want to find the word “DOOR” followed by one or more whitespace characters. The “\s” inside the brackets is shorthand for any whitespace character, so it matches tabs and newlines as well as spaces. The “+” is a quantifier meaning “one or more” of the preceding element.
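To check which characters “[\s]+” accepts, this sketch tests only the opening piece of the pattern against a few made-up descriptions. It assumes Python's “re.match”, which tries a match only at the start of the string.

```python
import re

# Only the opening piece of the pattern: "DOOR" plus one or more whitespace characters.
opening = re.compile(r"DOOR[\s]+")

# Made-up descriptions that differ only in what follows "DOOR".
samples = {
    "single space": "DOOR ! X ! ",
    "tab": "DOOR\t! X ! ",
    "newline": "DOOR\n! X ! ",
    "mixed run": "DOOR \t  ! X ! ",
    "no whitespace": "DOOR! X ! ",
}

for label, text in samples.items():
    # re.match only tries to match at the start of the string.
    print(f"{label}: {opening.match(text) is not None}")
```

Every sample except the last one matches. The last fails because “+” needs at least one whitespace character.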
The exclamation point inside the square brackets continues the pattern, and the number one inside curly braces is a quantifier giving the number of them I want to find. So “[!]{1}” means find exactly one exclamation point.
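A character class holding a single character matches exactly that character, and every element already matches exactly once unless a quantifier says otherwise. So “[!]{1}” behaves the same as a plain “!”, and “[\s]” the same as “\s”. This Python sketch compares the original pattern with a hypothetical shortened spelling on a few made-up inputs. It shows one alternative way to write the pattern; it does not imply the longer form is wrong.

```python
import re

original = re.compile(r"^DOOR[\s]+[!]{1}[^!]*[!]{1}[\s]+$")
# Hypothetical shorter spelling: drop the one-character brackets and the {1}.
shortened = re.compile(r"^DOOR\s+![^!]*!\s+$")

# Made-up inputs covering two matches and two non-matches.
inputs = [
    "DOOR ! SOME COMMENT HERE ! ",
    "DOOR\t! TABBED !\n",
    "DOOR ! NO CLOSING MARK",
    "DOOR PANEL",
]

for text in inputs:
    a = original.search(text) is not None
    b = shortened.search(text) is not None
    assert a == b, text  # both spellings should agree on every input
    print(repr(text), a)
```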
Next, the sub-expression “[^!]*” matches a run of characters that are not exclamation points. A caret placed first inside square brackets negates the class, so “[^!]” matches any single character that is NOT an exclamation point. The “*” is a quantifier meaning “zero or more times,” so “[^!]*” allows any number of non-exclamation-point characters, including none at all, after the first exclamation point. After that comes another “[!]{1}” for the second exclamation point and another “[\s]+” for one or more whitespace characters at the end of the subject text.
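The “zero or more” part of “[^!]*” is easy to overlook. The text between the two exclamation points may be empty, but it can never contain a third one. This Python sketch isolates the middle of the pattern and uses “re.fullmatch”, which succeeds only if the whole string matches. The test strings are made up.

```python
import re

# Just the middle of the pattern: one "!", any non-"!" characters, one "!".
middle = re.compile(r"[!]{1}[^!]*[!]{1}")

# Made-up strings; fullmatch succeeds only if the whole string matches.
# Expected: True, True (empty middle), False (third "!"), True.
# "[^!]" matches newlines too, so the last string passes.
for text in ["! SOME COMMENT HERE !", "!!", "! A ! B !", "! TWO\nLINES !"]:
    print(repr(text), middle.fullmatch(text) is not None)
```

If an empty comment such as “DOOR !! ” should not count, changing “*” to “+” would require at least one character between the exclamation points.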
Read aloud in English, the whole regular expression says something like this:
At the beginning of our subject text (^), look for the word DOOR (DOOR) followed by whitespace characters ([\s]), of which there will be one or more (+), followed by an exclamation point ([!]), just one! ({1}), followed by a non-exclamation-point character ([^!]) zero or more times (*), followed by an exclamation point ([!]), just one! ({1}), followed by a whitespace character ([\s]), one or more of them (+), up to the end of the subject text ($).
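The English reading above maps almost one-to-one onto a commented version of the pattern. This sketch assumes Python, whose “re.VERBOSE” flag ignores unescaped whitespace in the pattern and allows “#” comments. That lets each piece sit next to its description. PCRE-based tools such as PHP's “preg_*” functions offer the same feature through the “x” pattern modifier.

```python
import re

# re.VERBOSE ignores unescaped whitespace outside character classes
# and treats "#" as the start of a comment.
DOOR_COMMENT_VERBOSE = re.compile(
    r"""
    ^           # at the beginning of the subject text
    DOOR        # the word DOOR
    [\s]+       # one or more whitespace characters
    [!]{1}      # exactly one exclamation point
    [^!]*       # zero or more characters that are not an exclamation point
    [!]{1}      # exactly one exclamation point
    [\s]+       # one or more whitespace characters
    $           # up to the end of the subject text
    """,
    re.VERBOSE,
)

print(DOOR_COMMENT_VERBOSE.search("DOOR ! SOME COMMENT HERE ! ") is not None)  # True
```

The verbose form compiles to the same matcher as the one-line pattern. It is just easier to revisit months later.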
The tedious tasks are not always the low-hanging fruit. The reason tedious tasks persist is that the water is muddy. Our vision is blurry. Our view is cloudy. However, the underlying truths of solving any task still remain: seeing the real problems, finding discrete steps, exploring unconventional options, and persisting in breaking your own mental barriers are all required.
