Regular expressions (or regex) are a bit confusing. I read the chapter in the Rute User’s Tutorial on regex, but found that I didn’t quite get it. So I watched Nixie Pixel’s video on regex and re-read the chapter while testing out the commands. I think what was confusing me is that the different symbols change meaning depending on where you’re using them. It’s going to take a bit of practice using regex before I will remember which symbols mean what in which circumstances.
To practice, I followed Nixie’s example instead of the Rute User’s Tutorial’s. Instead of trying it out using one of my documents, I used the dictionary that came with Mint (/usr/share/dict/words). However, I tried out the commands listed in the Rute User’s Tutorial, as they had a lot more explanation. I then tried out some of my own.
Both Rute and Nixie only used regex with the grep command. I’m not sure if this is the only thing regex is used for in the terminal, or just the most common way to use it. Basically, the grep command finds lines in files that match a pattern. The egrep command (or grep -E) switches grep to extended regular expressions, which add "or" with the bar symbol (|), so I can search for this|that. There’s also fgrep (or grep -F), which searches for literal strings and doesn’t interpret regular expressions, so you don’t have to escape special characters with a backslash (\).
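To see the three flavours side by side, here is a small sketch I put together. It assumes a made-up five-line sample file (not the real dictionary), and the variable names are just mine.

```shell
# Hypothetical sample file, made up for this demo.
sample=$(mktemp)
printf '%s\n' 'this' 'that' 'other' 'a.c' 'abc' > "$sample"

# Extended regex: the bar means "or".
either=$(grep -E 'this|that' "$sample")

# Basic regex: the dot is a wildcard, so a.c and abc both match.
wildcard=$(grep 'a.c' "$sample")

# Fixed string: the dot is just a dot, so only a.c matches.
literal=$(grep -F 'a.c' "$sample")

printf 'grep -E:\n%s\n\n' "$either"
printf 'grep:\n%s\n\n' "$wildcard"
printf 'grep -F:\n%s\n' "$literal"

rm -f "$sample"
```

The fixed-string search is the one to reach for when the thing I'm looking for is full of dots or brackets and I don't want any of them treated as regex symbols.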
With a reference in front of me that tells me what all the symbols mean, using regex in grep isn’t that difficult. The command:
grep 'b[a-f]*' /usr/share/dict/words
will pull up words from my dictionary that contain the letter ‘b’ followed by zero or more of the letters ‘a’ through ‘f’, in any combination. In practice, this matches every word that contains the letter ‘b’, because the star (*) allows zero occurrences of what comes before it. So I get results like bad, zebra and visibly.
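One way I found to check what the star is really doing is to ask grep to print only the matched part of each line with -o. This is a rough sketch on a made-up four-word list (an assumption for the demo), and it relies on the -o option that GNU grep provides.

```shell
# Hypothetical word list, made up for this demo.
words=$(mktemp)
printf '%s\n' 'bad' 'zebra' 'visibly' 'cat' > "$words"

# Whole lines that match: every word with a b in it.
matching_lines=$(grep 'b[a-f]*' "$words")

# -o prints only the part of each line that the pattern matched.
matched_parts=$(grep -o 'b[a-f]*' "$words")

printf 'lines:\n%s\n\n' "$matching_lines"
printf 'matched parts:\n%s\n' "$matched_parts"

rm -f "$words"
```

For zebra and visibly the matched part is just the single b, which is the "zero occurrences" case.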
I can narrow my search by searching for whole words instead of strings. To do this, I add the -w option. Now the match has to be a complete word rather than part of one, so ‘b[a-f]*’ matches beefed and bad, but not wombat. The command with the -w option added looks like this:
grep -w 'b[a-f]*' /usr/share/dict/words
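Here is a small sketch of the difference -w makes, again on a made-up word list of my own. One thing I noticed while testing: grep counts only letters, digits and the underscore as word characters, so a hyphen ends a word as far as -w is concerned.

```shell
# Hypothetical word list, made up for this demo.
words=$(mktemp)
printf '%s\n' 'bad' 'beefed' 'wombat' 'zebra' 'bad-tempered' > "$words"

# Without -w: count every line with a b anywhere in it.
anywhere=$(grep -c 'b[a-f]*' "$words")

# With -w: the match must not touch a letter, digit or underscore
# on either side.
whole_words=$(grep -w 'b[a-f]*' "$words")

printf 'lines matching without -w: %s\n' "$anywhere"
printf 'lines matching with -w:\n%s\n' "$whole_words"

rm -f "$words"
```

So bad-tempered still shows up with -w, because the "bad" part counts as a whole word on its own.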
I can also search for a limited number of letters. If I wanted to find a four-letter word that began with ‘j’, I would enter:
grep -w 'j...' /usr/share/dict/words
The period (.) is a wildcard that matches exactly one character, so three periods after the ‘j’ mean exactly three more characters. My results would include jive and jump, but not jammed or inject.
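Counting dots gets fiddly for longer words, so here is a sketch that compares the three-dot pattern with a repeat count, \{3\}, which basic regular expressions also allow. The word list is made up for the demo.

```shell
# Hypothetical word list, made up for this demo.
words=$(mktemp)
printf '%s\n' 'jive' 'jump' 'jammed' 'inject' 'jam' > "$words"

# j followed by exactly three characters, as a whole word.
with_dots=$(grep -w 'j...' "$words")

# Same idea, with the count written out: \{3\} repeats the dot
# three times.
with_count=$(grep -w 'j.\{3\}' "$words")

printf 'j...:\n%s\n\n' "$with_dots"
printf 'j.\\{3\\}:\n%s\n' "$with_count"

rm -f "$words"
```

Both versions should give the same list here. The counted form is mostly useful when the number is too big to count dots by eye.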
I can also search for strings that begin or end with certain letters. The caret symbol (^) anchors the pattern to the start of the line, and the dollar sign ($) anchors it to the end. If I wanted to search for four-letter words that began with ‘j’ and ended with ‘t’, I would enter:
grep '^j..t$' /usr/share/dict/words
For the results, I only get jest, jilt, jolt and just. I didn’t need to add the -w option because the ^ and $ anchors already pin the pattern to the whole line, and each line of the dictionary is a single word.
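To convince myself the anchors matter, I tried the same pattern with and without them. This is a sketch on a made-up word list. The -x option, which asks grep to match the whole line, is an extra I added for comparison and isn't something Rute or Nixie used.

```shell
# Hypothetical word list, made up for this demo.
words=$(mktemp)
printf '%s\n' 'jest' 'jilt' 'adjustment' 'jacket' 'just' > "$words"

# No anchors: the pattern can match in the middle of a longer word.
loose=$(grep 'j..t' "$words")

# Anchored: the line must start with j and end with t, four
# characters in all.
anchored=$(grep '^j..t$' "$words")

# -x requires the pattern to match the entire line.
whole_line=$(grep -x 'j..t' "$words")

printf 'no anchors:\n%s\n\n' "$loose"
printf 'anchored:\n%s\n\n' "$anchored"
printf 'with -x:\n%s\n' "$whole_line"

rm -f "$words"
```

Without anchors, adjustment sneaks in because it has "just" in the middle of it.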
I can also exclude letters from my search. When a caret is the first character inside brackets, it means not. So, the command:
grep '^j[^a-f]*t$' /usr/share/dict/words
will give me results that start with ‘j’, continue with any number of characters other than ‘a’ through ‘f’, and end with ‘t’. My results include jot, jolt and jumpsuit, but not jest. To me, this is the hardest part of regex: remembering which meaning a symbol has in which position.
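Since the two meanings of the caret are what trips me up, here is a sketch that puts both in the same test. The word list is made up, and the second pattern is my own addition to show both carets working in one expression.

```shell
# Hypothetical word list, made up for this demo.
words=$(mktemp)
printf '%s\n' 'jot' 'jolt' 'jumpsuit' 'jest' 'adjust' > "$words"

# First caret: start of line. Caret inside brackets: "not a-f".
no_a_to_f=$(grep '^j[^a-f]*t$' "$words")

# Both meanings together: the line starts with a character that is
# not j.
not_starting_with_j=$(grep '^[^j]' "$words")

printf 'j, no a-f, then t:\n%s\n\n' "$no_a_to_f"
printf 'does not start with j:\n%s\n' "$not_starting_with_j"

rm -f "$words"
```

The rule of thumb I'm using: outside brackets the caret is about position, inside brackets (as the first character) it's about exclusion.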
There is one more type of search I wanted to share. The egrep command allows me to use or (|) in my searches. To search for words that begin with either ‘th’ or ‘sh’, are followed by any number of letters ‘a’ through ‘f’ and end with ‘t’, I would use the command:
egrep '^(th|sh)[a-f]*t$' /usr/share/dict/words
My results would include shaft, that and theft, but not throat or shift.
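The parentheses in that pattern turn out to matter, so here is a sketch comparing it with the same pattern minus the parentheses. I'm using grep -E, which is the same thing as egrep, on a made-up word list.

```shell
# Hypothetical word list, made up for this demo.
words=$(mktemp)
printf '%s\n' 'shaft' 'that' 'theft' 'throat' 'shift' > "$words"

# Grouped: (th or sh), then a-f letters, then t at the end.
grouped=$(grep -E '^(th|sh)[a-f]*t$' "$words")

# Ungrouped: the bar splits the WHOLE pattern in two, so this means
# "starts with th" OR "ends with sh, a-f letters, t".
ungrouped=$(grep -E '^th|sh[a-f]*t$' "$words")

printf 'grouped:\n%s\n\n' "$grouped"
printf 'ungrouped:\n%s\n' "$ungrouped"

rm -f "$words"
```

Without the parentheses, throat gets through, because "starts with th" is all the first half of the pattern asks for.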
This isn’t everything that was covered in the Rute User’s Tutorial, and it looks like there’s another section on regex later in the book. I can see a lot of potential uses for this: a Scrabble helper as Nixie Pixel suggested, looking up employees in a database, finding the specific name of something when you can only remember part of it, etc. It might also be very helpful if I’m looking for typos or spelling errors in my own programming code. I don’t think this tool will be something I use on an everyday basis right now, but I think continuing to practice using regex will help me later.