Saturday, March 22, 2014

Regular Expressions: Basics

Regular expressions (regexps) are an advanced concept that we need in order to write efficient shell scripts and to do effective system administration. For easier understanding, regular expressions are commonly divided into three types:



1) Basic regular expressions

2) Interval regular expressions (use option -E with grep and -r with sed)

3) Extended regular expressions (use option -E with grep and -r with sed)
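To see the difference between the basic and the extended syntax, here is a small sketch on made-up input. It assumes bash with GNU grep and GNU sed, and the variable names are only illustrative. Note that GNU grep also accepts interval expressions in basic syntax, but the braces then have to be escaped as \{ and \}; with -E (grep) or -r (sed) plain braces are used.

```shell
# Made-up sample input: only the second line contains a run of three digits.
input='ab1c
x123y
no digits here'

# Basic syntax: the interval braces must be escaped as \{ and \}
bre=$(printf '%s\n' "$input" | grep '[0-9]\{3\}')

# Extended syntax (-E): plain braces work
ere=$(printf '%s\n' "$input" | grep -E '[0-9]{3}')

# GNU sed: -r switches to extended syntax; -n plus p prints only matching lines
sed_ere=$(printf '%s\n' "$input" | sed -r -n '/[0-9]{3}/p')

printf 'BRE:    %s\nERE:    %s\nsed -r: %s\n' "$bre" "$ere" "$sed_ere"
```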

Some FAQs before we start with regular expressions

What is a Regular expression?

A regular expression is a pattern that describes the text you are looking for; tools use it to find matches in a given string.

Which commands/programming languages support regular expressions?
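vi, rename, grep, sed, awk, perl, python etc. The tr command is sometimes listed too, but it only understands character ranges such as a-z, not full regular expressions.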

Basic Regular Expressions
Basic regular expressions: this set includes the most basic regular expressions, which do not require any special options. It is also the oldest set of regular expressions.



^ – Caret symbol; matches at the beginning of a line.

$ – Matches at the end of a line.

* – Matches 0 or more occurrences of the previous character.

. – Matches any single character.

[] – Matches any one character from a set or range of characters.

[^char] – Matches any one character that is NOT in the set.

\<word\> – Matches word only as a whole word.

\ – Escape character; removes the special meaning of the next character.

Let's start with some examples so that we can understand regular expressions better.

^ Regular Expression
Example 1: Find all the regular files in a given directory

ls -l | grep ^-

As you may know, the first character of each line in ls -l output shows the file type: - for regular files and d for directories. The ^ symbol anchors a match to the start of a line, so ^- means "display only the lines that start with -", which are the regular files.

To find all the directories in a folder, use grep ^d with ls -l as shown below:

ls -l | grep ^d

How about character device files and block device files?

ls -l | grep ^c

ls -l | grep ^b
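Real ls -l output differs between systems, so this sketch (assuming bash and GNU grep) feeds grep a made-up listing kept in a shell variable instead of running ls. Each pattern only looks at the first character of a line, which is why the total line that ls -l prints first is never matched.

```shell
# Made-up "ls -l" output; the first character of each line is the file type.
listing='total 8
-rw-r--r-- 1 user user  120 Mar 22 10:00 notes.txt
drwxr-xr-x 2 user user 4096 Mar 22 10:00 scripts
crw-rw-rw- 1 root root 1, 3 Mar 22 10:00 null
brw-rw---- 1 root disk 8, 0 Mar 22 10:00 sda
-rwxr-xr-x 1 user user  300 Mar 22 10:00 backup.sh'

# ^ ties each pattern to the first character of the line
regular=$(printf '%s\n' "$listing" | grep '^-')   # regular files
dirs=$(printf '%s\n' "$listing" | grep '^d')      # directories
chars=$(printf '%s\n' "$listing" | grep '^c')     # character devices
blocks=$(printf '%s\n' "$listing" | grep '^b')    # block devices

# -c prints how many lines matched instead of the lines themselves
regular_count=$(printf '%s\n' "$listing" | grep -c '^-')
printf '%s\n' "$regular_count"   # prints 2
```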

We can even find commented lines using the ^ operator:

grep '^#' filename

How about finding lines in a file that start with abc?

grep '^abc' filename

There are many more uses for the ^ anchor.
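A minimal sketch on made-up configuration text, assuming bash and GNU grep. It also shows a common variation that goes slightly beyond the examples above: '^[[:space:]]*#' also catches comments that are indented with blanks.

```shell
# Made-up configuration text
config='# global settings
abc=1
xyzabc=2
  # indented comment
port=8080'

comments=$(printf '%s\n' "$config" | grep '^#')      # only lines whose first character is #
abc_lines=$(printf '%s\n' "$config" | grep '^abc')   # abc=1, but not xyzabc=2
# [[:space:]]* allows leading blanks before the #
all_comments=$(printf '%s\n' "$config" | grep '^[[:space:]]*#')

printf '%s\n' "$comments"       # prints: # global settings
printf '%s\n' "$abc_lines"      # prints: abc=1
printf '%s\n' "$all_comments"   # prints both comment lines
```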

$ Regular Expression
Example 2: Match all the files whose names end with sh

ls -l | grep sh$

As $ indicates end of the line, the above command will list all the files whose names end with sh.
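One thing to keep in mind, shown here on made-up names (assuming bash and GNU grep): sh$ only checks the last two characters, so it also matches a name such as fish. Escaping the dot narrows the match to names with a .sh extension.

```shell
# Made-up file names, one per line (as plain ls prints them into a pipe)
names='deploy.sh
bash_history
notes.txt
fish'

# sh$ only requires the last two characters to be "sh"
ends_sh=$(printf '%s\n' "$names" | grep 'sh$')     # deploy.sh and fish
# \. is a literal dot, so this keeps only names with a .sh extension
sh_ext=$(printf '%s\n' "$names" | grep '\.sh$')    # deploy.sh

printf '%s\n' "$ends_sh"
printf '%s\n' "$sh_ext"
```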

How about finding lines in a file that end with dead?

grep 'dead$' filename

How about finding empty lines in a file?

grep '^$' filename
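The sketch below applies both patterns to a small made-up text (bash and GNU grep assumed). grep -c counts matching lines and -v inverts the match, so together they count empty and non-empty lines.

```shell
# Made-up text with two empty lines
text='the battery is dead
dead end ahead

deadline tomorrow

done'

ends_dead=$(printf '%s\n' "$text" | grep 'dead$')    # the battery is dead
empty=$(printf '%s\n' "$text" | grep -c '^$')        # 2 empty lines
non_empty=$(printf '%s\n' "$text" | grep -vc '^$')   # -v inverts the match: 4 lines

printf '%s\n%s\n%s\n' "$ends_dead" "$empty" "$non_empty"
```

Lines that contain only spaces or tabs are not matched by ^$; a pattern such as '^[[:space:]]*$' treats them as empty too.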

* Regular Expression
Example 3: Match all files that have twt, twet, tweet etc. in their names.

ls -l | grep 'twe*t'

How about searching a file for the word apple when it is misspelled as ale, aple, appple, apppple, apppppple etc.? To find all of these patterns:

grep 'ap*le' filename

Note that the pattern above also matches ale, because * means 0 or more occurrences of the previous character.
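Here is a quick sketch with made-up words, assuming bash and GNU grep. Writing the character twice (app*le) is the usual basic-syntax way to require at least one occurrence, so ale drops out.

```shell
# Made-up misspellings of "apple" plus a few other words
words='ale
aple
apple
appple
apel
tweet
twt
twit'

# p* allows zero or more p's, so ale matches too
ap_any=$(printf '%s\n' "$words" | grep 'ap*le')
# pp* means "one p followed by zero or more p's", i.e. at least one p
ap_one=$(printf '%s\n' "$words" | grep 'app*le')
# e* between tw and t: matches twt and tweet, but not twit
tw=$(printf '%s\n' "$words" | grep 'twe*t')

printf '%s\n--\n%s\n--\n%s\n' "$ap_any" "$ap_one" "$tw"
```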

. Regular Expression
Example 4: Find files whose names contain any single character between two t letters.

ls -l | grep 't.t'

Here . matches any single character, so the pattern matches tat, t3t, t.t, t&t and so on.
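A small sketch on made-up names (bash and GNU grep assumed). Escaping the dot as \. turns it back into a literal dot.

```shell
# Made-up names; "tt" is too short and "toast" has no t?t sequence
names='tat
t3t
t.t
tt
toast'

any_middle=$(printf '%s\n' "$names" | grep 't.t')    # . matches any one character
dot_middle=$(printf '%s\n' "$names" | grep 't\.t')   # \. matches only a literal dot

printf '%s\n--\n%s\n' "$any_middle" "$dot_middle"
```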

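How about finding all the file names that start with a and end with x?

ls | grep '^a.*x$'

Here . matches any character and * repeats it 0 or more times, so .* means any number of characters. The ^ and $ anchors tie the match to the start and end of the name; without them, 'a.*x' would match any line that merely contains an a followed somewhere later by an x. Plain ls is used instead of ls -l because, when its output goes to a pipe, it prints just one file name per line, so the anchors apply to the name itself.
Suppose you have these files:
awx
awex
aweex
awasdfx
a35dfetrx
The command finds all of them, and any other file or folder whose name starts with a and ends with x.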
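The sketch below, on made-up names (bash and GNU grep assumed), compares the unanchored pattern a.*x with the anchored ^a.*x$. Without the anchors, names such as max or axe also match, because they merely contain an a followed later by an x.

```shell
# Made-up file names, one per line
names='awx
awex
a35dfetrx
max
axe
box'

loose=$(printf '%s\n' "$names" | grep 'a.*x')      # an a followed later by an x, anywhere
strict=$(printf '%s\n' "$names" | grep '^a.*x$')   # starts with a AND ends with x

printf '%s\n--\n%s\n' "$loose" "$strict"
```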

[] Square Brackets Regular Expression
Example 5: Find all the files whose names contain a digit between a and x

ls -l | grep 'a[0-9]x'

This matches names such as:
a0xsdf
asda1xsdfas
...
asdfdsara9xsdf

Wherever a single digit appears directly between an a and an x, the line matches.
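A sketch on made-up names, assuming bash and GNU grep. [0-9] stands for exactly one digit; adding [0-9]* after it allows one or more digits between a and x.

```shell
# Made-up file names
names='a0xsdf
asda1xsdfas
asdfdsara9xsdf
a12x
ax'

one_digit=$(printf '%s\n' "$names" | grep 'a[0-9]x')          # exactly one digit
digits=$(printf '%s\n' "$names" | grep 'a[0-9][0-9]*x')       # one or more digits

printf '%s\n--\n%s\n' "$one_digit" "$digits"
```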

Here are some more bracket examples for you:

[a-z] – Matches any single character from a to z.
[A-Z] – Matches any single character from A to Z.
[0-9] – Matches any single digit from 0 to 9.
[a-zA-Z0-9] – Matches any single character from a to z, A to Z, or 0 to 9.
[!@#$%^] – Matches any one of the characters !, @, #, $, % or ^.
Just think about which characters you want to match and put them inside the brackets.
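The sketch below tries several of these bracket expressions on made-up strings. It assumes bash and GNU grep and sets LC_ALL=C, because in some other locales a range such as a-z can cover more than the plain ASCII lowercase letters. POSIX classes like [[:digit:]] are an alternative to ranges.

```shell
# Use the C locale so that ranges like a-z mean exactly the ASCII letters
export LC_ALL=C

# Made-up sample strings
sample='hello
HELLO
h3llo
p@ss
___'

has_upper=$(printf '%s\n' "$sample" | grep '[A-Z]')             # HELLO
has_digit=$(printf '%s\n' "$sample" | grep '[0-9]')             # h3llo
has_symbol=$(printf '%s\n' "$sample" | grep '[!@#$%^]')         # p@ss
only_lower=$(printf '%s\n' "$sample" | grep '^[a-z]*$')         # hello
only_alnum=$(printf '%s\n' "$sample" | grep '^[a-zA-Z0-9]*$')   # hello, HELLO, h3llo
posix_digit=$(printf '%s\n' "$sample" | grep '[[:digit:]]')     # h3llo
```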

[^char] Regular Expression
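Example 6: Match all file names that contain none of the letters a, b or c

ls | grep '^[^abc]*$'

[^abc] matches any single character other than a, b or c, and the ^ and $ anchors together with * require every character of the name to be such a character. Note that ls | grep '[^abc]' on its own matches any name that contains at least one other character (for example, box), so it does not exclude names containing a, b or c. ls | grep -v '[abc]' gives the same result as the anchored pattern.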
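This sketch (bash and GNU grep assumed, made-up names) shows the difference between the unanchored [^abc], the anchored ^[^abc]*$, and grep -v '[abc]'.

```shell
# Made-up file names
names='abc
cab
xyz
box
a1'

# At least one character that is not a, b or c somewhere in the line
any_other=$(printf '%s\n' "$names" | grep '[^abc]')        # xyz, box, a1
# Every character is something other than a, b or c
none_of_abc=$(printf '%s\n' "$names" | grep '^[^abc]*$')   # xyz
# -v keeps the lines that do NOT contain a, b or c
without_abc=$(printf '%s\n' "$names" | grep -v '[abc]')    # xyz

printf '%s\n--\n%s\n--\n%s\n' "$any_other" "$none_of_abc" "$without_abc"
```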

\<word\> Regular Expression
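Example 7: Search for abc as a whole word; for example, abcxyz or readabc should not appear in the output.

grep '\<abc\>' filename

\< and \> mark the start and end of a word. They are GNU extensions; grep -w abc filename gives the same result.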
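A quick check on made-up lines, assuming bash and GNU grep. The hyphen in abc-def is not a word character, so that line still counts as containing the whole word abc.

```shell
# Made-up lines
text='abc
abcxyz
readabc
call abc now
abc-def'

whole=$(printf '%s\n' "$text" | grep '\<abc\>')   # word-boundary anchors
word=$(printf '%s\n' "$text" | grep -w 'abc')     # -w: match whole words only

printf '%s\n--\n%s\n' "$whole" "$word"
```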

Escape Regular Expression
Example 8: Find lines in a file that contain [. Because [ is a special character, we have to escape it:

grep '\[' filename

or

grep '[[]' filename

Note: Inside [] most special characters lose their special meaning, so [[] matches a literal [. If you want to find a special character, you can put it inside [] so that it is not treated as special.
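A sketch on made-up lines (bash and GNU grep assumed) comparing the escaped form, the bracket form, and grep -F, which treats the whole pattern as a fixed string. The same bracket trick works for a literal dot.

```shell
# Made-up lines
text='array[0]
plain text
a.b
axb'

bracket_escaped=$(printf '%s\n' "$text" | grep '\[')   # backslash escape
bracket_class=$(printf '%s\n' "$text" | grep '[[]')    # [ inside a bracket expression
bracket_fixed=$(printf '%s\n' "$text" | grep -F '[')   # -F: no regex at all
literal_dot=$(printf '%s\n' "$text" | grep '[.]')      # only a real dot
any_char=$(printf '%s\n' "$text" | grep 'a.b')         # . matches any character

printf '%s\n' "$bracket_escaped" "$bracket_class" "$bracket_fixed" "$literal_dot"
printf '%s\n' "$any_char"
```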

Note: These basic regular expressions do not need -E with grep. egrep is equivalent to grep -E, and fgrep is equivalent to grep -F (fixed strings, no regular expressions at all). I suggest you concentrate on grep to get your work done; don't reach for other commands if grep can solve your problem.
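A small sketch, assuming bash and GNU grep, shows how the same pattern a+b behaves in each mode. In basic syntax + is an ordinary character, with -E it means one or more of the previous character, and -F searches for the exact text. Recent GNU grep versions also warn that egrep and fgrep are obsolescent, which is one more reason to prefer grep -E and grep -F.

```shell
# Made-up lines
text='a+b
aab
ab'

bre=$(printf '%s\n' "$text" | grep 'a+b')      # basic: + is literal, matches "a+b"
ere=$(printf '%s\n' "$text" | grep -E 'a+b')   # extended: one or more a, then b
fixed=$(printf '%s\n' "$text" | grep -F 'a+b') # fixed string: exact text "a+b"

printf '%s\n--\n%s\n--\n%s\n' "$bre" "$ere" "$fixed"
```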

Original article: http://www.linuxnix.com/2011/07/regular-expressions-linux-i.html
Read more:
http://www.linuxnix.com/2011/08/grep-command-regular-expressions-examples-ii.html
http://www.linuxnix.com/2011/08/grep-command-regular-expressions-examples-iii.html
