Searching with the Regex Class
As we discussed in the previous article, Regular Expressions allow you to define search patterns for working with strings. To process these search patterns, the .NET framework comes with a very versatile class: the Regex class. In this article, we will define some search patterns and use them with the Regex class. Please bear in mind that the syntax of Regular Expressions can be quite complicated and that this is a C# tutorial, not a Regex tutorial, so I will stick to some simple Regex patterns to demonstrate how you work with them in C#. If you want to know more about Regular Expressions, I can recommend this Regular Expression Tutorial.
The IsMatch() Method
In this first example, I'll use one of the most basic methods of the Regex class called IsMatch. It simply returns true or false, depending on whether at least one match is found in the test string:
// The Regex class lives in the System.Text.RegularExpressions namespace.
string testString = "John Doe, 42 years";
Regex regex = new Regex("[0-9]+");
if (regex.IsMatch(testString))
    Console.WriteLine("String contains numbers!");
else
    Console.WriteLine("String does NOT contain numbers!");
We define a test string and then we create an instance of the Regex class. We pass in the actual Regular Expression as a string - in this case, the regex specifies that we're looking for a sequence of one or more digits, in other words a number of any length. We then output a line of text depending on whether the regex is a match for our test string. Pretty cool, but in most cases, you're looking to actually do something with the match(es) - for this, we have the Match class.
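For a one-off check, you don't necessarily need your own Regex instance, because the Regex class also offers a static IsMatch() overload that takes the input and the pattern. Here is a minimal sketch of that, written with top-level statements (so it assumes C# 9 or later); the ContainsDigits helper is a made-up name for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the input contains at least one digit.
static bool ContainsDigits(string input)
{
    // Static overload: the input comes first, then the pattern.
    return Regex.IsMatch(input, "[0-9]+");
}

Console.WriteLine(ContainsDigits("John Doe, 42 years")); // True
Console.WriteLine(ContainsDigits("John Doe"));           // False
```

If you run the same pattern many times, creating a Regex instance once and reusing it, as in the example above, is usually the tidier choice.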
The Match Class & Method
In this next example, we'll capture the number found in the test string and present it to the user, instead of just verifying that it's there:
string testString = "John Doe, 42 years";
Regex regex = new Regex("[0-9]+");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Number found: " + match.Value);
We use the same regex and test string as before. I call the Match() method, which will return an instance of the Match class - this will happen whether or not a match is actually found. To ensure that a match has been found, I check the Success property. Once I'm sure that a match has been found, I use the Value property to retrieve it.
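Since Match() hands back an object even when nothing is found, it can be useful to see what a failed match looks like. The following is only a sketch, written with top-level statements (assuming C# 9 or later); FirstNumberOrNull is a hypothetical helper invented for this example:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first run of digits as an int, or null if there is none.
static int? FirstNumberOrNull(string input)
{
    Match match = Regex.Match(input, "[0-9]+");
    // Match() never returns null; Success tells us whether anything was found.
    if (!match.Success)
        return null;
    // Note: int.Parse throws an OverflowException if the number is too large for an int.
    return int.Parse(match.Value);
}

Match noMatch = Regex.Match("John Doe", "[0-9]+");
Console.WriteLine(noMatch.Success);     // False
Console.WriteLine(noMatch.Value == ""); // True: a failed match has an empty Value

Console.WriteLine(FirstNumberOrNull("John Doe, 42 years")); // 42
```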
The Match class contains more useful information than just the matched string - for instance, you can easily find out where the match was found, how long it is and so on:
string testString = "John Doe, 42 years";
Regex regex = new Regex("[0-9]+");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Match found at index " + match.Index + ". Length: " + match.Length);
The Index and Length properties are used here to display information about the location and length of the match.
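One thing Index and Length make possible is cutting the original string around the match. As a rough sketch (top-level statements, so C# 9 or later is assumed; SplitAroundFirstNumber is a hypothetical helper, not part of .NET):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text before and after the first run of digits.
static (string Before, string After) SplitAroundFirstNumber(string input)
{
    Match match = Regex.Match(input, "[0-9]+");
    if (!match.Success)
        return (input, ""); // Nothing found: treat the whole input as "before".

    // The match starts at Index and covers Length characters.
    string before = input.Substring(0, match.Index);
    string after = input.Substring(match.Index + match.Length);
    return (before, after);
}

var parts = SplitAroundFirstNumber("John Doe, 42 years");
Console.WriteLine("Before: '" + parts.Before + "'"); // Before: 'John Doe, '
Console.WriteLine("After: '" + parts.After + "'");   // After: ' years'
```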
Capture Groups
In the first couple of examples, we have just found a single value in our search string, but Regular Expressions can, of course, do a lot more than that! For instance, we can find both the name and the age in our test string, while filtering out the irrelevant stuff like the comma and the "years" text. Doing stuff like that is a piece of cake for Regular Expressions. If you're not familiar with the syntax, it might seem very complicated, but let's give it a try anyway:
string testString = "John Doe, 42 years";
Regex regex = new Regex(@"^([^,]+),\s([0-9]+)");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Name: " + match.Groups[1].Value + ". Age: " + match.Groups[2].Value);
I have modified the regex so that it starts at the beginning of the string (the ^ character) and looks for anything that is NOT a comma - this value is placed in the first capture group, thanks to the surrounding parentheses. Then it looks for the separating comma, a single whitespace character (\s) and after that, a number, which is placed in the second capture group (again, thanks to the surrounding parentheses). In the last line, I use the Groups property to access the captured groups. I use index 1 for the name and 2 for the age since it follows the order in which the capture groups were defined in the regex string (index 0 contains the entire match).
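To make the group numbering a bit more tangible, here is a small sketch that simply walks through the Groups collection of the same match and prints every entry, including index 0. It is written with top-level statements, so it assumes C# 9 or later:

```csharp
using System;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"^([^,]+),\s([0-9]+)");
Match match = regex.Match("John Doe, 42 years");
if (match.Success)
{
    // Groups[0] is the whole match; the capture groups start at index 1.
    for (int i = 0; i < match.Groups.Count; i++)
        Console.WriteLine("Group " + i + ": " + match.Groups[i].Value);
}
// Group 0: John Doe, 42
// Group 1: John Doe
// Group 2: 42
```

Notice that the " years" part never shows up, since the regex stops matching after the number.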
Named Capture Groups
As soon as the regex becomes longer or more advanced than the one we just used, numbered capture groups might become unmanageable because you constantly have to remember their order and index. Fortunately for us, Regular Expressions and the .NET framework support named capture groups, which allow you to give each group a name in the regex and then reference it through the Groups property. Check out this rewritten example, where we use named groups instead of numbered ones:
string testString = "John Doe, 42 years";
Regex regex = new Regex(@"^(?<name>[^,]+),\s(?<age>[0-9]+)");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Name: " + match.Groups["name"].Value + ". Age: " + match.Groups["age"].Value);
It works exactly as it did before, but you can now use logical names to look up the matched values instead of having to remember the correct index. This might not be a big difference in our simple example, but as mentioned, you will definitely appreciate it when your Regular Expressions grow in complexity and length.
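Named groups also make it easier to wrap a regex in a small parsing method, since the code reads almost like the data it extracts. The sketch below is one possible way to do that, not the only one; TryParsePerson is a hypothetical helper made up for this example, and the code uses top-level statements (C# 9 or later assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts name and age from strings like "John Doe, 42 years".
static bool TryParsePerson(string input, out string name, out int age)
{
    name = "";
    age = 0;
    Match match = Regex.Match(input, @"^(?<name>[^,]+),\s(?<age>[0-9]+)");
    if (!match.Success)
        return false;

    name = match.Groups["name"].Value;
    // int.TryParse returns false instead of throwing if the number doesn't fit in an int.
    return int.TryParse(match.Groups["age"].Value, out age);
}

if (TryParsePerson("John Doe, 42 years", out string personName, out int personAge))
    Console.WriteLine(personName + " is " + personAge + " years old"); // John Doe is 42 years old
```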
The MatchCollection Class
The Match class is the way to go if you only want to work with a single match (remember that a match can contain multiple values, as we saw in the previous examples), but sometimes you want to work with several matches at once. For this, we have the Matches() method, which will return an instance of the MatchCollection class. It will contain all the matches, in the order in which they were found. Let's have a look at how it can be used:
string testString = "123-456-789-0";
Regex regex = new Regex(@"([0-9]+)");
MatchCollection matchCollection = regex.Matches(testString);
foreach (Match match in matchCollection)
    Console.WriteLine("Number found at index " + match.Index + ": " + match.Value);
I have changed the regex and the test string, compared to the previous examples. We now have a test string which contains several numbers and a regex which specifically looks for strings consisting of one or more digits. We use the Matches() method to get a MatchCollection from our regex, which contains the matches found in the string. In this case, there are four matches, which we output one after another with a foreach loop. The result will look something like this:
Number found at index 0: 123
Number found at index 4: 456
Number found at index 8: 789
Number found at index 12: 0
If no matches had been found, an empty MatchCollection would have been returned.
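Because a MatchCollection is a regular collection, you can also count its items or feed it to LINQ. Here is a small sketch of that idea, again with top-level statements (C# 9 or later assumed); ExtractNumbers is a hypothetical helper, and it assumes that every number is small enough to fit in an int:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every run of digits in the input as an int.
static int[] ExtractNumbers(string input)
{
    MatchCollection matches = Regex.Matches(input, "[0-9]+");
    // Cast<Match>() keeps this working on older frameworks, where
    // MatchCollection only implements the non-generic IEnumerable.
    return matches.Cast<Match>().Select(m => int.Parse(m.Value)).ToArray();
}

int[] numbers = ExtractNumbers("123-456-789-0");
Console.WriteLine(numbers.Length); // 4
Console.WriteLine(numbers.Sum());  // 1368

// No matches: the collection is empty rather than null.
Console.WriteLine(Regex.Matches("no digits", "[0-9]+").Count); // 0
```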
Summary
With help from the Regex class, along with the Match and MatchCollection classes, we can easily do very advanced string matching. The Regular Expression syntax might seem very complex, but once you learn it, you will have a very strong tool. Even if you don't want to invest the time in learning the regex syntax, you can often find expressions for specific needs, created by other programmers, with a simple Google search. As soon as you have written or borrowed the regex string, you can use it for your own purpose with the techniques and classes demonstrated in this article.
But searching is only a part of the fun - you can also do some very cool search/replace operations with Regular Expressions. We will look into this in one of the next articles.