Regular Expressions (Regex):

Searching with the Regex Class

As we discussed in the previous article, Regular Expressions allow you to define search patterns for working with strings. To process such a search pattern, the .NET framework comes with a very versatile class: the Regex class. In this article, we'll define some search patterns and use them with the Regex class, but bear in mind that the syntax of Regular Expressions can be quite complicated and that this is a C# tutorial, not a Regex tutorial. Instead, I will use some simple Regex patterns to demonstrate how you can work with them in C#. If you want to know more about Regular Expressions, I can recommend this Regular Expression Tutorial.

The IsMatch() Method

In this first example, I'll use one of the most basic methods of the Regex class, called IsMatch. It simply returns true or false, depending on whether there are one or more matches in the test string:

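// The snippets in this article assume these two using directives.
using System;
using System.Text.RegularExpressions;

string testString = "John Doe, 42 years";
Regex regex = new Regex("[0-9]+");   // one or more digits
if (regex.IsMatch(testString))
    Console.WriteLine("String contains numbers!");
else
    Console.WriteLine("String does NOT contain numbers!");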

We define a test string and then create an instance of the Regex class. We pass in the actual regular expression as a string - in this case, it specifies that we're looking for a number of any length. We then output a line of text depending on whether the regex matches our test string. Pretty cool, but in most cases you want to actually do something with the matches - for this, we have the Match class.
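As a small aside, the same check can also be written without creating a Regex instance, by using the static Regex.IsMatch(input, pattern) overload. The sketch below wraps it in a helper; the class name IsMatchDemo and the method ContainsDigits are made up for this illustration and are not part of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IsMatchDemo
{
    // Hypothetical helper: true if the input contains at least one run of digits.
    public static bool ContainsDigits(string input)
    {
        // Static overload: convenient for a one-off check.
        return Regex.IsMatch(input, "[0-9]+");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsDigits("John Doe, 42 years")); // True
        Console.WriteLine(ContainsDigits("John Doe"));           // False
    }
}
```

Creating an instance, as in the example above, is probably the better fit when the same pattern is reused many times.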

The Match Class & Method

In this next example, we'll capture the number found in the test string and present it to the user, instead of just verifying that it's there:

string testString = "John Doe, 42 years";
Regex regex = new Regex("[0-9]+");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Number found: " + match.Value);

We use the same regex and test string as before. I call the Match() method, which will return an instance of the Match class - this will happen whether or not a match is actually found. To ensure that a match has been found, I check the Success property. Once I'm sure that a match has been found, I use the Value property to retrieve it.
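To make the role of the Success property a bit more concrete, here is a minimal sketch of what happens when the test string contains no number at all. The class FailedMatchDemo and the helper TryGetFirstNumber are hypothetical names that I introduce only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FailedMatchDemo
{
    // Hypothetical helper: reports whether a number was found and hands it back.
    public static bool TryGetFirstNumber(string input, out string number)
    {
        Match match = new Regex("[0-9]+").Match(input);
        // For a failed match, Value is an empty string.
        number = match.Value;
        return match.Success;
    }

    public static void Main()
    {
        // Match() still returns a Match object here, it just isn't successful.
        Match match = new Regex("[0-9]+").Match("John Doe");
        Console.WriteLine(match.Success);      // False
        Console.WriteLine(match.Value.Length); // 0

        if (TryGetFirstNumber("John Doe, 42 years", out string number))
            Console.WriteLine("Number found: " + number); // Number found: 42
    }
}
```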

The Match class contains more useful information than just the matched string - for instance, you can easily find out where the match was found, how long it is and so on:

string testString = "John Doe, 42 years";
Regex regex = new Regex("[0-9]+");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Match found at index " + match.Index + ". Length: " + match.Length);

The Index and Length properties are used here to display information about the location and length of the match.
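One possible use for these two properties is cutting the original string into the parts before and after the match. The following is only a sketch: MatchPositionDemo and SplitAroundFirstNumber are names I made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchPositionDemo
{
    // Hypothetical helper: returns { text before, matched number, text after }.
    public static string[] SplitAroundFirstNumber(string input)
    {
        Match match = Regex.Match(input, "[0-9]+");
        if (!match.Success)
            return new[] { input, "", "" };

        // Index is where the match starts, Index + Length is where it ends.
        string before = input.Substring(0, match.Index);
        string after = input.Substring(match.Index + match.Length);
        return new[] { before, match.Value, after };
    }

    public static void Main()
    {
        string[] parts = SplitAroundFirstNumber("John Doe, 42 years");
        Console.WriteLine("[" + parts[0] + "]"); // [John Doe, ]
        Console.WriteLine("[" + parts[1] + "]"); // [42]
        Console.WriteLine("[" + parts[2] + "]"); // [ years]
    }
}
```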

Capture Groups

In the first couple of examples, we have just found a single value in our search string, but Regular Expressions can, of course, do a lot more than that! For instance, we can find both the name and the age in our test string, while sorting out the irrelevant stuff like the comma and the "years" text. Doing stuff like that is a piece of cake for Regular Expressions. If you're not familiar with the syntax, it might seem very complicated, but let's give it a try anyway:

string testString = "John Doe, 42 years";
Regex regex = new Regex(@"^([^,]+),\s([0-9]+)");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Name: " + match.Groups[1].Value + ". Age: " + match.Groups[2].Value);

I have modified the regex so that, starting from the beginning of the string, it looks for one or more characters that are NOT a comma - this value is placed in the first capture group, thanks to the surrounding parentheses. Then it looks for the separating comma, a single whitespace character (\s) and after that, a number, which is placed in the second capture group (again, thanks to the surrounding parentheses). In the last line, I use the Groups property to access the matched groups. I use index 1 for the name and 2 for the age since it follows the order in which the capture groups were defined in the regex string (index 0 contains the entire match).
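Since group values are always strings, you will often want to convert them before using them. Here is a rough sketch that also prints group 0 for comparison; the class CaptureGroupDemo, the helper ExtractAge and its -1 "not found" return value are assumptions I made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    private const string Pattern = @"^([^,]+),\s([0-9]+)";

    // Hypothetical helper: returns the age as an int, or -1 if it can't be read.
    public static int ExtractAge(string input)
    {
        Match match = Regex.Match(input, Pattern);
        if (!match.Success)
            return -1;

        // Groups[2] holds the digits; TryParse guards against values too large for an int.
        return int.TryParse(match.Groups[2].Value, out int age) ? age : -1;
    }

    public static void Main()
    {
        Match match = Regex.Match("John Doe, 42 years", Pattern);
        Console.WriteLine(match.Groups[0].Value); // John Doe, 42
        Console.WriteLine(match.Groups.Count);    // 3 (entire match + two groups)
        Console.WriteLine(ExtractAge("John Doe, 42 years") + 1); // 43
    }
}
```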

Named Capture Groups

As soon as the regex becomes more advanced or longer than the one we just used, numbered capture groups might become unmanageable, because you constantly have to remember their order and indexes. Fortunately for us, Regular Expressions and the .NET framework support named capture groups, which allow you to give each group a name in the regex and then reference it by that name through the Groups property. Check out this rewritten example, where we use named groups instead of numbered ones:

string testString = "John Doe, 42 years";
Regex regex = new Regex(@"^(?<name>[^,]+),\s(?<age>[0-9]+)");
Match match = regex.Match(testString);
if (match.Success)
    Console.WriteLine("Name: " + match.Groups["name"].Value + ". Age: " + match.Groups["age"].Value);

It works exactly as it did before, but you can now use logical names to look up the matched values instead of having to remember the correct index. This might not be a big difference in our simple example, but as mentioned, you will definitely appreciate it when your Regular Expressions grow in complexity and length.
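To illustrate how named groups might look in slightly more realistic code, here is a sketch that runs the same regex over several lines and skips the ones that don't fit. NamedGroupDemo, TryParsePerson and the sample lines are all invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Same pattern as above, created once and reused for every line.
    private static readonly Regex PersonRegex =
        new Regex(@"^(?<name>[^,]+),\s(?<age>[0-9]+)");

    // Hypothetical helper: true if the line has the "name, age" shape.
    public static bool TryParsePerson(string input, out string name, out int age)
    {
        name = "";
        age = 0;
        Match match = PersonRegex.Match(input);
        if (!match.Success || !int.TryParse(match.Groups["age"].Value, out age))
            return false;

        name = match.Groups["name"].Value;
        return true;
    }

    public static void Main()
    {
        string[] lines = { "John Doe, 42 years", "Jane Doe, 39 years", "not a person" };
        foreach (string line in lines)
        {
            if (TryParsePerson(line, out string name, out int age))
                Console.WriteLine(name + " is " + age);
            else
                Console.WriteLine("Skipped: " + line);
        }
    }
}
```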

The MatchCollection Class

The Match class is the way to go if you only want to work with a single match (remember that a match can contain multiple values, as we saw in the previous examples), but sometimes you want to work with several matches at once. For this, we have the Matches() method, which returns an instance of the MatchCollection class. It contains all the matches, in the order in which they were found. Let's have a look at how it can be used:

string testString = "123-456-789-0";
Regex regex = new Regex(@"([0-9]+)");
MatchCollection matchCollection = regex.Matches(testString);
foreach (Match match in matchCollection)
    Console.WriteLine("Number found at index " + match.Index + ": " + match.Value);

I have changed the regex and the test string, compared to the previous examples. We now have a test string which contains several numbers and a regex which specifically looks for strings consisting of one or more digits. We use the Matches() method to get a MatchCollection from our regex, which contains the matches found in the string. In this case, there are four matches, which we output one after another with a foreach loop. The result will look something like this:

Number found at index 0: 123
Number found at index 4: 456
Number found at index 8: 789
Number found at index 12: 0

If no matches are found, Matches() returns an empty MatchCollection.
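The sketch below shows both cases through the Count property of the collection, and then does a bit of work with the matches by adding them up. MatchCollectionDemo and SumNumbers are hypothetical names, and the helper assumes that each number is small enough to fit in an int.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchCollectionDemo
{
    // Hypothetical helper: adds up every number found in the input.
    // Assumes each number fits in an int; int.Parse throws otherwise.
    public static int SumNumbers(string input)
    {
        int sum = 0;
        foreach (Match match in Regex.Matches(input, "[0-9]+"))
            sum += int.Parse(match.Value);
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Regex.Matches("123-456-789-0", "[0-9]+").Count);  // 4
        Console.WriteLine(Regex.Matches("no digits here", "[0-9]+").Count); // 0
        Console.WriteLine(SumNumbers("123-456-789-0"));                     // 1368
    }
}
```

Because an empty collection simply results in zero loop iterations, you normally don't need a special check before the foreach.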

Summary

With help from the Regex class, along with the Match and MatchCollection classes, we can easily do very advanced string matching. The Regular Expression syntax might seem very complex, but once you learn it, you will have a very strong tool. Even if you don't want to invest the time in learning the regex syntax, you can often find expressions for specific needs, created by other programmers, with a simple Google search. As soon as you have written or borrowed the regex string, you can use it for your own purpose with the techniques and classes demonstrated in this article.

But searching is only a part of the fun - you can also do some very cool search/replace operations with Regular Expressions. We will look into this in one of the next articles.

