Regex Modifiers
In previous articles, we talked about what Regular Expressions are and how to use them in C# for matching, replacing and so on. At this point, you should already have realized how powerful Regular Expressions are and how they can help you in a lot of situations, but they get even more powerful when you know about the possible modifiers.
When working with Regular Expressions, you can use one or several modifiers to control the behavior of the matching engine. For instance, Regex matching is case-sensitive by default, meaning that "a" is not the same as "A". However, in a lot of situations, you want your match to be case-insensitive, so that the character "a" matches no matter if it's in lowercase or UPPERCASE. Simply supply the RegexOptions.IgnoreCase option when creating the Regex instance and your match will be case-insensitive.
You'll find all the available modifiers in the RegexOptions enumeration. Several of them have equivalents in most other Regular Expression flavors, while others are specific to the .NET framework.
As you'll see in the first example, Regex modifiers are usually specified as the second parameter when creating the Regex instance. Since RegexOptions is a flags enumeration, you can specify more than one option by combining them with the bitwise OR operator (|), like this:
new Regex("[a-z]+", RegexOptions.IgnoreCase | RegexOptions.Singleline);
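Because the combined options form a single bitmask, a Regex instance can report which options it was created with through its Options property. A minimal sketch of that, where the class and method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

class CombinedOptionsDemo
{
    // Builds the Regex from the line above, with two options combined via |.
    public static Regex Build() =>
        new Regex("[a-z]+", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    static void Main()
    {
        Regex regex = Build();

        // HasFlag checks whether one option is part of the combined value.
        Console.WriteLine(regex.Options.HasFlag(RegexOptions.IgnoreCase)); // True
        Console.WriteLine(regex.Options.HasFlag(RegexOptions.Multiline));  // False

        // IgnoreCase is active, so uppercase letters match [a-z]+ as well.
        Console.WriteLine(regex.Match("HELLO world").Value);               // HELLO
    }
}
```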
Now let's run through all the modifiers to give you an idea of how they work and what they can do for you.
RegexOptions.IgnoreCase
This will likely be one of your most used modifiers. As described above, it will change your Regular Expressions from being case-sensitive to being case-insensitive. This makes a big difference, as you can see in this example:
using System;
using System.Text.RegularExpressions;

public void IgnoreCaseModifier()
{
    string testString = "Hello World";
    // Only letters a-z and whitespace, from the start (^) to the end ($) of the string
    string regexString = @"^[a-z\s]+$";
    Regex caseSensitiveRegex = new Regex(regexString);
    Regex caseInsensitiveRegex = new Regex(regexString, RegexOptions.IgnoreCase);
    Console.WriteLine("Case-sensitive match: " + caseSensitiveRegex.IsMatch(testString));
    Console.WriteLine("Case-insensitive match: " + caseInsensitiveRegex.IsMatch(testString));
}
We specify a simple Regex, designed to match only letters (a-z) and whitespace, where the ^ and $ anchors require the entire string to consist of such characters. We use it to create two Regex instances, one without the RegexOptions.IgnoreCase modifier and one with it, and then we try to match the same test string, which consists of lowercase and UPPERCASE characters and a single space. The output will, probably not surprisingly, look like this:
Case-sensitive match: False
Case-insensitive match: True
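The option does not have to go through the constructor. The static Regex.IsMatch() method has an overload that accepts RegexOptions, and .NET also supports the inline (?i) construct, which switches on case-insensitivity from inside the pattern. A minimal sketch comparing the three forms on the test string from above; the class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

class IgnoreCaseVariants
{
    const string TestString = "Hello World";
    const string Pattern = @"^[a-z\s]+$";

    // 1) Option passed to the constructor, as in the example above.
    public static bool ViaConstructor() =>
        new Regex(Pattern, RegexOptions.IgnoreCase).IsMatch(TestString);

    // 2) Option passed to the static Regex.IsMatch overload.
    public static bool ViaStaticMethod() =>
        Regex.IsMatch(TestString, Pattern, RegexOptions.IgnoreCase);

    // 3) Inline (?i) option placed at the start of the pattern.
    public static bool ViaInlineOption() =>
        Regex.IsMatch(TestString, "(?i)" + Pattern);

    static void Main()
    {
        Console.WriteLine(ViaConstructor());  // True
        Console.WriteLine(ViaStaticMethod()); // True
        Console.WriteLine(ViaInlineOption()); // True
    }
}
```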
RegexOptions.Singleline
In Regular Expressions, the dot (.) is basically a catch-all character. By default, however, it doesn't match linebreaks (in .NET, specifically the \n character), meaning that you can use the dot to match an entire line of letters, numbers, special characters and so on, but the match will end as soon as a linebreak is encountered. If you supply the Singleline modifier, the dot will match linebreaks as well. Allow me to demonstrate the difference:
public void SinglelineModifier()
{
    string testString =
@"Hello World
This string contains
several lines";
    string regexString = ".*";
    Regex normalRegex = new Regex(regexString);
    Regex singlelineRegex = new Regex(regexString, RegexOptions.Singleline);
    Console.WriteLine("Normal regex: " + normalRegex.Match(testString).Value);
    Console.WriteLine("Singleline regex: " + singlelineRegex.Match(testString).Value);
}
The output will look like this:
Normal regex: Hello World
Singleline regex: Hello World
This string contains
several lines
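Since the verbatim string above contains either \r\n or \n depending on how the source file was saved, a test string with an explicit \n makes the difference easier to verify. A minimal sketch, with made-up class and method names, that checks whether "a.c" can match across a newline; the last line also shows that in .NET the dot only excludes \n, so a \r is matched even without Singleline:

```csharp
using System;
using System.Text.RegularExpressions;

class SinglelineDemo
{
    // "a.c" requires exactly one character between 'a' and 'c'.
    public static bool DotMatchesNewline(RegexOptions options) =>
        Regex.IsMatch("a\nc", "a.c", options);

    public static bool DotMatchesCarriageReturn() =>
        Regex.IsMatch("a\rc", "a.c");

    static void Main()
    {
        // Default: the dot refuses to match the \n character.
        Console.WriteLine(DotMatchesNewline(RegexOptions.None));       // False
        // Singleline: the dot matches \n as well.
        Console.WriteLine(DotMatchesNewline(RegexOptions.Singleline)); // True
        // Without any option, \r is still matched by the dot.
        Console.WriteLine(DotMatchesCarriageReturn());                 // True
    }
}
```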
RegexOptions.Multiline
As we have talked about in this chapter, Regular Expressions consist of many different characters which have special purposes. Two more examples are the ^ and $ characters. We actually used them in the case-sensitivity example above, to match the beginning and end of the string. However, by supplying the Multiline modifier, you change their behavior from matching the beginning/end of the string to matching the beginning/end of each line. This is very useful when you want to deal with the matched lines individually. Here's an example:
public void MultilineModifier()
{
    string testString =
@"Hello World
This string contains
several lines";
    string regexString = "^.*$";
    Regex singlelineRegex = new Regex(regexString, RegexOptions.Singleline);
    Regex multilineRegex = new Regex(regexString, RegexOptions.Multiline);
    Console.WriteLine("Singleline regex: " + singlelineRegex.Match(testString).Value);
    Console.WriteLine("Multiline regex:");
    MatchCollection matches = multilineRegex.Matches(testString);
    for (int i = 0; i < matches.Count; i++)
    {
        // Trim() removes a trailing \r left over from Windows (\r\n) line endings,
        // because $ in Multiline mode matches just before the \n
        Console.WriteLine("Line " + i + ": " + matches[i].Value.Trim());
    }
}
Notice how we use a test string consisting of several lines and then apply the two regexes differently: With singlelineRegex, we treat the entire test string as one line, even though it contains linebreaks, as discussed above, so ^.*$ matches the whole string at once. With multilineRegex, we treat the test string as multiple lines, each resulting in a separate match. We use the Regex.Matches() method to catch each line and work with it; in this case, we simply output it to the Console.
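Multiline is also useful when you only need one of the anchors, for example to find the lines that start with a certain word. A minimal sketch, where the log text, the ERROR prefix and the names are made-up sample data:

```csharp
using System;
using System.Text.RegularExpressions;

class MultilineDemo
{
    // Hypothetical log text; lines are separated by \n only.
    const string Log = "INFO started\nERROR disk full\nINFO retrying\nERROR disk still full";

    // Counts the lines beginning with "ERROR" under the given options.
    public static int CountErrorLines(RegexOptions options) =>
        Regex.Matches(Log, "^ERROR", options).Count;

    static void Main()
    {
        // Without Multiline, ^ only matches at the very start of the string,
        // which begins with "INFO", so nothing is found.
        Console.WriteLine(CountErrorLines(RegexOptions.None));      // 0
        // With Multiline, ^ also matches right after every \n.
        Console.WriteLine(CountErrorLines(RegexOptions.Multiline)); // 2
    }
}
```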
RegexOptions.Compiled
While Regular Expressions are generally pretty fast, they can slow things down if they are very complex and executed many times, e.g. in a loop. For these situations, you may want to use the RegexOptions.Compiled modifier, which makes the framework compile the Regex to IL code instead of interpreting it. This makes creating the Regex noticeably slower than instantiating it normally, but subsequent Regex operations (matches etc.) on that instance usually run faster:
Regex compiledRegex = new Regex("[a-z]*", RegexOptions.Compiled);
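Because the compilation cost is paid when the Regex is constructed, the speed-up only pays off if the same instance is reused. A minimal sketch, built around a hypothetical word-counting helper, that keeps one compiled Regex in a static readonly field instead of creating it inside the loop:

```csharp
using System;
using System.Text.RegularExpressions;

class CompiledDemo
{
    // Constructed (and compiled) once, then shared by every call.
    static readonly Regex WordRegex =
        new Regex("[a-z]+", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: counts the words in all lines.
    public static int CountWords(string[] lines)
    {
        int total = 0;
        foreach (string line in lines)
        {
            // Reuses the same instance; creating a new compiled Regex per line
            // would pay the compilation cost on every iteration.
            total += WordRegex.Matches(line).Count;
        }
        return total;
    }

    static void Main()
    {
        string[] lines = { "Hello World", "This string contains", "several lines" };
        Console.WriteLine(CountWords(lines)); // 7
    }
}
```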
More modifiers
The above modifiers are the most interesting ones, but there are a few more, which we'll go through a bit faster:
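- RegexOptions.CultureInvariant: With this modifier, cultural differences in language are ignored when characters are compared case-insensitively, so it mainly matters in combination with IgnoreCase. This is mostly relevant if your application runs under cultures with special casing rules, such as Turkish.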
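- RegexOptions.ECMAScript: Changes the Regex flavor from the .NET-specific one to ECMAScript-compliant behavior. It can only be combined with IgnoreCase, Multiline and Compiled; other combinations throw an exception. This should rarely be necessary.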
- RegexOptions.ExplicitCapture: Normally, a set of parentheses in a Regex acts as a capturing group, allowing you to access each captured value through an index. If you specify the ExplicitCapture modifier, this behavior is changed so that only named groups are captured and stored for later retrieval.
- RegexOptions.IgnorePatternWhitespace: When this modifier is enabled, unescaped whitespace in the pattern is ignored, and you can even include comments, which start with the hash (#) character and run to the end of the line.
- RegexOptions.RightToLeft: Changes matching to start from right and move left, instead of the default from left to right.
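A minimal sketch showing the effect of four of these modifiers in one place; the sample strings, patterns and method names are invented for illustration, and the non-ASCII letter is written as \u00E9 (é) to avoid source-encoding issues:

```csharp
using System;
using System.Text.RegularExpressions;

class MoreModifiersDemo
{
    // IgnorePatternWhitespace: spaces, line breaks and # comments in the pattern are ignored.
    public static string MatchDate(string input)
    {
        string pattern = @"
            (?<year>\d{4})   # four-digit year
            -                # separator
            (?<month>\d{2})  # two-digit month";
        Match match = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);
        return match.Groups["year"].Value + "/" + match.Groups["month"].Value;
    }

    // ExplicitCapture: the unnamed group (\d+) is no longer a capturing group.
    public static int GroupCount(RegexOptions options) =>
        Regex.Match("12-34", @"(\d+)-(?<second>\d+)", options).Groups.Count;

    // RightToLeft: the search starts at the end of the input and moves left.
    public static string FirstMatch(RegexOptions options) =>
        Regex.Match("one two three", @"\w+", options).Value;

    // ECMAScript: \w is limited to [a-zA-Z_0-9] instead of all Unicode word characters.
    public static bool IsWordChar(string input, RegexOptions options) =>
        Regex.IsMatch(input, @"^\w$", options);

    static void Main()
    {
        Console.WriteLine(MatchDate("Released 2024-05"));              // 2024/05
        Console.WriteLine(GroupCount(RegexOptions.None));              // 3
        Console.WriteLine(GroupCount(RegexOptions.ExplicitCapture));   // 2
        Console.WriteLine(FirstMatch(RegexOptions.None));              // one
        Console.WriteLine(FirstMatch(RegexOptions.RightToLeft));       // three
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.None));       // True
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.ECMAScript)); // False
    }
}
```

The group counts include group 0, the whole match, which is always present.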
Summary
As you can see, there are several important Regex modifiers that you should know about to take full advantage of Regular Expressions and support as many use cases as possible.