

Regular Expressions (Regex):

Search/Replace with the Regex Class

In a previous article, we discussed the Regex class and how to use it to search through a string. Regular Expressions are great for that, but they are just as useful for search/replace operations, where you look for a specific pattern and replace each match with something else. The String class already has a Replace() method, but it only replaces exact, literal substrings, so it is only suitable for simple cases. With Regular Expressions, you get the full power of regex searching, and you can even use captured groups as part of the replacement string. Does that sound complicated? Don't worry, we'll start with a simple example and then slowly work toward more advanced use cases.
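To make the difference between literal and pattern-based replacement concrete, here is a small sketch. It assumes .NET 6 or later so that it can run as top-level statements, and the input string is just a made-up sample. String.Replace() only finds the exact text you give it, while Regex.Replace() replaces every run of digits, whatever the digits are.

```csharp
// Sketch assuming .NET 6+ top-level statements; the sample text is hypothetical.
using System;
using System.Text.RegularExpressions;

string input = "Order 66 shipped on day 7";

// String.Replace only matches the exact literal text "66".
string literalResult = input.Replace("66", "#");

// Regex.Replace matches any run of one or more digits (\d+), wherever it occurs.
string regexResult = Regex.Replace(input, @"\d+", "#");

Console.WriteLine(literalResult); // Order # shipped on day 7
Console.WriteLine(regexResult);   // Order # shipped on day #
```

The literal version leaves the "7" untouched because it is not the text "66". The regex version describes what a number looks like instead of spelling out one particular number.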

As in the previous article, all examples assume that you have imported the System.Text.RegularExpressions namespace, like this:

using System.Text.RegularExpressions;

With that in place, let's try Regular Expression-based string replacement. We'll use the Replace() method found on the Regex class:

string testString = "<b>Hello, <i>world</i></b>";
Regex regex = new Regex("<[^>]+>");
string cleanString = regex.Replace(testString, "");
Console.WriteLine(cleanString);

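This example demonstrates a very simplified approach to removing HTML tags from a string. The pattern <[^>]+> matches an opening angle bracket, followed by one or more characters that are not a closing angle bracket, followed by a closing angle bracket. In other words, it matches anything surrounded by a set of angle brackets (<>). We then use the Replace() method to replace each occurrence with an empty string, which effectively removes the HTML tags from the test string and leaves us with "Hello, world".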
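If you only need a single replacement, you don't have to create a Regex instance yourself. The static Regex.Replace(input, pattern, replacement) overload does the same job. The sketch below (again assuming .NET 6+ top-level statements, with a made-up second input) also shows why the approach is called "very simplified". Because [^>] also matches a "<", a stray "<" in ordinary text swallows everything up to the next ">".

```csharp
// Sketch assuming .NET 6+ top-level statements; the second input string is hypothetical.
using System;
using System.Text.RegularExpressions;

// Static overload: same pattern and result as the instance-based example.
string cleanString = Regex.Replace("<b>Hello, <i>world</i></b>", "<[^>]+>", "");
Console.WriteLine(cleanString); // Hello, world

// Limitation: "< 2 and <b>" is treated as one "tag", because [^>] happily matches '<'.
string broken = Regex.Replace("1 < 2 and <b>3</b>", "<[^>]+>", "");
Console.WriteLine(broken); // 1 3
```

For real-world HTML, a proper HTML parser is the safer choice. The regex version is fine for controlled input like the examples in this article.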

Replacing with Captured Values

But let's say that you don't actually want to remove the tags. Instead, you want to transform them into something that a browser will not interpret, e.g. by replacing the angle brackets (<>) with square brackets ([]). This is where Regular Expressions really show their power, because it's actually very easy, as illustrated by this slightly rewritten version of our previous example:

string testString = "<b>Hello, <i>world</i></b>";
Regex regex = new Regex("<([^>]+)>");
string cleanString = regex.Replace(testString, "[$1]");
Console.WriteLine(cleanString);

I only changed two minor details. First, I added a set of parentheses to the regex to create a capture group, which captures the value between the angle brackets into the first capture group. Second, in the Replace() method I reference that group using the special notation $1, which simply means capture group number 1. With that in place, our output will now look like this:

[b]Hello, [i]world[/i][/b]
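Numbered groups become even more useful when a pattern has several of them, because the replacement string can reuse them in any order. The sketch below assumes .NET 6+ top-level statements and uses made-up sample strings. It reorders the parts of an ISO-style date. It also shows two other substitutions that .NET supports in replacement strings: $0 for the entire match and $$ for a literal dollar sign.

```csharp
// Sketch assuming .NET 6+ top-level statements; the sample strings are hypothetical.
using System;
using System.Text.RegularExpressions;

// Three numbered groups: $1 = year, $2 = month, $3 = day.
string isoDate = "Released 2023-04-15";
string reordered = Regex.Replace(isoDate, @"(\d{4})-(\d{2})-(\d{2})", "$3/$2/$1");
Console.WriteLine(reordered); // Released 15/04/2023

// "$$" produces a literal '$', and "$0" inserts the whole match ("40").
string withDollar = Regex.Replace("Cost: 40", @"\d+", "$$$0");
Console.WriteLine(withDollar); // Cost: $40
```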

Named Capture Groups

You can of course do the exact same thing when using named capture groups (discussed in the previous article), like this:

string testString = "<b>Hello, <i>world</i></b>";
Regex regex = new Regex("<(?<tagName>[^>]+)>");
string cleanString = regex.Replace(testString, "[${tagName}]");
Console.WriteLine(cleanString);

When using named capture groups, just use the ${name-of-capture-group} notation.
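Named groups really pay off when the replacement rearranges several pieces, because the replacement string becomes self-describing. Here is a small sketch under the same assumptions as before (.NET 6+ top-level statements, made-up sample names). It turns "First Last" into "Last, First". Named groups are also given numbers, so when a pattern contains only named groups, $1 and $2 refer to them in left-to-right order.

```csharp
// Sketch assuming .NET 6+ top-level statements; the sample names are hypothetical.
using System;
using System.Text.RegularExpressions;

string names = "Ada Lovelace; Alan Turing";
string pattern = @"(?<first>\w+) (?<last>\w+)";

// Reference the groups by name...
string lastFirst = Regex.Replace(names, pattern, "${last}, ${first}");
Console.WriteLine(lastFirst); // Lovelace, Ada; Turing, Alan

// ...or by the numbers they are assigned (first = 1, last = 2 here).
string sameResult = Regex.Replace(names, pattern, "$2, $1");
Console.WriteLine(sameResult); // Lovelace, Ada; Turing, Alan
```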

Using a MatchEvaluator method

But what if we want even more control over how the value is replaced? We can pass a MatchEvaluator for this. It's basically just a reference (delegate) to a method that is called once for each match, and the string that method returns is used as the replacement. Let's stick with the HTML tags example we have already used a couple of times, but this time we take control over which HTML tags end up in the result. Here's the complete example:

using System;
using System.Text.RegularExpressions;

namespace RegexSearchReplaceMethod
{
    class Program
    {
        static void Main(string[] args)
        {
            string testString = "<b>Hello, <i>world</i></b>";
            Regex regex = new Regex("<(?<tagName>[^>]+)>");
            // Pass the method itself; it is converted to a MatchEvaluator delegate
            string cleanString = regex.Replace(testString, ProcessHtmlTag);
            Console.WriteLine(cleanString);
        }

        // Called once per match; the returned string replaces the matched tag
        private static string ProcessHtmlTag(Match m)
        {
            string tagName = m.Groups["tagName"].Value;
            string endTagPrefix = "";
            // Closing tags (e.g. "/b") keep their slash, but are mapped by name
            if (tagName.StartsWith("/"))
            {
                endTagPrefix = "/";
                tagName = tagName.Substring(1);
            }
            // Swap presentational tags for their semantic equivalents
            switch (tagName)
            {
                case "b":
                    tagName = "strong";
                    break;
                case "i":
                    tagName = "em";
                    break;
            }
            return "<" + endTagPrefix + tagName.ToLower() + ">";
        }
    }
}

The first part of the example looks exactly as it did before, but instead of supplying a replacement string, we pass a reference to our ProcessHtmlTag() method. As mentioned, this method is called each time a replacement is about to be made, with the Match in question as its parameter. This means that our MatchEvaluator method has all the information about the match, so it can act accordingly. In this case, we use this opportunity to make the tags more semantic by replacing the bold (b) tag with a strong tag and the italic (i) tag with an emphasis (em) tag. Whether or not the tag is changed, we convert its name to lowercase. The output looks like this:

<strong>Hello, <em>world</em></strong>

Using a MatchEvaluator is very powerful, and this is just a simple example of what can be accomplished.
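For short transformations, you don't need a separately named method. A lambda expression converts implicitly to the MatchEvaluator delegate, so it can be passed straight to Regex.Replace(). The sketch below assumes .NET 6+ top-level statements and uses made-up inputs. The first call computes each replacement from the matched value, which a plain replacement string cannot do. The second call stores the evaluator in an explicitly typed MatchEvaluator variable.

```csharp
// Sketch assuming .NET 6+ top-level statements; the inputs are hypothetical.
using System;
using System.Text.RegularExpressions;

// The replacement is computed from the match: every number is doubled.
string doubled = Regex.Replace("3 apples and 12 pears", @"\d+",
    m => (int.Parse(m.Value) * 2).ToString());
Console.WriteLine(doubled); // 6 apples and 24 pears

// The same idea with an explicitly typed MatchEvaluator variable:
// upper-case every word that starts with "re".
MatchEvaluator upper = m => m.Value.ToUpperInvariant();
string shouted = Regex.Replace("hello regex world", @"\bre\w*", upper);
Console.WriteLine(shouted); // hello REGEX world
```

A named method such as ProcessHtmlTag() is still the better choice once the logic grows beyond a line or two.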

Summary

Search/replace operations become very powerful when you use Regular Expressions, and even more so when you use a MatchEvaluator, where the possibilities for manipulating strings are almost endless.

