Regular Expressions and LINQ

Regular expressions are a great, albeit sometimes overused, tool for parsing and extracting text data. Sometimes I find myself looking at a list of strings that I want to run a regular expression against. I could write a for loop or bury a Regex match in a lambda, but both interrupt the flow of a LINQ query and require a fair bit of boilerplate.
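For comparison, the lambda approach might look roughly like the sketch below. The sample data and the id= pattern are made up for illustration, and the snippet assumes C# 9 or later so it can use top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up sample data: two lines carry an id, one does not.
var lines = new[] { "id=17", "n/a", "id=42" };
var regex = new Regex(@"^id=(\d+)$");

// Plain LINQ: project each string to a Match, filter on Success,
// and only then get to the part that actually matters.
var ids = lines.Select(line => regex.Match(line))
               .Where(m => m.Success)
               .Select(m => int.Parse(m.Groups[1].Value))
               .ToList();

Console.WriteLine(string.Join(", ", ids)); // 17, 42
```

The Select/Where pair in the middle is the part that gets repeated every time.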

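I instead decided to write a few simple extension methods to make it easier on myself. Note that these need the System.Collections.Generic and System.Text.RegularExpressions namespaces and, like all extension methods, must be declared in a static class.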

/// <summary>
/// Runs a regular expression against strings and returns matches as they are found.
/// </summary>
/// <param name="items">An enumerable set of strings.</param>
/// <param name="pattern">A regex pattern.</param>
public static IEnumerable<Match> Matches(this IEnumerable<string> items, string pattern)
{
    return Matches(items, new Regex(pattern));
}

/// <summary>
/// Runs a regular expression against strings and returns matches as they are found.
/// </summary>
/// <param name="items">An enumerable set of strings.</param>
/// <param name="pattern">A regex pattern.</param>
/// <param name="options">The regex options to use.</param>
public static IEnumerable<Match> Matches(this IEnumerable<string> items, string pattern, RegexOptions options)
{
    return Matches(items, new Regex(pattern, options));
}

/// <summary>
/// Runs a regular expression against strings and returns matches as they are found.
/// </summary>
/// <param name="items">An enumerable set of strings.</param>
/// <param name="regex">Regex to run.</param>
public static IEnumerable<Match> Matches(this IEnumerable<string> items, Regex regex)
{
    foreach (var item in items)
    {
        var m = regex.Match(item);
        if (m.Success)
            yield return m;
    }
}

Here’s a simple usage example that parses a list of measurements encoded as text with different unit suffixes. Anything it finds in millimeters is converted to inches, and strings that don't match the pattern are skipped.

var strings = new[] { "3.453\"", "4.2343 inches", "2.34 mm", "19.3 in", "13 millimeters", "hello world" };

var pattern = @"^[ ]*(?<number>\d+(?:\.\d+)?)[ ]*(?:(?<inch>""|inches|in)|(?<mm>mm|millimeters))[ ]*$";

var inches = strings.Matches(pattern)
                    .Select(m => Convert.ToDecimal(m.Groups["number"].Value) * (m.Groups["mm"].Success ? 0.0393701m : 1m));

foreach (var item in inches)
{
    Console.WriteLine("{0:0.000#}\"", item);
}

This produces the following output:

3.453"
4.2343"
0.0921"
19.300"
0.5118"

You could also extend this for other purposes such as excluding certain lines from a text report, flattening match collections across multiple matches, etc. Feel free to share your own creations in the comments.
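As a rough sketch of those two ideas, something like the following could work. The names AllMatches, NotMatching and RegexSequenceSketches are hypothetical (they are not part of the extensions above), the sample data is made up, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up sample data for illustration.
var digits = new Regex(@"\d+");
var lines = new[] { "a1b22", "xyz", "c3" };

var numbers = lines.AllMatches(digits).Select(m => m.Value).ToList();
var withoutDigits = lines.NotMatching(digits).ToList();

Console.WriteLine(string.Join(", ", numbers));       // 1, 22, 3
Console.WriteLine(string.Join(", ", withoutDigits)); // xyz

public static class RegexSequenceSketches
{
    // Hypothetical helper: yields every match in every string,
    // not just the first match per string.
    public static IEnumerable<Match> AllMatches(this IEnumerable<string> items, Regex regex)
    {
        return items.SelectMany(item => regex.Matches(item).Cast<Match>());
    }

    // Hypothetical helper: yields only the strings the regex does not match,
    // e.g. to drop noise lines from a text report.
    public static IEnumerable<string> NotMatching(this IEnumerable<string> items, Regex regex)
    {
        return items.Where(item => !regex.IsMatch(item));
    }
}
```

Both helpers stay lazy, like the Matches extensions above, so nothing is evaluated until the sequence is enumerated.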
