Regular Expressions and LINQ

Regular expressions are a great, albeit sometimes overused, tool for parsing and extracting text data. Sometimes I find myself looking at a list of strings that I want to run a regular expression against. I could write a for loop or put a Regex match in a lambda, but both interrupt the flow of a LINQ query and require a fair bit of boilerplate.
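For comparison, here is a rough sketch of the inline lambda approach. The sample data, the pattern, and the variable names are made up purely for illustration, and the snippet assumes top-level statements (C# 9 or later).

[csharp]
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample data and pattern, used only for illustration.
var lines = new[] { "id=42", "no match here", "id=7" };
var regex = new Regex(@"^id=(\d+)$");

// Inline version: match every line, then filter out the failures by hand.
var ids = lines
    .Select(line => regex.Match(line))
    .Where(m => m.Success)
    .Select(m => int.Parse(m.Groups[1].Value))
    .ToList();

Console.WriteLine(string.Join(", ", ids)); // prints "42, 7"
[/csharp]

The match-then-filter pair has to be repeated in every query that needs it, which is the boilerplate in question.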

Instead, I wrote a few simple extension methods to make this easier. Note that they rely on the [generic]System.Text.RegularExpressions[/generic] namespace.

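[csharp]
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class RegexExtensions
{
    /// <summary>
    /// Runs a regular expression against strings and returns matches as they are found.
    /// </summary>
    /// <param name="items">An enumerable set of strings.</param>
    /// <param name="pattern">A regex pattern.</param>
    public static IEnumerable<Match> Matches(this IEnumerable<string> items, string pattern)
    {
        return Matches(items, new Regex(pattern));
    }

    /// <summary>
    /// Runs a regular expression against strings and returns matches as they are found.
    /// </summary>
    /// <param name="items">An enumerable set of strings.</param>
    /// <param name="pattern">A regex pattern.</param>
    /// <param name="options">The regex options to use.</param>
    public static IEnumerable<Match> Matches(this IEnumerable<string> items, string pattern, RegexOptions options)
    {
        return Matches(items, new Regex(pattern, options));
    }

    /// <summary>
    /// Runs a regular expression against strings and returns matches as they are found.
    /// </summary>
    /// <param name="items">An enumerable set of strings.</param>
    /// <param name="regex">Regex to run.</param>
    public static IEnumerable<Match> Matches(this IEnumerable<string> items, Regex regex)
    {
        foreach (var item in items)
        {
            // Only the first match per string is considered; non-matching strings are skipped.
            var m = regex.Match(item);
            if (m.Success)
                yield return m;
        }
    }
}
[/csharp]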

Here’s a simple usage example that parses a list of numbers encoded as text with different unit suffixes. Anything given in millimeters is converted to inches.

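[csharp]
using System;
using System.Globalization;
using System.Linq;

var strings = new[] { "3.453\"", "4.2343 inches", "2.34 mm", "19.3 in", "13 millimeters", "hello world" };

// "number" and "mm" are read below; the name of the inches group is only descriptive.
var pattern = @"^[ ]*(?<number>\d+(?:\.\d+)?)[ ]*(?:(?<inches>""|inches|in)|(?<mm>mm|millimeters))[ ]*$";

// Parse with the invariant culture so "." is always treated as the decimal separator.
var inches = strings.Matches(pattern)
    .Select(m => Convert.ToDecimal(m.Groups["number"].Value, CultureInfo.InvariantCulture)
        * (m.Groups["mm"].Success ? 0.0393701m : 1m));

foreach (var item in inches)
{
    Console.WriteLine("{0:0.000#}\"", item);
}
[/csharp]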

This produces the following output:
[raw linenumbers="false"]
3.453"
4.2343"
0.0921"
19.300"
0.5118"
[/raw]

You could also extend this for other purposes such as excluding certain lines from a text report, flattening match collections across multiple matches, etc. Feel free to share your own creations in the comments.
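As a rough sketch of those two ideas, the extensions below flatten every match from every string and filter out lines that match a pattern. The names [generic]AllMatches[/generic], [generic]NotMatching[/generic], and [generic]RegexSequenceExtensions[/generic], along with the sample report lines, are hypothetical and chosen only for this illustration.

[csharp]
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSequenceExtensions
{
    // Hypothetical helper: yields every match in every string, not just the first one.
    public static IEnumerable<Match> AllMatches(this IEnumerable<string> items, Regex regex)
    {
        // Cast<Match>() keeps this compiling on frameworks where MatchCollection is non-generic.
        return items.SelectMany(item => regex.Matches(item).Cast<Match>());
    }

    // Hypothetical helper: keeps only the strings the regex does not match.
    public static IEnumerable<string> NotMatching(this IEnumerable<string> items, Regex regex)
    {
        return items.Where(item => !regex.IsMatch(item));
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up report lines: comment lines start with '#'.
        var report = new[] { "# header", "a=1, b=22", "# note", "c=333" };

        var numbers = report
            .NotMatching(new Regex(@"^#"))   // drop the comment lines
            .AllMatches(new Regex(@"\d+"))   // collect every number on the remaining lines
            .Select(m => m.Value);

        Console.WriteLine(string.Join(" ", numbers)); // prints "1 22 333"
    }
}
[/csharp]

Both helpers are lazy, like the original extensions, so they can be chained without building intermediate lists.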

