Thursday, May 21, 2009

String.Replace vs Regex.Replace

I recently ran into a situation where I had to do some string manipulation: replacing bad text with valid text in order to create valid XML element names. We had to strip white space and some other invalid characters and replace them with characters that are valid in an XML element name. To do this we ended up using the static Regex.Replace method, simply because the rules were complex enough that they would have required more than one String.Replace call, while a single regular expression could handle them all. This caused some discussion about how heavy the Regex class is and when to use it. I decided to do some testing, and these are my results.
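To make the trade-off concrete, here is a minimal sketch of the two approaches. The character rule is an assumption made up for illustration (anything outside letters, digits, underscore, period, and hyphen counts as invalid); the actual rules we used are not shown in this post, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class ElementNameSketch
{
    // Assumed rule, for illustration only: any run of characters outside
    // letters, digits, '_', '.', and '-' is treated as invalid.
    private static readonly Regex Invalid = new Regex(@"[^A-Za-z0-9_.\-]+");

    // One regular expression covers every invalid character at once and
    // collapses each run of them into a single underscore.
    public static string WithRegex(string raw)
    {
        return Invalid.Replace(raw, "_");
    }

    // The String.Replace version needs one call per character to handle,
    // and replaces each occurrence individually.
    public static string WithStringReplace(string raw)
    {
        return raw.Replace(" ", "_").Replace("/", "_").Replace("#", "_");
    }

    static void Main()
    {
        Console.WriteLine(WithRegex("order id / #"));         // order_id_
        Console.WriteLine(WithStringReplace("order id / #")); // order_id____
    }
}
```

Note that the two versions are not equivalent: the regex collapses a run of invalid characters into one underscore, while the chained calls replace them one by one. Neither handles names that start with a digit, which XML also disallows.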

First the code

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main( string[] args )
    {
        // this is our test value (it contains 4 spaces)
        string testValue = "This is a test string";

        int count;

        // run the regex replace
        DateTime regexStart = DateTime.Now;

        for( count = 0; count < 10000000; count++ )
        {
            string newValue = Regex.Replace( testValue, " ", "_" );
        }

        DateTime regexStop = DateTime.Now;
        TimeSpan regexTime = regexStop - regexStart;

        // run the string replace
        DateTime stringStart = DateTime.Now;

        for( count = 0; count < 10000000; count++ )
        {
            string newValue = testValue.Replace( " ", "_" );
        }

        DateTime stringStop = DateTime.Now;
        TimeSpan stringTime = stringStop - stringStart;

        // output the results
        Console.WriteLine( "Total iterations - " + count );

        Console.WriteLine( Environment.NewLine );

        Console.WriteLine( "regex total milliseconds for replace on 4 spaces to _ was : " + regexTime.TotalMilliseconds );

        Console.WriteLine( Environment.NewLine );

        Console.WriteLine( "string total milliseconds for replace on 4 spaces to _ was : " + stringTime.TotalMilliseconds );

        // wait for keystroke to end
        Console.ReadKey();
    }
}


Now the results for 1 million items



Now the results for 10 million items



The results blew me away. String.Replace trounced Regex.Replace. Keep in mind that this test only replaced a single character, turning 4 spaces into 4 underscores. For simple transformations like this I would always use String.Replace. If you need something more complex, Regex.Replace is probably still the better choice. As always, pick the right tool for the job, but this showed some interesting results.
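A variation worth timing alongside these two is a Regex instance that is constructed once with RegexOptions.Compiled and then reused. The sketch below is one way to set that up, using Stopwatch rather than DateTime.Now for the timing. The ReplaceTiming class and its Time helper are hypothetical names introduced here, and no numbers are claimed for it; each variant also pays for one delegate call per iteration, which is the same for all three.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ReplaceTiming
{
    // Hypothetical helper: runs `replace` on `input` `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long Time( Func<string, string> replace, string input, int iterations )
    {
        Stopwatch watch = Stopwatch.StartNew();
        for( int i = 0; i < iterations; i++ )
        {
            replace( input );
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        const string testValue = "This is a test string";
        const int iterations = 1000000;

        // Constructed once, outside the timed loop, and reused for every call.
        Regex compiled = new Regex( " ", RegexOptions.Compiled );

        long staticRegex = Time( s => Regex.Replace( s, " ", "_" ), testValue, iterations );
        long compiledRegex = Time( s => compiled.Replace( s, "_" ), testValue, iterations );
        long stringReplace = Time( s => s.Replace( " ", "_" ), testValue, iterations );

        Console.WriteLine( "static Regex.Replace ms   : " + staticRegex );
        Console.WriteLine( "compiled Regex.Replace ms : " + compiledRegex );
        Console.WriteLine( "String.Replace ms         : " + stringReplace );
    }
}
```

Building the compiled Regex outside the loop matters: constructing it inside would include the one-time compilation cost in every iteration.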

2 comments:

MeBerserk said...

You should pre-compile the regex:
new Regex(pattern, RegexOptions.Compiled);

When you use Regex.Replace the code is generated on each call which is why it takes so long.

Justin said...

Hi,

I just finally saw this the other day and you have a good point. I tried it with the .Compiled option and still didn't see a great speed boost using that option. I will update the code and pictures ASAP with the new results.