I decided today to resume development of my BayesSpam plugin for SquirrelMail. My first priority was speeding up the parsing of messages.
The first change was a simple one. I had been tokenizing with a regex loop like this:
    while (preg_match('/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/', $string, $matches)) {
        $string = preg_replace('/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/', ' ', $string, 1);
        if (isset($matches[1]) && $matches[1] && strlen($matches[1]) >= 3)
            $return[] = $token_type.': '.$matches[1];
    }
I replaced it with this:
    $token = strtok($string, " \r\n\t,.\"()?!:;/&");
    while ($token !== false) {
        if (strlen($token) >= 3 && strlen($token) < 45) {
            $return[] = $token_type.': '.$token;
        }
        $token = strtok(" \r\n\t,.\"()?!:;/&");
    }
Benchmarks show the strtok version taking about 50% as long as the original, so parsing is roughly twice as fast.
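
For anyone who wants to try the comparison themselves, here is a rough sketch of how the two loops could be timed side by side. The function names (tokenize_regex, tokenize_strtok, time_tokenizer), the sample text, and the run count are all made up for illustration and are not part of the plugin; the numbers it prints will depend on the PHP version and the input.

```php
<?php
// Hypothetical wrappers around the two loops above; every name here is an
// assumption made for this sketch, not code from the plugin itself.

const TOKEN_PATTERN = '/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/';
const TOKEN_DELIMITERS = " \r\n\t,.\"()?!:;/&";

// Original approach: match one word, blank it out of the string, repeat.
function tokenize_regex(string $string, string $token_type): array
{
    $return = [];
    while (preg_match(TOKEN_PATTERN, $string, $matches)) {
        $string = preg_replace(TOKEN_PATTERN, ' ', $string, 1);
        if (strlen($matches[1]) >= 3) {
            $return[] = $token_type . ': ' . $matches[1];
        }
    }
    return $return;
}

// Replacement approach: let strtok() split on the delimiter characters.
function tokenize_strtok(string $string, string $token_type): array
{
    $return = [];
    $token = strtok($string, TOKEN_DELIMITERS);
    while ($token !== false) {
        if (strlen($token) >= 3 && strlen($token) < 45) {
            $return[] = $token_type . ': ' . $token;
        }
        $token = strtok(TOKEN_DELIMITERS);
    }
    return $return;
}

// Run a tokenizer $runs times over the same text and return elapsed seconds.
function time_tokenizer(callable $tokenizer, string $text, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $tokenizer($text, 'body');
    }
    return microtime(true) - $start;
}

// Made-up sample message body, repeated to give the loops some work.
$sample = str_repeat("Buy cheap watches now! Visit our site, today: it's free.\r\n", 50);

printf("regex:  %.4f s\n", time_tokenizer('tokenize_regex', $sample, 20));
printf("strtok: %.4f s\n", time_tokenizer('tokenize_strtok', $sample, 20));
```

One thing to keep in mind when comparing results: the two loops are not strictly equivalent. The regex only picks out runs that start with a letter, while strtok keeps anything between delimiters, so a string like "123abc" comes back whole from the strtok version but as "abc" from the regex version.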








