Tokenizing a string in PHP
Posted by Utoxin on October 21, 2008
I decided today to resume development of my BayesSpam plugin for SquirrelMail. My first priority was speeding up message parsing, and the first step was a simple one. I was doing this:

    while (preg_match('/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/', $string, $matches)) {
        $string = preg_replace('/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/', ' ', $string, 1);
        if (isset($matches[1]) && $matches[1] && strlen($matches[1]) >= 3)
            $return[] = $token_type.': '.$matches[1];
    }

[...]
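The loop above rescans the string from the beginning on every iteration and rewrites it with preg_replace each time, so the cost grows quickly with message length. The rest of the post is cut off here, so the following is not necessarily the fix the author settled on. It is a minimal sketch of one common alternative: collecting every match in a single pass with preg_match_all. The function name tokenize_words and its parameters are hypothetical, and the regex assumes the escaped form of the pattern shown above. Because the original loop rescans the modified string, the two approaches can disagree on edge cases such as words longer than 45 characters.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the plugin).
// Collects word tokens in one pass instead of looping preg_match + preg_replace.
function tokenize_words(string $string, string $token_type): array
{
    // Same pattern as the loop, with a non-capturing group for the
    // trailing whitespace/end-of-string check since it is never used.
    $pattern = '/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}(?:[ \t\n\r]|$)/';

    $return = [];
    if (preg_match_all($pattern, $string, $matches)) {
        // $matches[1] holds the first capture group of every match.
        foreach ($matches[1] as $word) {
            // Keep the original rule: ignore words shorter than 3 characters.
            if (strlen($word) >= 3) {
                $return[] = $token_type . ': ' . $word;
            }
        }
    }
    return $return;
}
```

Calling tokenize_words("Hello, world! This is a test.", "body") yields the tokens "body: Hello", "body: world", "body: This" and "body: test"; "is" and "a" are dropped by the length check.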