Archive for October, 2008

Tokenizing a string in php

Tuesday, October 21st, 2008

I decided today to resume development of my BayesSpam plugin for SquirrelMail. My first priority was speeding up the parsing of messages, and the first step was a simple one. I was doing this:

    while (preg_match('/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/', $string, $matches)) {
        $string = preg_replace('/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/', ' ', $string, 1);
        if (isset($matches[1]) && $matches[1] ...
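The loop runs the pattern twice per token and rebuilds the string with preg_replace() each time, so the work grows roughly with the square of the message length. A common alternative is to collect every token with a single preg_match_all() call. The sketch below assumes PHP 7 or later and the same word pattern with its backslash escapes written out. TOKEN_PATTERN and tokenize() are hypothetical names used for illustration; this is not necessarily the change the plugin actually made.

```php
<?php
// Sketch only: a single-pass tokenizer, assuming PHP 7+.
// TOKEN_PATTERN and tokenize() are hypothetical names, not part of BayesSpam.

// Same word pattern as the loop: a letter, up to 44 more letters/hyphens/
// underscores/apostrophes (captured as group 1), up to 5 trailing punctuation
// characters, then whitespace or the end of the string.
const TOKEN_PATTERN = '/([a-zA-Z][a-zA-Z\-_\']{0,44})[,."\')?!:;\/&]{0,5}([ \t\n\r]|$)/';

function tokenize(string $string): array
{
    // One scan over the input; with the default PREG_PATTERN_ORDER,
    // $matches[1] holds every capture of group 1 in order.
    if (preg_match_all(TOKEN_PATTERN, $string, $matches) === false) {
        return []; // the regex engine reported an error
    }
    return $matches[1];
}

echo implode(' ', tokenize("Hello, world! Buy cheap pills now.")), "\n";
// prints: Hello world Buy cheap pills now
```

One caveat: because the loop re-runs the pattern on the modified string, it can match tokens that a single pass does not. For "x.y z", the loop's preg_match() calls match y, then x (the "x." is only followed by a space once "y " has been replaced), then z, while tokenize() returns only y and z.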