02-18-2007, 06:32 AM | #1 |
Green Mole
Join Date: Feb 2007
Posts: 1
antiword tweaking code
I am wrestling with antiword. In short, MS Word documents are uploaded to the site and diverted to a temp dir, where antiword parses them and the characters are counted; the script then divides the character count by 5 and outputs a "word" count.
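The last step of that pipeline can be sketched roughly as follows. This is only an illustration: the function name `estimate_words` and the rounding are assumptions, and only the divisor of 5 comes from the description above.

```php
<?php
// Hypothetical helper: estimate a "word" count as characters / 5.
// The divisor 5 is from the post; the name and the rounding are assumptions.
function estimate_words(string $text, int $charsPerWord = 5): int
{
    return (int) round(strlen($text) / $charsPerWord);
}

// In the real pipeline $text would be antiword's output for one upload, e.g.
// $text = shell_exec('antiword ' . escapeshellarg($path));
echo estimate_words('The quick brown fox jumps over the lazy dog.'), "\n";
```

Because the result is the character count divided by 5, any percentage error in the character count carries straight through to the "word" count.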
Less than 1 percent variance is desired, compared with the character count Word reports under its Tools menu. I have code in place to remove any whitespace beyond two spaces after end-of-sentence punctuation, and to strip tabs and returns:

}
// drop the [pic] placeholders
$content = str_replace('[pic]', '', $content);
// strip returns and tabs
$content = preg_replace('/[\r\n\t]/', '', $content);
// remove runs of spaces that follow any character other than . ! ? " '
$content = preg_replace('/([^\.\!\?"\'])[ ]+/', '$1', $content);
// remove a period followed by three or more spaces (the period goes too)
$content = preg_replace('/\.[ ]{3,}/', '', $content);
echo 'Total character count for '. $file.': '. strlen($content).'<br/>';
$total_chars += strlen($content);

But I get anything from near perfect to 5% under or over. Does anyone have ideas on how to tweak this antiword code into something more reliable?

TIA,
Sarah
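For anyone wanting to reproduce the variance, here is a self-contained sketch of the cleanup steps above, wrapped in a hypothetical function `count_cleaned` that reports both the byte count (`strlen`) and the character count (`mb_strlen`). It assumes the text is UTF-8 and that the mbstring extension is available; neither is stated in the post. If antiword is emitting multibyte characters, the two numbers will differ, which may be one source of drift worth ruling out.

```php
<?php
// Sketch only: the post's cleanup steps in a hypothetical function that
// returns byte and character counts side by side for comparison.
function count_cleaned(string $content): array
{
    $content = str_replace('[pic]', '', $content);                  // drop [pic] placeholders
    $content = preg_replace('/[\r\n\t]/', '', $content);            // strip returns and tabs
    $content = preg_replace('/([^.!?"\'])[ ]+/', '$1', $content);   // spaces not preceded by . ! ? " '
    $content = preg_replace('/\.[ ]{3,}/', '', $content);           // period followed by 3+ spaces
    return [
        'bytes' => strlen($content),             // what the post's code reports
        'chars' => mb_strlen($content, 'UTF-8'), // assumption: UTF-8 text, mbstring loaded
    ];
}

// "Café" contains one two-byte character in UTF-8, so the counts differ by one.
$counts = count_cleaned("Caf\u{00E9} au lait.\tDone.\n");
echo $counts['bytes'], ' bytes, ', $counts['chars'], " chars\n";
```

Comparing the two figures per file against Word's own count should show whether the over/under cases line up with documents containing accented or typographic characters.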