02-18-2007, 07:32 AM | #1 |
Green Mole
Join Date: Feb 2007
Posts: 1
antiword tweaking code
I am wrestling with antiword. In short: MS Word documents are uploaded to the site and diverted to a temp directory, where antiword parses them and the script counts the characters. The script then divides the character count by 5 and outputs a "word" count.
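A minimal sketch of that last step, assuming PHP 7 or later. The function names and the sample string are made up for illustration and are not part of the site's script. It also shows that strlen() counts bytes while mb_strlen() counts characters, which matters only if the extracted text is UTF-8 (an assumption here).

```php
<?php
// Hypothetical helper: turn a character count into a "word" count
// using the divide-by-5 convention described above.
function estimate_words(int $charCount, int $charsPerWord = 5): float
{
    return $charCount / $charsPerWord;
}

// Hypothetical helper: count characters rather than bytes.
// strlen() counts bytes, so each multi-byte character adds more than 1;
// mb_strlen() needs the mbstring extension, and UTF-8 input is an assumption.
function count_chars_utf8(string $text): int
{
    return mb_strlen($text, 'UTF-8');
}

$sample = "caf\u{00E9} au lait";       // made-up sample text, not real data
echo strlen($sample), "\n";            // 13 bytes
echo count_chars_utf8($sample), "\n";  // 12 characters
echo estimate_words(count_chars_utf8($sample)), "\n";
```

If the text is plain single-byte ASCII, the two counts agree and the distinction makes no difference.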
Less than 1 percent variance is desired, compared to what Word reports when its Tools menu is used to count characters. I have code in place to remove any whitespace beyond two spaces after end-of-sentence punctuation, and to include tabs and returns.

    }
    // Drop antiword's picture placeholders.
    $content = str_replace('[pic]', '', $content);
    // Strip carriage returns, newlines and tabs.
    $content = preg_replace('/[\r\n\t]/', '', $content);
    // Remove runs of spaces that follow anything other than . ! ? " or '.
    $content = preg_replace('/([^\.\!\?"\'])[ ]+/', '$1', $content);
    // Remove a period followed by three or more spaces (the period goes too).
    $content = preg_replace('/\.[ ]{3,}/', '', $content);
    echo 'Total character count for '. $file.': '. strlen($content).'<br/>';
    $total_chars += strlen($content);

But I get anything from near perfect to 5% under or over. Does anyone have ideas on how to tweak this antiword code into something more reliable?

TIA,
Sarah