PhpDig.net

02-18-2007, 07:32 AM   #1
MTSC
antiword tweaking code

I am wrestling with antiword. In short: MS Word documents are uploaded to the site and diverted to a temp dir, where antiword parses them and the characters are counted. The script then divides the character count by 5 and outputs a "word" count.
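For reference, the divide-by-5 step could be sketched as below. The function name `estimate_word_count` and the choice to round to the nearest whole number are assumptions for illustration, not what the live script necessarily does.

```php
<?php
// Hypothetical helper: name and rounding rule are assumptions.
function estimate_word_count(string $text): int
{
    // strlen() counts bytes; multi-byte text would need mb_strlen() instead.
    return (int) round(strlen($text) / 5);
}

echo estimate_word_count(str_repeat('a', 1234)), "\n"; // 1234 / 5 = 246.8, prints 247
```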

The goal is less than 1 percent variance compared with the character count Word itself reports through its Tools menu.
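To make the target concrete, the variance can be measured as a signed percentage of Word's figure. This is only a sketch; `percent_variance` and its parameter names are made up for the example.

```php
<?php
// Hypothetical helper: signed difference between the script's count and
// Word's count, expressed as a percentage of Word's count.
function percent_variance(int $scriptCount, int $wordCount): float
{
    if ($wordCount === 0) {
        throw new InvalidArgumentException('Word count must be non-zero.');
    }
    return ($scriptCount - $wordCount) / $wordCount * 100.0;
}

printf("%.1f%%\n", percent_variance(1050, 1000)); // prints 5.0%
```

A positive result means the script counts more characters than Word does, a negative result fewer.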

I have code in place to remove any whitespace beyond two spaces after end-of-sentence punctuation; the cleanup also covers tabs and returns.

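} // end of a preceding block (not shown)
// Drop antiword's image placeholders.
$content = str_replace('[pic]', '', $content);
// Strip carriage returns, newlines and tabs.
$content = preg_replace('/[\r\n\t]/', '', $content);
// Strip runs of spaces that follow any character other than . ! ? " or '
$content = preg_replace('/([^\.\!\?"\'])[ ]+/', '$1', $content);
// Strip a period together with its trailing spaces when three or more spaces follow it.
$content = preg_replace('/\.[ ]{3,}/', '', $content);
// Note: strlen() counts bytes, not characters.
echo 'Total character count for '. $file.': '. strlen($content).'<br/>';
$total_chars += strlen($content);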

But the results range from near perfect to 5% under or over.
Does anyone have ideas on how to tweak this antiword code into something more reliable?
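For comparison, here is a minimal sketch of one literal reading of the whitespace rule described above: cap the spaces after end-of-sentence punctuation at two, keep the punctuation mark itself, and leave the spaces between words alone. The function name `normalize_for_count` is hypothetical. Whether inter-word spaces belong in the total depends on which of Word's figures is the target, since Word reports characters both with and without spaces.

```php
<?php
// Hypothetical helper: one possible reading of the cleanup rule, not a drop-in fix.
function normalize_for_count(string $content): string
{
    // Drop antiword's image placeholders.
    $content = str_replace('[pic]', '', $content);
    // Strip carriage returns, newlines and tabs.
    $content = preg_replace('/[\r\n\t]/', '', $content);
    // Cap the spaces after . ! or ? at two; the punctuation mark is kept.
    $content = preg_replace('/([.!?]) {3,}/', '$1  ', $content);
    return $content;
}

$clean = normalize_for_count("End.     Next[pic]\tline\r\n");
echo strlen($clean), "\n"; // "End.  Nextline" is 14 bytes, prints 14
```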

TIA,
Sarah