|
12-07-2003, 02:35 AM | #1 |
Green Mole
Join Date: Dec 2003
Posts: 9
|
Indexing MS Word docs under Windows
I've had no success trying to index MS Word (.DOC) documents under Windows. I have:
Code:
define('PHPDIG_INDEX_MSWORD', true);
define('PHPDIG_PARSE_MSWORD', 'C:\\Program Files\\EasyPHP1-7\\www\\k3\\catdoc');
define('PHPDIG_OPTION_MSWORD', '-s 8859-1');

Any help appreciated.
Phil |
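Before you spider, it is worth checking from PHP that the binary named in PHPDIG_PARSE_MSWORD actually exists. The helper below is a hypothetical diagnostic, not part of PhpDig. It assumes that on Windows the file on disk is catdoc.exe even when the configured value leaves off the extension.

```php
<?php
// Hypothetical pre-flight check (not part of PhpDig): does the external
// parser named in the config exist on disk?
function msword_parser_found($path)
{
    // On Windows the configured "catdoc" is usually catdoc.exe on disk,
    // so accept either form.
    return is_file($path) || is_file($path . '.exe');
}

// Usage sketch: pass the same value you gave PHPDIG_PARSE_MSWORD.
$parser = 'C:\\Program Files\\EasyPHP1-7\\www\\k3\\catdoc';
echo msword_parser_found($parser) ? "parser found\n" : "parser NOT found: $parser\n";
```

A "found" result only means the file is there. It says nothing about whether the web server can run it.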
12-07-2003, 11:06 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Is USE_IS_EXECUTABLE_COMMAND set to true (one) or false (zero) in the config file?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
12-08-2003, 06:05 AM | #3 |
Green Mole
Join Date: Dec 2003
Posts: 9
|
USE_IS_EXECUTABLE_COMMAND is set to the default value of 1. But things have got worse...
I decided to ditch 1.6.2 and try 1.6.5, so I removed all the code and the DB tables and reinstalled 1.6.5. The install seemed to go OK, but now I can't get past this point:
Code:
Spidering in progress...
--------------------------------------------------------------------------------
SITE : http://[whatever]
Exclude paths :
- @NONE@

Fatal error: Call to undefined function: is_executable() in c:\program files\easyphp1-7\www\k3\phpdig\admin\robot_functions.php on line 633

Phil |
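This fatal error comes from PHP itself. Under PHP 4, is_executable() does not exist on Windows (the PHP manual lists Windows support as added in 5.0.0), so any unconditional call to it stops the script. If you are patching the spider yourself, a guard like the one below avoids the crash. The function name is hypothetical, and falling back to file_exists() is an assumption about what counts as good enough. It is not PhpDig's own logic.

```php
<?php
// Hypothetical wrapper: use is_executable() where PHP provides it, and
// fall back to a plain existence check where it does not (e.g. PHP 4 on
// Windows, where calling it is a fatal "undefined function" error).
function binary_looks_runnable($path)
{
    if (function_exists('is_executable')) {
        return is_executable($path);
    }
    // Weaker test: the file exists, but we cannot tell if it is runnable.
    return file_exists($path);
}
```

The fallback can give a false positive, for example a non-executable file at that path. It never crashes, though.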
12-08-2003, 07:59 AM | #4 |
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
|
|
12-08-2003, 08:58 AM | #5 |
Green Mole
Join Date: Dec 2003
Posts: 9
|
That seemed to work - thanks!
|
12-08-2003, 10:14 AM | #6 |
Green Mole
Join Date: Dec 2003
Posts: 9
|
We-ell, we're improving... Now it spiders OK without giving errors, but it still isn't indexing the contents of the .doc files. I tried pointing the spider directly at the URL of a .doc file I knew existed:
Code:
Spidering in progress...
--------------------------------------------------------------------------------
SITE : http://[mysite IP]/
Exclude paths :
- @NONE@
1:http://[mysite IP]/k3/CVs/4.doc
(time : 00:00:03)
No link in temporary table
--------------------------------------------------------------------------------
links found : 1
http://[mysite IP]/k3/CVs/4.doc
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

Any more help welcome. |
12-08-2003, 02:07 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. What does the following produce from the command line?
"C:\Program Files\EasyPHP1-7\www\k3\catdoc" -s 8859-1 change-me-4.doc
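At a Windows prompt, a path containing a space, such as "Program Files", has to be wrapped in double quotes. Otherwise the shell splits it into two words. When a PHP script builds the same command, escapeshellarg() quotes each piece so that a path with spaces reaches the shell as one argument. The helper below is a hypothetical sketch, and PhpDig may build its command line differently.

```php
<?php
// Hypothetical helper: build a catdoc command line with every piece quoted,
// so a binary path containing spaces (e.g. "C:\Program Files\...") is
// passed to the shell as a single word.
function build_catdoc_command($binary, $options, $docFile)
{
    $parts = array(escapeshellarg($binary));
    // Split the option string (e.g. "-s 8859-1") into separate arguments.
    foreach (preg_split('/\s+/', trim($options), -1, PREG_SPLIT_NO_EMPTY) as $opt) {
        $parts[] = escapeshellarg($opt);
    }
    $parts[] = escapeshellarg($docFile);
    return implode(' ', $parts);
}

echo build_catdoc_command('C:\\Program Files\\EasyPHP1-7\\www\\k3\\catdoc', '-s 8859-1', '4.doc'), "\n";
```

One caveat: cmd.exe has its own rules for stripping quotes when a command line begins with one, and older PHP builds on Windows could trip over them. If quoting doesn't help, moving the binary to a path without spaces avoids the problem entirely.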
12-09-2003, 01:44 AM | #8 |
Green Mole
Join Date: Dec 2003
Posts: 9
|
"Cannot load charset cp1251 - file not found"
|
12-09-2003, 02:50 AM | #9 |
Green Mole
Join Date: Dec 2003
Posts: 9
|
OK, I've sorted out the charset paths. It now seems to extract text OK from the command line, but still not via the web interface...
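When a command works at the prompt but not under the web server, it helps to see exactly what the PHP process gets back. Below is a minimal sketch using exec(). Appending 2>&1, which both cmd.exe and Unix shells understand, folds error messages like the charset error above into the captured output. The exit status then tells you whether the command ran at all. The function name is hypothetical, and the usage lines are left as comments so the sketch doesn't try to run catdoc.

```php
<?php
// Hypothetical diagnostic: run a command exactly as a PHP page would and
// return everything it printed plus its exit status.
function run_and_capture($command)
{
    $lines = array();
    $status = -1;
    // "2>&1" sends stderr to stdout so error messages are captured too.
    exec($command . ' 2>&1', $lines, $status);
    return array('status' => $status, 'output' => implode("\n", $lines));
}

// Usage sketch (paste in the command that works from the prompt):
// $result = run_and_capture('catdoc -s 8859-1 4.doc');
// echo "exit status: {$result['status']}\n";
// echo htmlspecialchars($result['output']), "\n";
```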
|
12-09-2003, 04:02 AM | #10 |
Green Mole
Join Date: Dec 2003
Posts: 9
|
OK, all working. It seems it didn't like the space in the path name C:\Program Files\...
Once I moved catdoc (and its config subdirectories) to a path without a space (C:\, for instance), all was well. Many thanks for your help, guys. (Though I'm sure I'll be back with more dopey questions.)
BTW, my own requirement is index searching on just one local directory full of MS Word files. To facilitate this, I have a file index.php which gives the spider a link to every Word file in the directory:
Code:
<HTML>
<HEAD></HEAD>
<BODY>
<?php
// return a file's extension in lower case ('' if it has none)
function gfext($filename) {
    $pathinfo = pathinfo($filename);
    return isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
}

// read this directory and link every Word file for the spider
if ($handle = opendir('.')) {
    while (false !== ($file = readdir($handle))) {
        // we only want the Word files (is_file skips "." and ".." and subdirectories)
        if (is_file($file) && gfext($file) == 'doc') {
            echo '<a href="' . rawurlencode($file) . '">' . htmlspecialchars($file) . "</a><br>\n";
        }
    }
    closedir($handle);
}
?>
</BODY>
</HTML>

All the best,
Phil

Last edited by phil_ballard; 12-09-2003 at 04:05 AM. |
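To check the link list without a web server, you can move the filtering and HTML into a function that takes a list of file names. This is a hypothetical refactor of the index.php above and not something PhpDig requires. The extension match is case-insensitive, so 4.DOC is picked up too. The commented usage line relies on scandir(), which needs PHP 5 or later. On PHP 4 you would collect the names in the readdir() loop instead.

```php
<?php
// Hypothetical refactor of index.php: turn a list of file names into one
// link per Word (.doc) file, matching the extension case-insensitively.
function doc_links($files)
{
    $html = '';
    foreach ($files as $file) {
        $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
        if ($ext !== 'doc') {
            continue; // we only want the Word files
        }
        // rawurlencode() keeps names with spaces valid inside href
        $html .= '<a href="' . rawurlencode($file) . '">'
               . htmlspecialchars($file) . "</a><br>\n";
    }
    return $html;
}

// In index.php (PHP 5+): echo doc_links(scandir('.'));
```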
|
|