PhpDig.net

PhpDig.net (http://www.phpdig.net/forum/index.php)
-   External Binaries (http://www.phpdig.net/forum/forumdisplay.php?f=36)
-   -   Index MSWORD But No search result (http://www.phpdig.net/forum/showthread.php?t=1196)

wessam 08-20-2004 09:02 AM

Index MSWORD But No search result
 
Hi all,
I'm trying to index MSWORD files, but when I search for the content of the file I get nothing.
My config file looks like this:
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','c:\appserv\www\catdoc\catdoc');
define('PHPDIG_OPTION_MSWORD','');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');

define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','c:\appserv\www\catdoc\xls2csv');
define('PHPDIG_OPTION_MSEXCEL','-s 8859-1');



//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_PDF_EXTENSION','');
define('PHPDIG_MSEXCEL_EXTENSION','');
define('PHPDIG_MSPOWERPOINT_EXTENSION','');

And I added this line of code to robot_functions.php:
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';


When I run catdoc on the command line it works and prints my MSWORD document:
c:\Appserv\www\catdoc\catdoc w.doc

I tried checking this Information, but I still can't search my Word document files.
Please, any help?

Charter 08-20-2004 09:26 AM

Did you try it with .exe added on to catdoc?

wessam 08-20-2004 01:02 PM

Yes, and I got the same thing.

Charter 08-20-2004 01:09 PM

Like this?
PHP Code:

define('PHPDIG_PARSE_MSWORD','C:\\appserv\\www\\catdoc\\catdoc.exe');
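
A side note on how these Windows paths are quoted: in a single-quoted PHP string only `\\` and `\'` are escape sequences, so writing the path with single or doubled backslashes should produce the same value. The sketch below illustrates that; the `c:\temp\new` path is a made-up example, included only to show why double quotes are riskier.

```php
<?php
// Single-quoted strings: a lone backslash before a, w or c stays a backslash,
// and \\ collapses to one backslash, so both literals are the same path.
$single  = 'c:\appserv\www\catdoc\catdoc.exe';
$escaped = 'c:\\appserv\\www\\catdoc\\catdoc.exe';
var_dump($single === $escaped); // bool(true)

// Hypothetical path for illustration: inside double quotes, \t and \n
// become a tab and a newline, so the path is silently corrupted.
$hypothetical_single = 'c:\temp\new';
$hypothetical_double = "c:\temp\new";
var_dump($hypothetical_single === $hypothetical_double); // bool(false)
```

If that holds on your setup, the choice between the two single-quoted forms is unlikely to be what stops the indexing.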


wessam 08-20-2004 01:12 PM

Thanks for your fast answers.

And yes, I tried this one and also 'c:\appserv\........'

Charter 08-20-2004 01:22 PM

Hi. Go back to this thread and add the code, and then reindex, and let me know what it says when it encounters the Word document.

wessam 08-20-2004 01:32 PM

Hi.
This is the output:
--------------------------------------------------------------------------------
SITE : http://localhost/
Exclude paths :
- @NONE@


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
1:http://localhost/test/
(time : 00:00:05)
+
level 1...


Is result test http an array: 1
What is result test http status: MSWORD

Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
2:http://localhost/test/w.doc
(time : 00:00:15)

No link in temporary table

--------------------------------------------------------------------------------

links found : 2
http://localhost:10/test/
http://localhost:10/test/w.doc
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

Charter 08-20-2004 01:42 PM

Set the following and do another reindex:
PHP Code:

define('PHPDIG_INDEX_PDF',false); 


wessam 08-20-2004 01:52 PM

Hi, I did, but I still can't search my Word document.
SITE : http://localhost/
Exclude paths :
- @NONE@


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
1:http://localhost/test/
(time : 00:00:05)
+
level 1...


Is result test http an array: 1
What is result test http status: MSWORD

Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
2:http://localhost/test/w.doc
(time : 00:00:15)

No link in temporary table

--------------------------------------------------------------------------------

links found : 2
http://localhost:10/test/
http://localhost:10/test/w.doc
Optimizing tables...
Indexing complete !

Charter 08-20-2004 01:57 PM

Oh, you need to edit the code you added so that it is for Word documents, not for PDFs. For example...
PHP Code:

// it can have _PDF or _MSWORD or _MSEXCEL depending on binary
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';


wessam 08-20-2004 02:03 PM

I'm sorry to bother you.
I did, but nothing new :((
SITE : http://localhost/
Exclude paths :
- @NONE@


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
1:http://localhost/test/
(time : 00:00:05)
+
level 1...


Is result test http an array: 1
What is result test http status: MSWORD

Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
2:http://localhost/test/w.doc
(time : 00:00:15)

No link in temporary table

Charter 08-20-2004 02:18 PM

I mean throughout, including for these things...
PHP Code:

// in the next four lines change _PDF to either _MSWORD or _MSEXCEL for those binaries
echo "Index the pdf is set to: " . PHPDIG_INDEX_PDF . "<br>";
echo "Parse the pdf is set to: " . PHPDIG_PARSE_PDF . "<br>";
echo "Does parse pdf exist: " . file_exists(PHPDIG_PARSE_PDF) . "<br>";
echo "Is parse pdf executable: " . is_executable(PHPDIG_PARSE_PDF) . "<br>";

It's still using _PDF because "/usr/local/bin/pstotext" is getting printed.

wessam 08-20-2004 02:19 PM

Hi, this is what I get now:

SITE : http://localhost/
Exclude paths :
- @NONE@


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
1:http://localhost/test/
(time : 00:00:05)
+
level 1...


Is result test http an array: 1
What is result test http status: MSWORD

Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:

Command is: c:\appserv\www\catdoc\catdoc.exe -s 8859-1 ../admin/temp/75689462.tmp 2>&1
Result contains: Array ( [0] => The system cannot execute the specified program. )
Return value is: 1

2:http://localhost/test/w.doc
(time : 00:00:16)

No link in temporary table
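
The "Command is / Result contains / Return value is" lines in this output look like what you get from running the command through PHP's `exec()` and printing its output array and exit status. The following is only a minimal sketch of such a diagnostic, not the code from the other thread; the function name `debug_external_command` is hypothetical and `echo hello` stands in for the real catdoc command line.

```php
<?php
// Hypothetical helper (not part of PhpDig): run an external command
// and report what the shell gave back.
function debug_external_command($command)
{
    $output = array();
    $status = null;
    // exec() fills $output with the lines printed and $status with the exit code
    exec($command, $output, $status);

    echo "Command is: " . $command . "<br>\n";
    echo "Result contains: " . print_r($output, true) . "<br>\n";
    echo "Return value is: " . $status . "<br>\n";

    return array($output, $status);
}

// Assumed usage: a harmless command in place of the catdoc call.
list($lines, $status) = debug_external_command('echo hello 2>&1');
```

A non-zero return value together with a shell message such as "The system cannot execute the specified program." suggests the binary never started, so the failure would be in launching catdoc rather than in parsing the Word file.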

wessam 08-20-2004 02:25 PM

After that I removed the .exe from the path and got:
SITE : http://localhost/
Exclude paths :
- @NONE@


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
1:http://localhost/test/
(time : 00:00:05)
+
level 1...


Is result test http an array: 1
What is result test http status: MSWORD

Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to:
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist:
Is parse pdf executable:
2:http://localhost/test/w.doc
(time : 00:00:15)

No link in temporary table

--------------------------------------------------------------------------------

links found : 2
http://localhost:10/test/
http://localhost:10/test/w.doc
Optimizing tables...
Indexing complete !

Charter 08-20-2004 02:31 PM

Why is "Parse the pdf is set to: /usr/local/bin/pstotext" still printing?

It should be the following code...
PHP Code:

// the next four lines use _MSWORD; change to _MSEXCEL or _PDF for those binaries
echo "Index the doc is set to: " . PHPDIG_INDEX_MSWORD . "<br>";
echo "Parse the doc is set to: " . PHPDIG_PARSE_MSWORD . "<br>";
echo "Does parse doc exist: " . file_exists(PHPDIG_PARSE_MSWORD) . "<br>";
echo "Is parse doc executable: " . is_executable(PHPDIG_PARSE_MSWORD) . "<br>";

Try that and also keep the following:
PHP Code:

define('PHPDIG_OPTION_MSWORD',''); // two single quotes, no space between 
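
To avoid editing _PDF, _MSWORD and _MSEXCEL by hand each time, the four diagnostic lines could be wrapped in one helper that looks the constants up by name with `constant()`. This is a sketch under assumptions: `report_external_binary` is a hypothetical function, not part of PhpDig, and the two `define()` calls only stand in for the values from the config file. It uses `var_export()` so that a false value prints as `false` instead of an empty string, which is why the "Does parse pdf exist:" lines in the earlier output appear blank.

```php
<?php
// Hypothetical helper (not part of PhpDig): report the settings for one
// external binary type, e.g. 'MSWORD', 'MSEXCEL' or 'PDF'.
function report_external_binary($type)
{
    $index = constant('PHPDIG_INDEX_' . $type);
    $path  = constant('PHPDIG_PARSE_' . $type);
    $lines = array(
        "Index the $type is set to: " . var_export($index, true),
        "Parse the $type is set to: " . $path,
        "Does parse $type exist: " . var_export(file_exists($path), true),
        "Is parse $type executable: " . var_export(is_executable($path), true),
    );
    echo implode("<br>\n", $lines) . "<br>\n";
    return $lines;
}

// Assumed values for illustration only; in PhpDig these come from the config file.
define('PHPDIG_INDEX_MSWORD', true);
define('PHPDIG_PARSE_MSWORD', 'c:\appserv\www\catdoc\catdoc.exe');
$report = report_external_binary('MSWORD');
```

Calling it with 'MSEXCEL' or 'PDF' would print the same four lines for those binaries, provided the matching constants are defined.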



All times are GMT -8.