PhpDig.net

Old 10-01-2004, 03:30 AM   #1
mleray
Orange Mole
 
Join Date: Sep 2004
Location: Nantes (44) FRANCE
Posts: 31
Problem with .pdf and .doc files

Hi,

My English is not very good, so I'm a little lost in this forum.
I've seen many topics about PDF indexing issues but can't find a solution. I'm sure it is on the forum somewhere...

My problem is that my PDF files seem to be indexed, but when I search for a keyword or for the filename of one of them, I can't find it.
I've looked in the database and have never seen any .pdf file there (nor any .doc file, though .xls files seem to be OK).

I use PHP 4.3.3 and MySQL 4.0.15 on Windows XP.
The PhpDig version is 1.8.3.
The site I'm trying to index is an intranet site, so I can't give you a link to look at.

PHP Code:
//---------EXTERNAL TOOLS SETUP
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','1'); //use is_executable for external binaries

// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);//*** false

define('PHPDIG_PARSE_MSWORD','C:/Stage_Manuella/moteur/PHPDIG_DIR/catdoc-0.93.3');
define('PHPDIG_OPTION_MSWORD','-s 8859-1');

define('PHPDIG_INDEX_PDF',true); //*** false
define('PHPDIG_PARSE_PDF','C:/Stage_Manuella/moteur/PHPDIG_DIR/Ghostgum/pstotext');
define('PHPDIG_OPTION_PDF','-cork');

define('PHPDIG_INDEX_MSEXCEL',true);//*** false
define('PHPDIG_PARSE_MSEXCEL','C:/Stage_Manuella/moteur/PHPDIG_DIR/catdoc-0.93.3');
define('PHPDIG_OPTION_MSEXCEL','');

define('PHPDIG_INDEX_MSPOWERPOINT',false);
define('PHPDIG_PARSE_MSPOWERPOINT','/usr/local/bin/ppt2text');
define('PHPDIG_OPTION_MSPOWERPOINT','');

//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_PDF_EXTENSION','');
define('PHPDIG_MSEXCEL_EXTENSION','');
define('PHPDIG_MSPOWERPOINT_EXTENSION',''); 
Examples of what I get in my browser after indexing:
niveau 2...
4:http://10.37.1.240/dossier_presse/dp_2004_a.pdf (not checked)
(temps : 00:01:22)

5:http://10.37.1.240/arrete_100903.pdf (not checked)
(temps : 00:01:30)

6:http://10.37.1.240/Ressources-Humain...lephonique.htm (checked)
(temps : 00:01:51)
+ + + + + +

And in the summary:
http://10.37.1.240/dossier_presse/dp_2004_a.pdf
Old 10-01-2004, 06:05 AM   #2
mleray
I tried what is written in the readme topic, and this is what I get:

Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: C:/Stage_Manuella/moteur/PHPDIG_DIR/Ghostgum/pstotext
Does parse pdf exist: 1

Fatal error: Call to undefined function: is_executable() in c:\stage_manuella\moteur\phpdig_dir\phpdig-1.8.3\admin\robot_functions.php on line 963
Old 10-01-2004, 06:55 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Set USE_IS_EXECUTABLE_COMMAND to zero in the config file.
PHP Code:
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','1'); //use is_executable for external binaries 
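For readers following along, a sketch of how the edited line might look. It assumes the rest of the config file is left unchanged; the comment wording is illustrative, not PhpDig's own.

```php
// is_executable() is undefined on this Windows/PHP 4 setup, so the check is switched off
define('USE_IS_EXECUTABLE_COMMAND','0');
```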
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 10-01-2004, 07:23 AM   #4
mleray
I've done it.
Quote:
Use is executable is set to: 0
But nothing changes. I still get the error message.

Should the path to the executable include the name of the file (pstotxt3.exe) or not?

Like this:
PHP Code:
define('PHPDIG_PARSE_PDF','C:\Stage_Manuella\moteur\PHPDIG_DIR\Ghostgum\pstotext'); 
or like this:
PHP Code:
define('PHPDIG_PARSE_PDF','C:\Stage_Manuella\moteur\PHPDIG_DIR\Ghostgum\pstotext\pstotxt3'); 
(There are no spaces in my code: "psto text" is really "pstotext".)
Or something else? Should I use a relative path or an absolute one?
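As a hedged illustration of the question above: the config comment says a full path to the external binary is required, and the value reported as working later in this thread ends in the executable's file name. A sketch using that later pdftotext path (the option and extension settings are deliberately left out, since the thread does not state them):

```php
// Assumption: the value is the full path to the executable itself, not to its folder.
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','C:\Stage_Manuella\moteur\PHPDIG_DIR\xpdf-3.00-win32\pdftotext.exe');
```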

Last edited by mleray; 10-01-2004 at 07:35 AM.
Old 10-01-2004, 08:19 AM   #5
mleray
I tried with pdftotext; it seems to be better, but not perfect...

Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: C:\Stage_Manuella\moteur\PHPDIG_DIR\xpdf-3.00-win32\pdftotext.exe
Does parse pdf exist: 1

Command is: C:\Stage_Manuella\moteur\PHPDIG_DIR\xpdf-3.00-win32\pdftotext.exe ../admin/temp/95662532.tmp 2>&1
Result contains: Array ( [0] => Error: Copying of text from this document is not allowed. )
Return value is: 3

What does this error mean?
Old 10-05-2004, 12:22 AM   #6
mleray
No more help?
Are there any Frenchies here?
Old 10-06-2004, 04:29 AM   #7
Charter
>> Result contains: Array ( [0] => Error: Copying of text from this document is not allowed. )

The issue is with the PDF, not PhpDig. The PDF permissions are set such that "copying of text from this document is not allowed."
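The "Return value is: 3" in the quoted log matches the exit codes listed in the xpdf man page for pdftotext, where 3 means an error related to PDF permissions. A minimal sketch of a lookup for those codes; the function name explain_pdftotext_status is hypothetical and not part of PhpDig or xpdf:

```php
<?php
// Hypothetical helper, not part of PhpDig: maps a pdftotext exit status to a
// short explanation, following the exit codes listed in the xpdf man page.
function explain_pdftotext_status($status) {
    $messages = array(
        0  => 'no error',
        1  => 'error opening the PDF file',
        2  => 'error opening the output file',
        3  => 'error related to PDF permissions',
        99 => 'other error',
    );
    // Any status not listed in the man page is reported as unknown.
    return isset($messages[$status]) ? $messages[$status] : 'unknown status';
}

echo explain_pdftotext_status(3), "\n";
```

A check like this could sit next to the exec() call that runs the external binary, so that a protected PDF is reported as such instead of being silently skipped.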
Old 10-08-2004, 12:59 AM   #8
mleray
It seems to be OK now. Thanks.

But now I've got a new problem with catdoc and xls2csv:


Quote:
Is result test an array: 1
What is result test status: MSEXCEL
Use is executable is set to: 0
******************************************************
Does parse xls exist: 1
Index the xls is set to: 1
Parse the xls is set to: C:\Stage_Manuella\moteur\PHPDIG_DIR\catdoc-0.93.4\xls2csv.exe
******************************************************
Command is: C:\Stage_Manuella\moteur\PHPDIG_DIR\catdoc-0.93.4\xls2csv.exe -s 8859-1 ../admin/temp/64971482.tmp 2>&1
Result contains: Array ( [0] => Le système ne peut exécuter le programme spécifié. )
Return value is: 1
In English: The system cannot execute the specified program.

It's the same with catdoc.exe.

If I try to launch the program from the MS-DOS prompt, like this:


Quote:
Microsoft Windows XP [version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Administrateur.EFSTSE>cd../..

C:\>cd Stage_Manuella\moteur\PHPDIG_DIR\catdoc-0.93.4

C:\Stage_Manuella\moteur\PHPDIG_DIR\catdoc-0.93.4>xls2csv test.xls
"NOM","PRENOM","AGE"
"Leray","Manuella","27"
"Leray","Sylvain","24"
"Rauturier","Myriam","52"
You can see that it works...
Old 10-12-2004, 07:57 AM   #9
mleray
Very important for catdoc & xls2csv!

I've found a solution to my problem with these external binaries.
I had PHP installed as a module by EasyPHP, but it needs to be installed in CGI mode, because otherwise the exec() function did not work (error: "The system cannot execute the specified program").
So I renamed robot_functions.php to robot_functions.cgi and spider.php to spider.cgi, and updated the links to these files wherever necessary.
And it works! Now I just have a small problem with accent conversion, since I'm French, but that's minor.

I hope this helps. (You can also leave a post on developpez.com just in case; I'm there often.)

Manuella
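To check which mode PHP is running in before renaming any files, a small diagnostic along these lines may help. It is only a sketch, not part of PhpDig: the script is hypothetical, and $binary is the xls2csv path quoted earlier in this thread, so it needs adjusting to your own install. php_sapi_name() and exec() are standard PHP functions.

```php
<?php
// Hypothetical diagnostic script, not part of PhpDig.
// $binary is the xls2csv path quoted earlier in this thread; adjust it to your install.
$binary = 'C:\Stage_Manuella\moteur\PHPDIG_DIR\catdoc-0.93.4\xls2csv.exe';

// php_sapi_name() reports how PHP is attached to the web server,
// for example a value containing "cgi" in CGI mode or "apache" for the Apache module.
echo 'SAPI: ', php_sapi_name(), "\n";

// Run the binary with stderr merged into stdout, as in the logs quoted in this thread.
$output = array();
$status = -1;
exec($binary . ' 2>&1', $output, $status);

echo 'Return value is: ', $status, "\n";
echo 'Result contains: ', implode("\n", $output), "\n";
```

Opened through the web server rather than from the command line, this shows whether exec() can launch the binary under the SAPI that PhpDig's spider actually runs in.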
Old 10-13-2004, 02:14 AM   #10
mleray
To be precise, I use:
PHP 4.3.3
MySQL 4.0.15
Apache 1.3.27
on Windows XP, installed with EasyPHP 1.7.
My PhpDig version is 1.8.3.
Old 12-09-2004, 03:27 AM   #11
Charter
bump for xperienss...
Old 12-09-2004, 11:26 PM   #12
xperienss
Green Mole
 
Join Date: Dec 2004
Location: Geneva Switzerland
Posts: 8
Ohhh, thanks a lot, Charter, for bumping this post.

----

This message is for mleray.
Apparently we have the same configuration (WinXP, EasyPHP 1.7, ...).
So far I have managed to get PDF indexing working with Xpdf/pdftotext.exe v3.
But with catdoc and xls2csv, I still can't get the files indexed.

You said you had found the solution... so please help me if you can, because I have been struggling for a week, trying every possible configuration.
Thanks in advance (if you get this message).

----

Well, as soon as I have everything working, I'll post a topic with full instructions for installing phpdig/catdoc/xpdf-pdftotext on WinXP/EasyPHP 1.7...

I am sure this would help lots of people.

Xperienss
All times are GMT -8.

