PhpDig.net


Topaz 10-12-2004 08:47 PM

Word and Excel converted but not indexed!
 
Hello

Now we are getting to my last problem (at least I hope so ;) ).

It seems that the spider processes my Word and Excel files, but they cannot be searched, and they do not appear in my list of indexed documents. If I parse the documents on the command line with

Code:

/usr/local/bin/catdoc -s 8859-1 test.doc
I have no problems.

And the spider itself creates a file in /admin/temp/ with the correct content. So it parses the files flawlessly, but it seems to write nothing into the database. I searched the table 'spider' without success. Indexing PDFs is not a problem.

I tried different MIME settings in 'robot_functions.php' (application.msword - according to 'mime.conf' from Apache), but with no luck.

I use the latest version of PhpDig (1.8.3), PHP 4.3.0, MySQL 3.23.49 and Apache 1.3.24 on Red Hat 7.2 (Enigma).

Thank you very much for your kind help.

Regards

Topaz

mleray 10-13-2004 01:10 AM

Take a look at the External Binaries Forum...
I hope you'll find a solution here.

Topaz 10-14-2004 03:28 AM

Quote:

Originally Posted by mleray
Take a look at the External Binaries Forum...
I hope you'll find a solution here.

Unfortunately, that does not work.

I tried everything. I followed the instructions at http://www.phpdig.net/forum/showthread.php?t=799. My php.ini settings are fine. I also copied in all the debugging code and got the following:


Code:

SITE : http://www.vips.ch/
Ausgeschlossene Pfade :
- administration/
- cgi-bin/
- css/
- db/
- flash/
- icongraphics/
- images/
- images_nav/
- scripts/
- search/
- stuff/
- de/login/
- fr/login/


Is result test http an array: 1
What is result test http status: HTML
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
 1:http://www.vips.ch/test.php
(Zeit : 00:00:04)
+ + +
 2: <http://www.vips.ch/test.php> Wurde gerade indiziert
(Zeit : 00:00:07)

Level 1...


Is result test http an array: 1
What is result test http status: MSWORD
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/catdoc -s 8859-1 ../admin/temp/66114322.tmp
Result contains: Array ( [0] => BESTELL-FORMULAR [1] => [2] => Medikamentenpackung/Broschüre "Behandlungserfolge" [3] => [4] => Die Broschüre ist ab 5. Mai 2003 lieferbar. [5] => [6] => Lieferung bis spätestens: [7] => ... )
Return value is: 0

3:http://www.vips.ch/test.doc
(Zeit : 00:00:13)



Is result test http an array: 1
What is result test http status: PDF
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/pdftotext ../admin/temp/38538542.tmp
Result contains: Array ( )
Return value is: 0

 4:http://www.vips.ch/test.pdf
(Zeit : 00:00:16)


Is result test http an array: 1
What is result test http status: MSEXCEL
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: MSEXCEL
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/xls2csv ../admin/temp/57661852.tmp
Result contains: Array ( [0] => "Schritte","Beschreibung" [1] => , [2] => "1","Produktname eingeben" [3] => "2","Darreichungsformen und Packungen eingeben" [4] => "3","BAG Nummer ein... )
Return value is: 0

5:http://www.vips.ch/test.xls
(Zeit : 00:00:20)
Kein Link in der temporäreren Tabelle

I trimmed the document contents, but as you can see, the documents get converted and yet nothing is put into the database. How come?

Thanks for any further help.

Topaz

mleray 10-14-2004 06:21 AM

What are your settings in the config file here:
Quote:

//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_PDF_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','');
define('PHPDIG_MSPOWERPOINT_EXTENSION','');
I had set '.txt' for all tools; the files were converted fine but not indexed. It is necessary only for PDF (using pdftotext).
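
A minimal sketch of why the suffix matters, assuming the spider takes the converted text from STDOUT when the extension is empty and otherwise looks for a file named after the temp file plus the extension. That lookup rule is an assumption drawn from the config comments, not PhpDig's actual code. The functions `to_stdout`, `to_file` and `extract_text` are hypothetical stand-ins, not the real catdoc, xls2csv or pdftotext.

```shell
#!/bin/sh
# Hypothetical stand-ins for the converters; NOT the real binaries.
to_stdout() { cat "$1"; }             # prints the text to STDOUT
to_file()   { cat "$1" > "$1.txt"; }  # writes <input>.txt, prints nothing

# Assumed lookup rule (illustration only): with an empty extension the
# captured STDOUT is used; otherwise the text is read from <tmp><ext>.
extract_text() {
    tool=$1; tmp=$2; ext=$3
    out=$("$tool" "$tmp")
    if [ -z "$ext" ]; then
        printf '%s' "$out"
    elif [ -f "$tmp$ext" ]; then
        cat "$tmp$ext"
    fi
}

work=$(mktemp -d)
printf 'BESTELL-FORMULAR\n' > "$work/doc.tmp"

# STDOUT tool with an empty extension: text is found.
ok_stdout=$(extract_text to_stdout "$work/doc.tmp" '')
# STDOUT tool with '.txt': no such file exists, so nothing is found.
bad_stdout=$(extract_text to_stdout "$work/doc.tmp" '.txt')
# File-writing tool with '.txt': text is read from the side file.
ok_file=$(extract_text to_file "$work/doc.tmp" '.txt')

echo "stdout tool, ext='':     [$ok_stdout]"
echo "stdout tool, ext='.txt': [$bad_stdout]"
echo "file tool,   ext='.txt': [$ok_file]"
rm -rf "$work"
```

Under these assumptions, the first and third cases yield the text and the second yields nothing, which would match a conversion that succeeds while nothing reaches the index.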

Topaz 10-15-2004 02:54 AM

Quote:

Originally Posted by mleray
What are your settings in the config file here:

I had set '.txt' for all tools; the files were converted fine but not indexed. It is necessary only for PDF (using pdftotext).

AHHHHHHHHH, it's unbelievable.

It's true. I just had to remove this stupid suffix! Now it works flawlessly. Life can be cruel to fools like me.

Thank you very much for the tip. If you are ever in Switzerland, I'll treat you to a fondue :-).

I would suggest adding that to the external binaries README.

Topaz

vinyl-junkie 10-15-2004 03:24 AM

Quote:

Originally Posted by Topaz
I would suggest to add that to the external binaries README.

What?! Read the directions FIRST? What a novel concept! :D ;)

Topaz 10-15-2004 03:36 AM

Quote:

Originally Posted by vinyl-junkie
What?! Read the directions FIRST? What a novel concept! :D ;)

Well, is it written somewhere? I probably read through all the manuals, readmes and threads I could find. If it can be found somewhere I'll definitely need a vacation :-).

Topaz

vinyl-junkie 10-15-2004 04:18 AM

Quote:

Originally Posted by Topaz
Well, is it written somewhere? I probably read through all the manuals, readmes and threads I could find. If it can be found somewhere I'll definitely need a vacation :-).

Topaz

Oops! You're right. I thought I was just being funny, and that you meant that the info was in the manual and you didn't read it. Sorry about that. I agree, that should definitely be in the documentation.

mleray 10-15-2004 04:49 AM

I am delighted to have been able to help someone :)

Topaz 10-15-2004 01:40 PM

Quote:

Originally Posted by vinyl-junkie
Sorry about that. I agree, that should definitely be in the documentation.

No problem, I was concerned about myself :-).


All times are GMT -8.