PhpDig.net

Old 10-12-2004, 09:47 PM   #1
Topaz
Green Mole
 
Join Date: Oct 2004
Posts: 11
Word and Excel converted but not indexed!

Hello

Now we are getting to my last problem (at least I hope so).

It seems that the spider processes my Word and Excel files, but they cannot be searched, and they do not appear in my list of indexed documents. If I parse the documents on the command line with

Code:
/usr/local/bin/catdoc -s 8859-1 test.doc
I have no problems.
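
One way to double-check where a converter sends its text is to capture its standard output. The following is only a minimal sketch: the helper name `output_channel` is hypothetical and not part of PhpDig, and it says nothing about how PhpDig itself calls the tool.

```shell
# Hypothetical helper, not part of PhpDig: run a converter command and
# report whether it printed any text to standard output.
output_channel() {
    # "$@" is the complete command, for example:
    #   /usr/local/bin/catdoc -s 8859-1 test.doc
    out=$("$@" 2>/dev/null)
    if [ -n "$out" ]; then
        echo "stdout"
    else
        echo "nothing on stdout"
    fi
}
```

Calling it as `output_channel /usr/local/bin/catdoc -s 8859-1 test.doc` would report `stdout` when the converter prints the document text directly, as it does in the command above.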

The spider itself also creates a file in /admin/temp/ with the correct content. So it parses the document flawlessly, but it seems to write nothing into the database. I searched the 'spider' table without success. Indexing PDFs is not a problem.

I tried different MIME settings in 'robot_functions.php' (application.msword, according to 'mime.conf' from Apache) but with no luck.

I use PhpDig 1.8.3 (the latest version), PHP 4.3.0, MySQL 3.23.49 and Apache 1.3.24 on Red Hat 7.2 (Enigma).

Thank you very much for your kind help.

Regards

Topaz
Old 10-13-2004, 02:10 AM   #2
mleray
Orange Mole
 
Join Date: Sep 2004
Location: Nantes (44) FRANCE
Posts: 31
Take a look at the External Binaries Forum...
I hope you'll find a solution here.
Old 10-14-2004, 04:28 AM   #3
Topaz
Green Mole
 
Join Date: Oct 2004
Posts: 11
Quote:
Originally Posted by mleray
Take a look at the External Binaries Forum...
I hope you'll find a solution here.
Unfortunately, that doesn't work.

I tried everything. I followed the instructions at http://www.phpdig.net/forum/showthread.php?t=799. My php.ini settings are fine. I also copied in all the debugging code and got the following:


Code:
SITE : http://www.vips.ch/
Ausgeschlossene Pfade :
- administration/
- cgi-bin/
- css/
- db/
- flash/
- icongraphics/
- images/
- images_nav/
- scripts/
- search/
- stuff/
- de/login/
- fr/login/


Is result test http an array: 1
What is result test http status: HTML
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
 1:http://www.vips.ch/test.php
(Zeit : 00:00:04)
+ + + 
 2: <http://www.vips.ch/test.php> Wurde gerade indiziert
(Zeit : 00:00:07)

Level 1...


Is result test http an array: 1
What is result test http status: MSWORD
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/catdoc -s 8859-1 ../admin/temp/66114322.tmp
Result contains: Array ( [0] => BESTELL-FORMULAR [1] => [2] => Medikamentenpackung/Broschüre "Behandlungserfolge" [3] => [4] => Die Broschüre ist ab 5. Mai 2003 lieferbar. [5] => [6] => Lieferung bis spätestens: [7] => ... ) 
Return value is: 0

3:http://www.vips.ch/test.doc
(Zeit : 00:00:13)



Is result test http an array: 1
What is result test http status: PDF
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/pdftotext ../admin/temp/38538542.tmp
Result contains: Array ( ) 
Return value is: 0

 4:http://www.vips.ch/test.pdf
(Zeit : 00:00:16)


Is result test http an array: 1
What is result test http status: MSEXCEL
Relative Path: ../admin/temp/

Is result test an array: 1
What is result test status: MSEXCEL
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/xls2csv ../admin/temp/57661852.tmp
Result contains: Array ( [0] => "Schritte","Beschreibung" [1] => , [2] => "1","Produktname eingeben" [3] => "2","Darreichungsformen und Packungen eingeben" [4] => "3","BAG Nummer ein... ) 
Return value is: 0

5:http://www.vips.ch/test.xls
(Zeit : 00:00:20)
Kein Link in der temporäreren Tabelle
I snipped the contents of the documents, but as you can see, the documents get converted but nothing is put into the database! How come?

Thanks for any further help.

Topaz
Old 10-14-2004, 07:21 AM   #4
mleray
Orange Mole
 
Join Date: Sep 2004
Location: Nantes (44) FRANCE
Posts: 31
What are your settings in this part of the config file?
Quote:
//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_PDF_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','');
define('PHPDIG_MSPOWERPOINT_EXTENSION','');
I had set '.txt' for all the tools, and the files were converted but not indexed. The suffix is necessary only for PDF (when using pdftotext).
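
To see at a glance which suffixes are currently configured, the define lines quoted above can be listed with a small shell function. This is only a sketch: the function name `list_tool_extensions` is hypothetical, the path to the config file has to be supplied by the caller, and it assumes the define lines are written exactly in the form quoted above, with no extra spaces.

```shell
# Hypothetical helper, not part of PhpDig: print NAME=VALUE for each
# PHPDIG_*_EXTENSION define found in the given config file.
# Assumes lines of the exact form: define('PHPDIG_X_EXTENSION','value');
list_tool_extensions() {
    sed -n "s/^define('\(PHPDIG_[A-Z]*_EXTENSION\)','\([^']*\)');.*/\1=\2/p" "$1"
}
```

For tools that print their text to standard output (catdoc and xls2csv in the debug output earlier in this thread), the value after `=` should be empty. A `.txt` value is expected only for pdftotext.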
Old 10-15-2004, 03:54 AM   #5
Topaz
Green Mole
 
Join Date: Oct 2004
Posts: 11
Quote:
Originally Posted by mleray
What are your settings in this part of the config file?

I had set '.txt' for all the tools, and the files were converted but not indexed. The suffix is necessary only for PDF (when using pdftotext).
AHHHHHHHHH, it's unbelievable.

It's true. I just had to remove this stupid suffix! Now it works flawlessly. Life can be cruel to fools like me.

Thank you very much for the tip. If you are ever in Switzerland, I'll treat you to a fondue :-).

I would suggest adding that to the external binaries README.

Topaz
Old 10-15-2004, 04:24 AM   #6
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Quote:
Originally Posted by Topaz
I would suggest adding that to the external binaries README.
What?! Read the directions FIRST? What a novel concept!
Old 10-15-2004, 04:36 AM   #7
Topaz
Green Mole
 
Join Date: Oct 2004
Posts: 11
Quote:
Originally Posted by vinyl-junkie
What?! Read the directions FIRST? What a novel concept!
Well, is it written somewhere? I probably read through all the manuals, readmes and threads I could find. If it can be found somewhere I'll definitely need a vacation :-).

Topaz
Old 10-15-2004, 05:18 AM   #8
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Quote:
Originally Posted by Topaz
Well, is it written somewhere? I probably read through all the manuals, readmes and threads I could find. If it can be found somewhere I'll definitely need a vacation :-).

Topaz
Oops! You're right. I thought I was just being funny, and that you meant that the info was in the manual and you didn't read it. Sorry about that. I agree, that should definitely be in the documentation.
Old 10-15-2004, 05:49 AM   #9
mleray
Orange Mole
 
Join Date: Sep 2004
Location: Nantes (44) FRANCE
Posts: 31
I am delighted to have been able to help someone.
Old 10-15-2004, 02:40 PM   #10
Topaz
Green Mole
 
Join Date: Oct 2004
Posts: 11
Quote:
Originally Posted by vinyl-junkie
Sorry about that. I agree, that should definitely be in the documentation.
No problem, I was concerned about myself :-).