|
10-12-2004, 09:47 PM | #1 |
Green Mole
Join Date: Oct 2004
Posts: 11
|
Word and Excel converted but not indexed!
Hello
Now we are getting to my last problem (at least I hope so). It seems that the spider indexes my Word and Excel files, but they cannot be searched, and they do not appear in my list of indexed documents. Parsing the documents on the command line works: Code:
/usr/local/bin/catdoc -s 8859-1 test.doc
The spider itself also creates a file in /admin/temp/ with the correct content. So it parses the file flawlessly, but it seems to write nothing into the database: I searched the table 'spider' without success. Indexing PDFs is not a problem. I tried different MIME settings in 'robot_functions.php' (application.msword - according to 'mime.conf' from apache) but with no luck. I use the latest version of PHPDig 1.8.3, PHP 4.3.0, MySQL 3.23.49 and Apache 1.3.24 on Red Hat 7.2 (Enigma). Thank you very much for your kind help. Regards, Topaz |
10-13-2004, 02:10 AM | #2 |
Orange Mole
Join Date: Sep 2004
Location: Nantes (44) FRANCE
Posts: 31
|
Take a look at the External Binaries Forum...
I hope you'll find a solution here. |
10-14-2004, 04:28 AM | #3 | |
Green Mole
Join Date: Oct 2004
Posts: 11
|
Quote:
I tried everything and followed the instructions at http://www.phpdig.net/forum/showthread.php?t=799. My php.ini settings are fine. I also copied in all the debugging code and got the following output: Code:
SITE : http://www.vips.ch/
Ausgeschlossene Pfade :
- administration/
- cgi-bin/
- css/
- db/
- flash/
- icongraphics/
- images/
- images_nav/
- scripts/
- search/
- stuff/
- de/login/
- fr/login/
Is result test http an array: 1
What is result test http status: HTML
Relative Path: ../admin/temp/
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
1:http://www.vips.ch/test.php (Zeit : 00:00:04)
+ + +
2: <http://www.vips.ch/test.php> Wurde gerade indiziert (Zeit : 00:00:07)
Level 1...
Is result test http an array: 1
What is result test http status: MSWORD
Relative Path: ../admin/temp/
Is result test an array: 1
What is result test status: MSWORD
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /usr/local/bin/catdoc -s 8859-1 ../admin/temp/66114322.tmp
Result contains: Array ( [0] => BESTELL-FORMULAR [1] => [2] => Medikamentenpackung/Broschüre "Behandlungserfolge" [3] => [4] => Die Broschüre ist ab 5. Mai 2003 lieferbar. [5] => [6] => Lieferung bis spätestens: [7] => ... )
Return value is: 0
3:http://www.vips.ch/test.doc (Zeit : 00:00:13)
Is result test http an array: 1
What is result test http status: PDF
Relative Path: ../admin/temp/
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /usr/local/bin/pdftotext ../admin/temp/38538542.tmp
Result contains: Array ( )
Return value is: 0
4:http://www.vips.ch/test.pdf (Zeit : 00:00:16)
Is result test http an array: 1
What is result test http status: MSEXCEL
Relative Path: ../admin/temp/
Is result test an array: 1
What is result test status: MSEXCEL
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /usr/local/bin/xls2csv ../admin/temp/57661852.tmp
Result contains: Array ( [0] => "Schritte","Beschreibung" [1] => , [2] => "1","Produktname eingeben" [3] => "2","Darreichungsformen und Packungen eingeben" [4] => "3","BAG Nummer ein... )
Return value is: 0
5:http://www.vips.ch/test.xls (Zeit : 00:00:20)
Kein Link in der temporäreren Tabelle
Thanks for any further help. Topaz |
|
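For readers who want to repeat the checks from the debug output outside of PHPDig, here is a minimal Python sketch. It is not PHPDig code: the function names `probe_converter` and `has_text` are hypothetical, and it only assumes that a converter such as catdoc or xls2csv writes its text to standard output, as the log in this thread suggests.

```python
import os
import subprocess


def probe_converter(binary, args, input_path):
    """Collect the facts the debug output reports for one external converter.

    Hypothetical helper, not part of PHPDig: checks that the binary exists,
    that it is executable, then runs it and records output and exit status.
    """
    report = {
        "exists": os.path.isfile(binary),
        "executable": os.access(binary, os.X_OK),
        "command": [binary, *args, input_path],
        "lines": [],
        "returncode": None,
    }
    if report["exists"] and report["executable"]:
        # Pass the arguments as a list so no shell quoting is involved.
        proc = subprocess.run(
            report["command"],
            capture_output=True,
            text=True,
            errors="replace",
        )
        report["lines"] = proc.stdout.splitlines()
        report["returncode"] = proc.returncode
    return report


def has_text(report):
    """True if the converter exited with 0 and printed at least one non-blank line."""
    return report["returncode"] == 0 and any(
        line.strip() for line in report["lines"]
    )
```

Called as `probe_converter("/usr/local/bin/catdoc", ["-s", "8859-1"], "test.doc")`, it would gather the same exists / executable / command / result / return value information shown in the log. Note that a return value of 0 alone does not prove text was extracted: the pdftotext run in the log returns 0 with an empty array, which `has_text` would report as False.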
10-14-2004, 07:21 AM | #4 | ||
Orange Mole
Join Date: Sep 2004
Location: Nantes (44) FRANCE
Posts: 31
|
Which options have you set in the config file here:
Quote:
|
||
10-15-2004, 03:54 AM | #5 | |
Green Mole
Join Date: Oct 2004
Posts: 11
|
Quote:
It's true. I just had to remove this stupid suffix! Now it works flawlessly. Life can be cruel to fools like me. Thank you very much for the tip. If you are in Switzerland one fine day, I'll invite you for a fondue :-). I would suggest adding that to the external binaries README. Topaz |
|
10-15-2004, 04:24 AM | #6 | |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Quote:
|
|
10-15-2004, 04:36 AM | #7 | |
Green Mole
Join Date: Oct 2004
Posts: 11
|
Quote:
Topaz |
|
10-15-2004, 05:18 AM | #8 | |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Quote:
|
|
10-15-2004, 05:49 AM | #9 |
Orange Mole
Join Date: Sep 2004
Location: Nantes (44) FRANCE
Posts: 31
|
I am delighted to have been able to help someone. |
10-15-2004, 02:40 PM | #10 | |
Green Mole
Join Date: Oct 2004
Posts: 11
|
Quote:
|
|
|
|