|
07-09-2004, 10:22 AM | #1 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
no msword to txt parsing
hello
(I have versions 1.8.1 and 1.8.0 on my site.) I made a simple test page containing <a href="http://quito.citipo.fr/modules/documents/rep2/DocUtil.doc">Docutilisateur</a><br> and indexed it. A temporary file is created in admin/temp/xxxx.tmp for this .doc, but it seems that phpdig does not parse this file as text. I don't know why. Thanks. |
07-09-2004, 11:15 AM | #2 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
no msword indexing
hello
I continued my tests. I put an echo at line 461 of the spider.php script. The page I index is test.php:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html> <head> <title>Sans titre</title> </head>
<body> <a href="http://quito.citipro.fr/modules/documents/rep2/DocUtil.doc">Docutilisateur</a><br> </body>
</html>
The result is:
SITE : http://quito.citipro.fr/
Exclude paths :
- @NONE@
Resource id #5**../admin/temp/81475511.tmp**245**15******** test.php**HTML**20040709211142**20040709211125**Array**
1:http://quito.citipro.fr/test.php
(time : 00:00:22)
+ level 1...
Resource id #5**0**0**15******modules/documents/rep2/** DocUtil.doc**MSWORD**20040709211152**20040708082318****
2:http://quito.citipro.fr/modules/docu...p2/DocUtil.doc
(time : 00:00:32)
No link in temporary table
There is no temporary file for the MSWORD document. Thanks. |
07-09-2004, 11:20 AM | #3 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. There is a checklist here to help with troubleshooting.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
07-10-2004, 01:23 PM | #4 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
always catdoc
hello
Thank you for the reply. I checked your list and everything on it is fine, but when I index my .doc, the response is:
Command is: /home/mutualiseweb/catdoc-0.93.3/catdoc -s 8859-1 ../admin/temp/44148632.tmp
Result contains: Array ( )
Return value is: 127
Nothing is recorded in the database. When I run catdoc from the command line on my Linux system, it processes my MSWORD file correctly. What is happening? Are there any French users in this forum? |
07-10-2004, 01:33 PM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. In robot_functions.php find:
PHP Code:
PHP Code:
07-10-2004, 01:44 PM | #6 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
hi (23:44 in france)
Here is the response with the code modification:
Command is: /home/mutualiseweb/catdoc-0.93.3 -s 8859-1 ../admin/temp/38346732.tmp 2>&1
Result contains: Array ( [0] => sh: line 1: /home/mutualiseweb/catdoc-0.93.3: is a directory )
Return value is: 126
Strange: when I run /home/mutualiseweb/catdoc -s 8859-1 mymsword.doc from the command line, catdoc runs. But when I change define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc-0.93.3'); to define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc');, phpdig does not recognize my MSWORD file. |
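The two results so far fit ordinary shell behaviour: without 2>&1 the error message goes to stderr, so the captured output is empty, and the return value separates "found but cannot be executed" (126) from "not found" (127). A minimal sketch to reproduce this outside phpdig, assuming a POSIX-style sh with mktemp available; the temporary directory and the no-such-binary name are made up for the demo:

```shell
#!/bin/sh
# Hypothetical demo paths; nothing here touches phpdig or catdoc.
demo_dir=$(mktemp -d)

# 1) A directory used as a command: found, but cannot be executed -> 126.
status_dir=0
out_dir=$("$demo_dir" 2>&1) || status_dir=$?

# 2) A path that does not exist -> 127, message captured thanks to 2>&1.
status_missing=0
out_missing=$("$demo_dir/no-such-binary" 2>&1) || status_missing=$?

# 3) Same missing path with stderr not captured (discarded here to keep
#    the demo quiet): stdout alone is empty, like the first "Array ( )".
out_silent=$("$demo_dir/no-such-binary" 2>/dev/null) || true

echo "directory: status=$status_dir message='$out_dir'"
echo "missing:   status=$status_missing message='$out_missing'"
echo "without stderr: message='$out_silent'"

rmdir "$demo_dir"
```

The exact wording of the messages depends on the shell; the status values are the part to compare with the "Return value is" line.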
07-10-2004, 01:47 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Does this work?
PHP Code:
07-10-2004, 01:49 PM | #8 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
LOL, I tried this before your post.
No, it doesn't work. |
07-10-2004, 01:51 PM | #9 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. What does
PHP Code:
PHP Code:
07-10-2004, 01:53 PM | #10 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
Command is: /home/mutualiseweb/catdoc-0.93.3/catdoc -s 8859-1 ../admin/temp/39511712.tmp 2>&1
Result contains: Array ( [0] => sh: line 1: /home/mutualiseweb/catdoc-0.93.3/catdoc: No such file or directory ) Return value is: 127 |
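Both error messages point at the configured path rather than at catdoc itself. One way to check a candidate path before putting it into PHPDIG_PARSE_MSWORD is a small shell helper like the one below. check_binary is a made-up name, not part of phpdig or catdoc, and it is best run as the same user the web server uses, since permissions can differ between accounts:

```shell
#!/bin/sh
# check_binary PATH: hypothetical helper, prints why PATH can or cannot be run.
check_binary() {
    if [ ! -e "$1" ]; then
        echo "missing"          # the shell would return 127
    elif [ -d "$1" ]; then
        echo "directory"        # the shell would return 126
    elif [ ! -x "$1" ]; then
        echo "not executable"   # the shell would return 126
    else
        echo "ok"
    fi
}

# Example call with the path shown in the output above:
check_binary /home/mutualiseweb/catdoc-0.93.3/catdoc
```

Only a path that reports "ok" is worth trying in the configuration.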
07-10-2004, 01:56 PM | #11 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. What does
PHP Code:
PHP Code:
07-10-2004, 02:02 PM | #12 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
OK !!
It is all my fault: my catdoc is under /home/mutualiseweb/catdoc-0.93.3/src/. MY GOD. A small question about .pdf files: is it necessary to install GHOST? Sorry. |
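When a package is built from source but not installed, the binary often stays inside the build tree, as it did here under src/. A minimal sketch for locating it; find_binary is a hypothetical helper name, and the directory in the example call is the one mentioned in this thread:

```shell
#!/bin/sh
# find_binary DIR NAME: print every executable regular file called NAME below DIR.
find_binary() {
    find "$1" -type f -name "$2" 2>/dev/null | while IFS= read -r candidate; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
}

# Example call; prints nothing if the directory does not exist on this machine.
find_binary /home/mutualiseweb/catdoc-0.93.3 catdoc
```

If it prints a path, that full path, including the file name, is the candidate value for PHPDIG_PARSE_MSWORD.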
07-10-2004, 02:03 PM | #13 |
Orange Mole
Join Date: Apr 2004
Location: Nancy (54)
Posts: 38
|
THANKS A LOT
|
07-10-2004, 02:11 PM | #14 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
LOL, paths and permissions.
For PDFs perhaps try getting pdftotext already compiled. Directions are in this post.