|
06-07-2005, 11:39 AM | #1 |
Former Member
Join Date: May 2005
Posts: 5
|
spider hangs on indexing pdf (pstotext)
hi there,
i'm trying to use phpdig for the first time... i've read a lot of threads about problems with pstotext and tried several hints, but still can't get it to work...
my system:
------------------------
- FreeBSD 4.10
- PHP Version 4.3.1
- PHPDIG_VERSION 1.8.7
------------------------
from the command line pstotext seems to work correctly (it outputs the file content on STDOUT as expected).
the paths in config.php are ok:
------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
------------------------
i tweaked spider.php and robot_functions.php as mentioned somewhere. these are the outputs:
------------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
------------------------
... just after printing that, the spider hangs without any error message... can anyone help? |
06-07-2005, 12:29 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Okay, that all looks good, so remove the code to print those outputs, and instead, in robot_functions.php find:
Code:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;
and replace it with:
Code:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';
Also, if the PDFs were not from dvips, then try the following:
Code:
define('PHPDIG_OPTION_PDF','');
Code:
define('PHPDIG_PDF_EXTENSION','');
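The point of appending 2>&1 is presumably to make the converter's error messages show up in the captured result, since PHP's exec() only collects what the command writes to stdout. A minimal shell sketch of the same effect follows; fake_converter is a hypothetical stand-in, not pstotext itself, and the message and status it returns are only borrowed for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for a converter that fails: it prints its
# complaint on stderr only and returns a non-zero status.
fake_converter() {
    echo "gs: not found" >&2
    return 3
}

# Without the redirect, command substitution captures stdout only,
# so the message escapes (to the terminal or a server log).
status_plain=0
out_plain=$(fake_converter) || status_plain=$?

# With 2>&1, stderr is merged into stdout and gets captured too.
status_merged=0
out_merged=$(fake_converter 2>&1) || status_merged=$?

echo "without redirect: [$out_plain] status=$status_plain"
echo "with 2>&1:        [$out_merged] status=$status_merged"
```

The exit status is the same in both cases; only the visibility of the message changes.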
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
06-10-2005, 08:31 AM | #3 |
Former Member
Join Date: May 2005
Posts: 5
|
still hanging !
hi carter,
thanks for your reply! i tried your advice, but without success... the spider still hangs on indexing the pdf. this is the last thing the spider prints out:
----------
Is result test http an array: 1
What is result test http status: PDF
----------
these are my settings:
----------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','');
----------
i saw that the permissions along the path '/usr/local/bin/pstotext' are all set to 755 except the file itself, which has 555 ... could that be a problem? since i am not the administrator of the server (it's a commercial provider) i'm not able to change any of the file permissions...
*thanks for further support! |
06-10-2005, 11:11 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
As you cannot change permission on pstotext, see if your host will change the permission or try pdftotext instead. There are instructions for pdftotext here.
06-13-2005, 01:31 PM | #5 |
Former Member
Join Date: May 2005
Posts: 5
|
hi charter,
there was a problem with 'allow_url_fopen'. now it still doesn't index the pdf, but the spider doesn't hang anymore (still trying with 'pstotext') ... here's the output:
-----------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /usr/local/bin/pstotext ../admin/temp/66912182.tmp 2>&1
Result contains: Array ( [0] => gs: not found )
Return value is: 3
-----------------------
what does 'gs: not found' mean?
*thanks for your support (... i'm now going to try 'pdftotext') |
06-13-2005, 01:49 PM | #6 |
Former Member
Join Date: May 2005
Posts: 5
|
pdftotext error
hi again,
with 'pdftotext' it doesn't work either (i'm using the linux binary on the FreeBSD host...)
config:
---------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/home/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
---------------------------
output:
---------------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /home/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /home/ekifch/bin/pdftotext ../admin/temp/89121942.tmp 2>&1
Result contains: Array ( [0] => Abort trap )
Return value is: 134
---------------------------
*any idea? |
06-13-2005, 07:57 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
> Result contains: Array ( [0] => gs: not found )
That probably means that Ghostscript cannot be found.
> Result contains: Array ( [0] => Abort trap )
That might be a memory issue. Try pdftotext on a small PDF file.
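The two return values quoted in the thread can be read with the usual shell convention: a status above 128 means the process was killed by signal number (status - 128), so 134 corresponds to signal 6 (SIGABRT, which FreeBSD reports as "Abort trap"), while 3 is an ordinary error exit. A small sketch of that decoding, where describe_status is a hypothetical helper name; it says nothing about why the abort happened.

```shell
#!/bin/sh
# Decode an exit status using the common shell convention:
# values above 128 mean "terminated by signal (status - 128)".
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error $status"
    fi
}

describe_status 134   # the pdftotext run above
describe_status 3     # the pstotext run above
```

Running `command -v gs` from the same environment the spider runs in is a quick way to check whether Ghostscript is visible on the PATH there.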
06-15-2005, 06:57 AM | #8 |
Former Member
Join Date: May 2005
Posts: 5
|
yeah it works now
thanks to your support, some help from my server admin and lots of hours searching for a solution, i finally got it to work!
the problem was that somehow 'pstotext' did not find the 'ghostscript' binary when run from a php script on the web server. i had to add "export PATH=$PATH:my_path_to_lib; " to the exec command in 'robot_functions.php'... here are the full change instructions in case anyone runs into the same problem: in config.inc (somewhere near 'EXTERNAL TOOLS SETUP') add: PHP Code:
PHP Code:
PHP Code:
*cheers* |
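The poster's actual code blocks did not survive in this copy of the thread, so the following is only a sketch of the idea described above, not the original change. It assumes the web server hands PHP a shorter PATH than a login shell gets, and it uses a throwaway directory with a fake gs script as a stand-in for the real Ghostscript location; web_path and tool_dir are made-up names.

```shell
#!/bin/sh
# Throwaway directory holding a stand-in "gs" so the demo runs anywhere.
tool_dir=$(mktemp -d)
printf '#!/bin/sh\necho "fake gs ran"\n' > "$tool_dir/gs"
chmod 755 "$tool_dir/gs"

# Stands in for the restricted PATH a web server may give to PHP.
web_path="/nonexistent-bin"

# Same command the way the spider would run it: gs cannot be found.
status_before=0
before=$(PATH="$web_path" /bin/sh -c 'gs' 2>&1) || status_before=$?

# Prefixing the command with an export, as described in the post,
# lets the child shell find gs in the extra directory.
after=$(PATH="$web_path" /bin/sh -c "export PATH=\$PATH:'$tool_dir'; gs" 2>&1)

echo "before: status=$status_before message=[$before]"
echo "after:  [$after]"

rm -rf "$tool_dir"
```

In the real setup the export prefix would go in front of the pstotext command string that robot_functions.php passes to exec, with the directory that actually contains gs in place of tool_dir.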