PhpDig.net


DoWn 04-09-2004 06:04 AM

pstotext problem.
 
Hi. Another problem trying to index PDF files.

First, the environment:

Debian Linux running Apache 1.3.26 and PHP 4.1.2.

PhpDig 1.8.0

Successfully installed pstotext.

In console mode, pstotext runs very well:

The command 'pstotext file.pdf' displays the text contained in the PDF on the screen.

I also redirected the output of pstotext to a text file successfully.

PhpDig config:

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');

I verified (twice) that pstotext is in the /usr/bin/ directory.

The trouble is the following:

PhpDig seems to read the PDF files correctly but doesn't index them at all.

Help me please.
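
For anyone reproducing this, a rough sketch of a console check that runs the parser with the same three pieces as the config above (parser path, option, file) and reports how much text came out. The function name `try_parser` is made up for illustration, and the exact way PhpDig builds its command line is an assumption, not taken from its source.

```shell
#!/bin/sh
# Run a text extractor as "<parser> <option> <file>" and report the exit
# status and the number of bytes of text it produced.
# ASSUMPTION: this argument order mirrors the config values above; PhpDig's
# real invocation may differ.
try_parser() {
    parser=$1
    option=$2
    file=$3
    # $option is deliberately unquoted so that an empty option vanishes.
    out=$("$parser" $option "$file" 2>/dev/null)
    rc=$?
    echo "exit=$rc bytes=$(printf '%s' "$out" | wc -c | tr -d ' ')"
    return $rc
}

# Example call with the values from the config above:
# try_parser /usr/bin/pstotext -cork file.pdf
```

A result with a non-zero exit or zero bytes would point at the parser step rather than the indexer. Running the same check as the account the web server uses is likely more telling than running it as your own login user.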

Charter 04-09-2004 07:24 AM

Hi. Are the directories to pstotext and the pstotext file itself set to 755 permissions?
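
One way to check this in a single pass, as a minimal sketch: walk from the binary up to / and flag any component that other users cannot execute or search. The function name `check_world_exec` is hypothetical, and the sketch only looks at the "other" execute bit, which is the part of 755 that matters when the web server runs as a different user.

```shell
#!/bin/sh
# Walk from a path up to /, printing every component that lacks the
# execute/search bit for "other" users. Returns non-zero if any is found.
check_world_exec() {
    p=$1
    rc=0
    while :; do
        # find prints the path only if the "other" execute bit is set.
        if [ -z "$(find "$p" -maxdepth 0 -perm -001 2>/dev/null)" ]; then
            echo "not world-executable: $p"
            rc=1
        fi
        # Stop at the root, or at "." when a relative path was given.
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
    return $rc
}

# Example: check_world_exec /usr/bin/pstotext
```

No output and a zero return mean every directory on the way to the binary, and the binary itself, can be reached by other users.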

DoWn 04-10-2004 01:14 AM

Hi.

Thank you for answering so quickly.

The directories and the pstotext file itself are set to 755 permissions (rwxr-xr-x).

PhpDig reads the PDF files but doesn't index them.

:(

Charter 04-10-2004 12:44 PM

Hi. Maybe something in this thread will help.

DoWn 04-12-2004 11:19 PM

Hi.

Thank you for your help.

I patched spider.php and robot_functions.php and it seems to be working now.

PhpDig now indexes some of my PDFs.

I still have some problems when trying to index a directory containing only PDF files, but I'm still searching.

Thank you again :)

Charter 04-13-2004 06:32 PM

>> I still have some problems when trying to index a directory containing only PDF files, but I'm still searching.

Hi. Are there links to all these PDF files? As PhpDig follows links, it won't index a standalone directory of files. Also, it seems some PDF files just take too much memory. See this thread for more details.
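
If there is no page linking to the files, one possible workaround is to generate one and point the spider at it. A minimal sketch follows; the function name `make_pdf_index` and the example URL are made up for illustration, and file names are not URL-encoded here, so names with spaces or special characters would need extra handling.

```shell
#!/bin/sh
# Print a bare HTML page with one link per .pdf found directly in a directory.
# $1 = directory on disk, $2 = URL under which that directory is served.
# ASSUMPTION: file names are plain enough to be used in a URL unescaped.
make_pdf_index() {
    dir=$1
    base_url=$2
    echo '<html><body><ul>'
    for f in "$dir"/*.pdf; do
        # When no PDF exists the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        echo "<li><a href=\"$base_url/$name\">$name</a></li>"
    done
    echo '</ul></body></html>'
}

# Example (hypothetical paths):
# make_pdf_index /var/www/docs http://www.example.com/docs > /var/www/docs/index.html
```

This only addresses the missing links; PDFs that take too much memory to convert would still fail.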

