|
07-13-2006, 05:57 PM | #1 |
Green Mole
Join Date: Oct 2005
Posts: 12
|
pdftotext issue
Hi,
I am trying to get pdftotext to work with phpdig. I have followed the instructions in the sticky at the top of this forum section, and the output I am getting is this:
Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1
This is all I get, whether I try to re-index the whole site or only a sub-section of it. As you can see from the path, I am unfortunately forced to install on a Windows box. I have tried pdftotext from the command line and it appears to work: it makes a text file in the Xpdf dir that contains the expected text from the PDF I gave it.
I've searched the forum repeatedly, but nothing I have found so far has solved my problem. Any help would be greatly appreciated. |
07-13-2006, 06:22 PM | #2 |
Green Mole
Join Date: Jul 2006
Posts: 9
|
May I know your system configuration?
|
07-13-2006, 07:49 PM | #3 |
Green Mole
Join Date: Oct 2005
Posts: 12
|
IIS 5 with PHP Version 4.3.1 (CGI I think)
MySQL 3.23.52
Not sure what else is relevant...? |
07-13-2006, 08:05 PM | #4 |
Green Mole
Join Date: Oct 2005
Posts: 12
|
Erm... PhpDig v.1.8.8, that's probably relevant, hey.
Is there a way to edit posts on this forum by the way? Can't seem to see the option... Or am I just having a blonde day? |
07-14-2006, 01:21 AM | #5 |
Green Mole
Join Date: Oct 2005
Posts: 12
|
Well, after much stuffing around, I have now installed PHP 5. As I found out, the is_executable() function is not available in PHP 4 on Windows (it only took me about 4 hours to get that all worked out!). So I am now getting the output below:
Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1
Is parse pdf executable: 1
Still no PDF indexing action to be seen. Any help much appreciated. I think I'm going to get as far away from the computer as possible now, before I smash it with a hammer. |
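For anyone hitting the same wall: the PHP manual lists is_executable() as available on Windows only from PHP 5.0.0. A minimal sketch of a check that degrades gracefully on older builds might look like the following. Note that describe_binary() is a hypothetical helper written for illustration, not part of PhpDig, and the path is simply the one quoted in the post above.

```php
<?php
// Hypothetical helper (not part of PhpDig): report what PHP can tell us
// about an external binary before we try to run it.
function describe_binary($path)
{
    return array(
        'exists'     => file_exists($path),
        'is_file'    => is_file($path),
        // Guard the call so the script does not die with an undefined-function
        // error on builds that lack is_executable(); null means "cannot tell".
        'executable' => function_exists('is_executable') ? is_executable($path) : null,
    );
}

print_r(describe_binary('D:\\Internet\\WWWROOT\\anmc\\Xpdf\\pdftotext.exe'));
```

A null in the 'executable' slot only says the check could not be made; it does not say the binary will fail to run.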
07-14-2006, 09:21 PM | #6 |
Green Mole
Join Date: Oct 2005
Posts: 12
|
So coming back to my problem with fresh eyes, it looks like the extra lines in robot_functions.php:
Code:
echo "<br>Command is: " . $command . "<br>";
echo "Result contains: ";
print_r($result);
echo "<br>Return value is: " . $retval . "<br><br>";
Any help, any help at all, would be greatly appreciated at this point. If I can't get PDF indexing working with phpdig then I'll be forced to use some other search engine, and I really like phpdig! I'm really not any kind of PHP guru at all, and I have a suspicion that my problem stems from the fact that I am forced to set phpdig and pdftotext up on a Windows system with IIS... Perhaps some kind of permission problem with the pdftotext executable and the PHP exec() function, I don't know... |
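The debug lines above print three things that PHP's exec() hands back: the command string, the array of output lines, and the exit status. As a rough standalone sketch of that same pattern (run_and_report() and the variable names are made up for illustration and are not taken from robot_functions.php):

```php
<?php
// Hypothetical helper: run a command and collect everything exec() reports.
function run_and_report($command)
{
    // exec() appends to an existing array, so start from an empty one.
    $output = array();
    $retval = null;

    // "2>&1" folds stderr into stdout so error messages are captured too.
    exec($command . ' 2>&1', $output, $retval);

    return array(
        'command' => $command,
        'output'  => $output,   // one array element per line of output
        'retval'  => $retval,   // exit status; 0 conventionally means success
    );
}

print_r(run_and_report('echo hello'));
```

An empty output array together with a return value of 0 usually means the program ran and printed nothing, which is a different situation from a permissions failure. In that case one would normally expect a non-zero status or an error line in the output.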
07-14-2006, 11:40 PM | #7 |
Green Mole
Join Date: Oct 2005
Posts: 12
|
Well... Following on down the path from my last post, $result_test['status'] was not being set to 'PDF', so the switch statement was never running the 'PDF' case. So I wanted to see what would happen if I told phpdig to index the full address of a particular PDF.
Code:
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe ../admin/temp/67264762.tmp 2>&1
Result contains: Array ( )
Return value is: 0
5:http://XXX/docs/Modified_Form_A_0607.pdf (time : 00:00:11)
But all's well that ends well I guess. Too bad I can't rename this thread to the jonny vs. jonny thread... How are you doing now jonny? I'm doing well thanks, jonny... That's great to hear, jonny. Take care. |
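One thing worth noting about the output above: the command names only the input file, the result array is empty, and the return value is 0. That would be consistent with pdftotext succeeding but writing its text to a file instead of standard output, which matches the earlier command-line test that produced a text file. The Xpdf documentation says that a text-file argument of - sends the text to stdout. Whether that applies to this particular setup is an assumption, since the thread does not show how PhpDig builds the command. A minimal sketch of a command that asks for stdout, using a hypothetical build_pdftotext_command() helper:

```php
<?php
// Hypothetical helper for illustration only; PhpDig's own command
// construction is not shown in this thread.
function build_pdftotext_command($binary, $pdf)
{
    // escapeshellarg() quotes each argument for the current platform's shell.
    // A text-file argument of "-" tells pdftotext to write to stdout,
    // and "2>&1" folds any error messages into the same stream.
    return escapeshellarg($binary) . ' ' . escapeshellarg($pdf) . ' - 2>&1';
}

$command = build_pdftotext_command(
    'D:\\Internet\\WWWROOT\\anmc\\Xpdf\\pdftotext.exe',
    '../admin/temp/67264762.tmp'
);
echo $command, "\n";
```

Because stderr is merged in, any pdftotext error message would end up in the captured text as well, so it is worth checking the return value before indexing the result.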