|
07-09-2004, 12:51 PM | #1 |
Green Mole
Join Date: Jul 2004
Location: Illnau, Switzerland, Europe
Posts: 9
|
I wrote a mod for indexing PDFs without an external binary!!!
Hello people
I have written a modification that lets me index PDF files. What is special about it: you don't need an external binary like ps2txt or another UNIX tool. The mod sends the PDF to Adobe, which converts it to HTML. After that, phpDig indexes this HTML. For more information, please visit my homepage <removed> |
07-09-2004, 07:11 PM | #2 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Is your robot_functions.php meant to completely replace the one that comes with phpdig? It's hard to tell since your site isn't in English.
|
07-10-2004, 01:42 AM | #3 |
Green Mole
Join Date: Jul 2004
Location: Illnau, Switzerland, Europe
Posts: 9
|
I had to change the code in 4 or 5 places in the existing file robot_functions.php.
So the easiest way is to replace this file (if you have made changes to this file yourself, make a backup first!). In the header of the file, I have listed all the changes I made. The English part of my homepage is coming soon... (Or does anyone feel like doing that?) Sorry for my bad English |
07-10-2004, 04:20 AM | #4 |
Green Mole
Join Date: Jul 2004
Location: Illnau, Switzerland, Europe
Posts: 9
|
Please download and use only the current version from my site.
(The older version has a bug.) I made it for phpDig V1.8.1. It won't work with older versions of phpDig. |
07-10-2004, 05:33 AM | #5 | |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. From the Adobe Terms of Use located here:
Quote:
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
|
07-10-2004, 07:35 AM | #6 |
Green Mole
Join Date: Jul 2004
Location: Illnau, Switzerland, Europe
Posts: 9
|
Hello charter
;( Sorry, I didn't read Adobe's terms. I was very happy to have a solution for this sch*** PDF problem. Oh, I really hate Adobe!!! Because I can't install ps2txt or pdf2html on my webspace, I have to look for another solution. Could I send the PDF to another server (a friend's, for example) that converts it for me with pdf2html and then sends the result back to me? I don't have enough UNIX experience, so I'm not sure. Or does anybody know a PDF-to-text converter written in Perl (CGI)? Another solution is to send it to Adobe once and then save the result in a database until the file's mtime changes. With this, I think Adobe could not object!!! P.S. I really like phpDig, but without PDF support I can only half use it. Greets CaCO3 |
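The "convert once, then store the result until the mtime changes" idea from the post above can be sketched roughly as follows. This is only a minimal illustration in Python, not code from the mod: `cached_convert`, the `convert` callback, and the in-memory dictionary are all hypothetical stand-ins (the post suggests a database table instead).

```python
import os
from typing import Callable, Dict, Tuple

# Hypothetical cache: file path -> (mtime at conversion time, converted text).
# The post suggests a database; a dict keeps the sketch self-contained.
_cache: Dict[str, Tuple[float, str]] = {}


def cached_convert(path: str, convert: Callable[[str], str]) -> str:
    """Return the converted text for `path`, reconverting only if its mtime changed."""
    mtime = os.path.getmtime(path)
    entry = _cache.get(path)
    if entry is not None and entry[0] == mtime:
        # File unchanged since the last conversion: reuse the stored text.
        return entry[1]
    # New or modified file: convert it again and remember the new mtime.
    text = convert(path)
    _cache[path] = (mtime, text)
    return text
```

Whatever does the actual conversion is passed in as `convert`, so the same logic would apply to a local tool or a remote service.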
07-10-2004, 09:01 AM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. At FooLabs there is a mirror link to PlanetMirror, where you can find compiled versions of pdftotext.
Go to PlanetMirror and download xpdf-3.00-linux.tar.gz (assuming Linux is your operating system). Unzip xpdf-3.00-linux.tar.gz and extract only the pdftotext file (it's already been compiled and is a binary file). FTP just the pdftotext file in binary mode to your account. Once the file has been uploaded, change its permission to rwxr-xr-x (755 permission). Now in the PhpDig config file, set the following:
PHP Code:
From the admin panel of PhpDig version 1.8.1, just type in the link to a PDF file, and set search depth to zero and set links per to one, to test pdftotext on the one PDF file.
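For anyone who prefers to script the "extract only pdftotext and give it 755 permission" step, here is a rough sketch in Python. The function name `extract_binary` and its parameters are made up for illustration. It sets the mode on the local copy only, so after uploading by FTP the 755 permission still has to be set on the server as described above.

```python
import os
import tarfile


def extract_binary(archive: str, name: str, dest_dir: str) -> str:
    """Extract the single member whose basename is `name` and make it executable."""
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # Match on the basename so the archive's top-level folder is ignored.
            if member.isfile() and os.path.basename(member.name) == name:
                src = tar.extractfile(member)
                target = os.path.join(dest_dir, name)
                with src, open(target, "wb") as out:
                    out.write(src.read())
                os.chmod(target, 0o755)  # rwxr-xr-x
                return target
    raise FileNotFoundError(f"{name} not found in {archive}")
```

A call such as `extract_binary("xpdf-3.00-linux.tar.gz", "pdftotext", ".")` would write only that one file to the current directory, assuming the archive contains a member with that name.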
|
07-10-2004, 11:26 AM | #9 |
Green Mole
Join Date: Jul 2004
Location: Illnau, Switzerland, Europe
Posts: 9
|
Thanks for your tips, charter
I followed your explanations. (First I restored all phpDig files to their originals.) Then I changed config.php like you said. For the path, I used /home/ruinelli/public_html/cgi-bin, into which I also moved the file pdftotext (1MB). But I think I don't have permission to execute binaries in this dir!!! Then I ran the spider with http://testdomain.ruinelli.ch/gpl.pdf. It spiders, but no keywords are put into the database. ;( I think the problem is that the file pdftotext has to be in a bin folder like /bin or /usr/local/bin, to which I don't have access. You can test it at: http://www.ruinelli.ch/phpdig/admin/index.php @vinyl-junkie: read about the problem at: http://forums.devshed.com/archive/t-121054 Last edited by caco3; 07-10-2004 at 11:28 AM. |
07-10-2004, 11:38 AM | #10 | |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Quote:
|
|
07-10-2004, 11:44 AM | #11 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Make a new directory called binaries and move the pdftotext to this directory. Make sure pdftotext still has 755 permission. Then set the following in the PhpDig config file:
PHP Code:
|
07-10-2004, 12:08 PM | #12 |
Green Mole
Join Date: Jul 2004
Location: Illnau, Switzerland, Europe
Posts: 9
|
yeeeees, it works!!!!!
In the path, I forgot the filename pdftotext ;( But now it works. Thanks a lot!!! I read so many explanations, but with none of them could I get it to work. Now I can send my mod to /dev/null. I think it would be nice if the docs for phpDig explained more. Greets CaCO3 [a really happy man with a brilliant search engine on his page] |
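The mistake found in this thread (a configured path that names the directory instead of the binary itself) is easy to check for before running the spider. Below is a small sketch in Python; `check_binary_path` and its messages are invented for illustration and are not part of phpDig.

```python
import os


def check_binary_path(path: str) -> str:
    """Say whether `path` looks usable as the full path to an external binary."""
    if os.path.isdir(path):
        # The case from this thread: the filename was left off the path.
        return "directory: append the binary's filename"
    if not os.path.isfile(path):
        return "missing: no file at this path"
    if not os.access(path, os.X_OK):
        return "not executable: set 755 permission"
    return "ok"
```

Calling it with the value from the config file shows at a glance whether the path points at a directory, at nothing, or at a file without execute permission.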