Thread: PDF indexing
12-07-2003, 10:18 AM   #3
lelandv
Quote:
Originally posted by Charter
Hi. If the output goes to STDOUT, then set define('PHPDIG_PDF_EXTENSION','');

The extension .txt in define('PHPDIG_PDF_EXTENSION','.txt'); is only needed if the output goes to file with a .txt extension.

Hiya. I've done this, but the PDF file is still not indexed, only the filename is.

Am I missing something here?

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pdf2txt');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','');
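
One way to narrow this down is to check by hand whether the converter really prints text to STDOUT. The sketch below is not PhpDig code: converter_stdout is a hypothetical helper, the command layout (binary, then options, then file path) is an assumption about how the settings above get combined, and /path/to/InstrumentPilot39.pdf is a placeholder for a local copy of the PDF.

```php
<?php
// Hypothetical helper, not part of PhpDig: run a converter and capture
// whatever it writes to STDOUT, together with its exit status.
function converter_stdout($binary, $options, $pdfPath)
{
    // Assumed layout: "<binary> <options> <file>".
    // trim() drops the extra space when the options string is empty.
    $command = trim($binary . ' ' . $options) . ' ' . escapeshellarg($pdfPath);

    $lines = array();
    $status = 0;
    exec($command, $lines, $status); // $lines receives STDOUT line by line

    return array('status' => $status, 'text' => implode("\n", $lines));
}

// Placeholder path: point this at a local copy of the PDF being tested.
$pdf = '/path/to/InstrumentPilot39.pdf';
if (is_file($pdf)) {
    $result = converter_stdout('/usr/local/bin/pdf2txt', '', $pdf);
    echo 'exit status: ' . $result['status'] . "\n";
    echo 'bytes on STDOUT: ' . strlen($result['text']) . "\n";
}
```

If the exit status is 0 but nothing arrives on STDOUT, the converter may be writing to a file instead, in which case an empty PHPDIG_PDF_EXTENSION would probably not match what it does. A non-zero status would point at the path or permissions of the binary instead.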


The actual PDF file is linked from another page, and the server logs show the crawler retrieving the PDF document. It just isn't being indexed at all.


taranta.discpro.org - - [07/Dec/2003:19:14:40 +0000] "HEAD /pdftest/InstrumentPilot39.pdf HTTP/1.1" 200 0 "-" "PhpDig/1.6.2 (PHP; MySql)"
taranta.discpro.org - - [07/Dec/2003:19:14:40 +0000] "GET /pdftest/InstrumentPilot39.pdf HTTP/1.0" 200 1262188 "-" "PHP/4.2.2"

Leland