|
04-09-2004, 12:52 AM | #1 |
Green Mole
Join Date: Nov 2003
Posts: 3
|
Junk in keywords table - Indexing PDF
I have 1.8.0 installed on Redhat Linux.
When I index my PDF files I get lots of junk in the keywords table. It finds the file OK, but nothing of any value gets indexed. Below is a snippet of some of the data. I have tried the sample PDF from this site with no luck, and I have read through most of the forums with no luck. I am using pdftotext to create my plaintext file. It doesn't support STDOUT, but it does create a .txt file that I can open to see that it parsed the file correctly. I have also included a little snippet of my config.php. It almost looks like it is getting the encoding wrong. Does anyone have any ideas?

Thanks much,
Bege

+----------------+
| keyword        |
+----------------+
| 6aeyqo,n       |
| E#b5           |
| kde            |
| 3Iqha:cp       |
+----------------+

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','txt');
|
04-09-2004, 01:13 AM | #2 |
Green Mole
Join Date: Nov 2003
Posts: 3
|
More Research
I have looked at the temp files in the text_content dir, and all of the junk that I am getting in the database is in that file. How is the file getting created? When I run pdftotext in bash everything works just fine. What is the difference?
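
One difference worth checking is how the name of the converted file gets built. The following is a minimal sketch, not the indexer's actual code: it assumes the indexer locates the converted text by appending the PHPDIG_PDF_EXTENSION value to the temp file name, and that pdftotext, run without an output name, writes a file named after its input with ".txt" appended. The helper converted_text_path() and the temp file name are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper, NOT the indexer's actual code: assumes the converted
// text is located by appending the extension setting to the temp file name.
function converted_text_path(string $tempFile, string $extension): string
{
    return $tempFile . $extension;
}

// Made-up temp file name, for illustration only.
$tempFile = 'text_content/abc123.tmp';

// Assumption: pdftotext, run without an output name on this input,
// writes "text_content/abc123.tmp.txt".
$written = $tempFile . '.txt';

// Compare the configured value from the post ('txt') with a dotted one ('.txt').
foreach (['txt', '.txt'] as $extension) {
    $expected = converted_text_path($tempFile, $extension);
    $status = ($expected === $written) ? 'matches' : 'does NOT match';
    echo "extension '$extension': looks for $expected, which $status $written\n";
}
```

If the two names do not match, the indexer could end up reading something other than the converted text, which would be consistent with junk keywords. That is only a guess to verify against the actual files in text_content.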
|
04-09-2004, 08:15 AM | #3 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Don't forget the period...
PHP Code: