Thread: pstotext issue
Old 04-28-2004, 09:56 AM   #4
Charter
Hi. When you run the following query, replace word with a term that appears only in the PDF file:
Code:
select keyword from keywords where keyword like '%word%';
The file in the text_content directory contains the following:

Index of /pdf Name Last modified Size Description Parent Directory
28-Apr-2004 16:35 - 01123SOC2004013.PDF 28-Apr-2004 18:30 69k pdf.html
28-Apr-2004 18:18 1k test.doc 28-Apr-2004 17:24 19k zyz.xls 28-Apr-2004
17:24 14k Apache1.3.29 - ProXad [Apr 1 2004 16:04:22] Server at
monsiteweb.fr Port 80 Index of /pdf Index of /pdf Index of /pdf

That looks like a directory listing rather than the text of the actual PDF file. The $result array contains the following:

Result contains: Array ( [0] => Hébergement [1] => Facture [2] => partners -- 5 Sq de tuile_ 78000 Versailles -- Tél. / Fax : 0666666666 -- Email : contact@partners.com [3] => SARL au capital de 3000# -- Siret545454445RCS Versailles -- APE 222Z -- Web : www.partners.com [4] => [5] => FACTURE [6] => partners CLIENT [7] => 5 Sq de tuile Adzd MAdzNdzAS [8] => 78000 Versailles [9] => Tél./fax. : 01 3226222626 [10] => Prestation : Hébergement [11] => Facture du: 01/04/2004 au 31/06/2004 [12] => N° de Facture: 12122/66 [13] => Article Objet Quantité [14] => / [15] => Slots [16] => Prix [17] => unitaire / [18] => Trimestre [19] => Montant TVA [20] => Hébergement Serveur [21] => Total HT 122.36 [22] => Total TVA 23.61 [23] => Total TTC 122.00 [24] => A payer 122.00 EUROS [25] => Mode de paiement : A réception de facture [26] => )

With $retval being zero, the following code should write a temp file containing the contents of the $result array:
PHP Code:
if (!$retval) {
     // the replacement of š is for non-breaking spaces
     // returned by catdoc parsing msword files
     // and '0xAD' "tiret quadratin" returned by pstotext
     // in iso-8859-1
     // Adjust with your encoding and/or your tools
     if ((is_array($result)) && (count($result) > 0)) {
        $f_handler = fopen($tempfile1,'wb');
        fwrite($f_handler,str_replace('š',' ',str_replace(chr(0xad),'-',implode(' ',$result))));
        fclose($f_handler);
     }
}
else {
     return array('tempfile'=>0,'tempfilesize'=>0);
}
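
The quoted code does not check whether fopen succeeded, so a missing directory or a permissions problem would fail silently. To rule that out while debugging, something like the following sketch could help. The function name write_temp_text and its return convention are assumptions made for illustration, not part of the quoted code, and only the 0xAD replacement is reproduced here:

```php
<?php
// Hypothetical debugging helper; the name and return convention are
// assumptions, not taken from the quoted code.
function write_temp_text($tempfile, array $result) {
    if (count($result) == 0) {
        return false; // nothing to write
    }
    // Same join and 0xAD replacement as the quoted code.
    $text = str_replace(chr(0xad), '-', implode(' ', $result));
    $f_handler = @fopen($tempfile, 'wb');
    if ($f_handler === false) {
        return false; // usually a missing directory or a permissions problem
    }
    $written = fwrite($f_handler, $text);
    fclose($f_handler);
    return $written; // number of bytes written, or false on failure
}

// Example run against a scratch file in the system temp directory.
$tempfile = tempnam(sys_get_temp_dir(), 'txt');
var_dump(write_temp_text($tempfile, array('Facture', 'Total HT 122.36')));
unlink($tempfile);
```

If this returns false for a path inside admin/temp, the problem is most likely with the directory rather than with the pstotext output.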

Also, what do you get with the following query:
Code:
select file,first_words from spider where file like '%01123SOC2004013%';
And are the admin/temp and text_content directories set to 777 permissions?
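
If you cannot easily check the permissions from a shell, a small PHP script run from the install root can report whether the web server user can write to those directories. This is only a sketch: the function name check_writable_dirs is made up for illustration, and the relative paths assume the script sits in the directory that contains admin and text_content.

```php
<?php
// Hypothetical helper: reports whether each directory exists and is
// writable by the user PHP runs as.
function check_writable_dirs(array $dirs) {
    $report = array();
    foreach ($dirs as $dir) {
        $report[$dir] = is_dir($dir) && is_writable($dir);
    }
    return $report;
}

// Assumed paths, relative to the install root; adjust to your layout.
foreach (check_writable_dirs(array('admin/temp', 'text_content')) as $dir => $ok) {
    echo $dir . ': ' . ($ok ? 'writable' : 'NOT writable') . "\n";
}
```

Run it through the web server rather than from the command line, since the two may run as different users.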