03-23-2004, 02:18 PM | #1 |
Green Mole
Join Date: Mar 2004
Location: New York
Posts: 3
|
Indexing problem: PhpDig will not spider all of the site
Installed PhpDig version 1.8.0 successfully
-------------------------------------
In config.php:

define('SPIDER_MAX_LIMIT',900);
define('SPIDER_DEFAULT_LIMIT',900);
define('RESPIDER_LIMIT',900);
define('LIMIT_DAYS',0);
// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','/usr/ports/textproc/catdoc');
define('PHPDIG_OPTION_MSWORD','-s 8859-1');
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/ports/print/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
-------------------------------------
Server information as follows:
Platform: FreeBSD 4.8-RELEASE #0
Web Server version: Apache/1.3.29 (Unix)
PHP 4.3.4
MySQL 4.0.13
PERL v5.8.0 built for i386-freebsd
-------------------------------------
Only 1 TLD.
-------------------------------------
I have tried to re-create the index more than once and get a very similar result every time.
-------------------------------------
I have created a page that includes a link to all the pages/files that I want to index, and I cannot get it to spider the whole site.
-------------------------------------
It will not spider all of the site. In some directories it only does the first 13 files, while in others it did the first 27. It also indexes only HTML files, even though it is supposed to index 'doc' and 'pdf' files as well. What am I doing wrong?

The more urgent problem is that it does not spider all of the site. In some directories there are more than 100 files, and some are very large (over 1 meg); some of the PDF files contain only graphics and are as big as 40 megs.

Please help and thank you in advance.
mh |
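Note on the parser paths: the config comment says a full path to the external binary is required. On FreeBSD, /usr/ports/... is normally the ports source tree, and installed binaries usually end up elsewhere (often under /usr/local/bin), so the two paths above may be directories rather than executables. That is an assumption about this server, not something the thread confirms. A minimal sketch to check it, where describe_parser_path() is a hypothetical helper and not part of PhpDig:

```php
<?php
// Hypothetical helper (not part of PhpDig): reports whether a configured
// parser path points at an executable file rather than a directory.
function describe_parser_path($path)
{
    if (!file_exists($path)) {
        return 'missing';
    }
    if (is_dir($path)) {
        return 'directory, not a binary';
    }
    if (!is_executable($path)) {
        return 'file exists but is not executable';
    }
    return 'ok';
}

// Paths copied from the config.php quoted above.
$parsers = array(
    'PHPDIG_PARSE_MSWORD' => '/usr/ports/textproc/catdoc',
    'PHPDIG_PARSE_PDF'    => '/usr/ports/print/pstotext',
);

foreach ($parsers as $name => $path) {
    echo $name . ' (' . $path . '): ' . describe_parser_path($path) . "\n";
}
```

If either line reports anything other than 'ok', pointing the constant at the actual installed binary would be the first thing to try.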
03-24-2004, 07:16 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Is it only the DOC and PDF files that don't get indexed, or is it that the process stops after encountering a large file? Perhaps the issue is related to the one in this thread.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
03-24-2004, 10:43 PM | #3 |
Green Mole
Join Date: Mar 2004
Location: New York
Posts: 3
|
Thank you kindly for your reply.
Some stats: I have 161 files which are over 1 meg; the biggest is 36 megs. Again, these are PDF files with nothing but graphics and no text. In one directory, which has 41 htm files, the largest is 697 KB. In this directory it froze (locked up) after completing only 16 files. I do not know if it gets stuck on large files themselves or on large files accumulating over time. I looked over the link you provided; do you recommend that I make those changes? I am not sure if I have shell access to the server, but I do have FTP access. Thank you again. |
03-24-2004, 11:00 PM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. If you have access to your error logs, check to see if the "allowed memory size of X bytes exhausted" error is there. With these larger files, it seems that memory may get exhausted so perhaps try the code in that other thread.
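If the error logs are not reachable, a small script uploaded over FTP can at least show what memory limit PHP is running with. This is only a sketch: ini_size_to_bytes() is a hypothetical helper, and on PHP 4 builds memory_get_usage() is assumed to be available only when PHP was compiled with memory limit support.

```php
<?php
// Hypothetical helper: converts a php.ini size such as "8M" to bytes.
// Returns -1 when no limit is set.
function ini_size_to_bytes($value)
{
    $value = trim((string) $value);
    if ($value === '' || $value === '-1') {
        return -1;
    }
    $number = (int) $value;               // leading digits, e.g. 8 from "8M"
    $unit = strtolower(substr($value, -1));
    switch ($unit) {
        case 'g':
            $number *= 1024;              // fall through
        case 'm':
            $number *= 1024;              // fall through
        case 'k':
            $number *= 1024;
    }
    return $number;
}

$limit = ini_size_to_bytes(ini_get('memory_limit'));
echo 'memory_limit: ' . ($limit < 0 ? 'unlimited' : $limit . ' bytes') . "\n";

if (function_exists('memory_get_usage')) {
    echo 'current usage: ' . memory_get_usage() . " bytes\n";
} else {
    echo "memory_get_usage() is not available in this PHP build\n";
}
```

A limit of around 8M together with files of tens of megabytes would be consistent with the memory exhaustion theory.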
03-25-2004, 12:06 AM | #5 |
Green Mole
Join Date: Mar 2004
Location: New York
Posts: 3
|
Thank you once more
I have tried both suggestions.

if (memory_get_usage() + 2000000 > 8000000) {
    return array('tempfile'=>0,'tempfilesize'=>0);
}

and

$f_handler = fopen($tempfile1,'wb');
if (is_array($file_content)) {
    fwrite($f_handler,implode('',$file_content));
}
fclose($f_handler);
unset($file_content);
$tempfilesize = filesize($tempfile1);

----------
This only locks the spider faster! I am at a loss. |
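For reference, the first check above skips a document when the memory already in use plus a 2,000,000-byte reserve would exceed 8,000,000 bytes. The same test can be written with the two numbers as parameters, which makes it easier to see what changing them does. This is a sketch only; has_memory_headroom() is a hypothetical name and not a PhpDig function.

```php
<?php
// Hypothetical helper: true when the memory already in use plus the
// reserve still fits under the ceiling. $used_bytes defaults to the
// current usage of the running script.
function has_memory_headroom($reserve_bytes, $ceiling_bytes, $used_bytes = null)
{
    if ($used_bytes === null) {
        $used_bytes = memory_get_usage();
    }
    return ($used_bytes + $reserve_bytes) <= $ceiling_bytes;
}

// Same numbers as the check quoted above, with example usage figures.
var_dump(has_memory_headroom(2000000, 8000000, 5000000)); // fits
var_dump(has_memory_headroom(2000000, 8000000, 6500000)); // would be skipped

// A lower ceiling makes the spider skip documents earlier.
var_dump(has_memory_headroom(2000000, 6000000, 5000000)); // would be skipped
```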
03-25-2004, 12:54 AM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Perhaps try lowering the numbers in the code below, or making a list of smaller files to index.
PHP Code: