PhpDig.net

Old 03-23-2004, 02:18 PM   #1
mih
Green Mole
 
Join Date: Mar 2004
Location: New York
Posts: 3
Indexing problem: PhpDig will not spider all of the site

Installed PhpDig version 1.8.0 successfully

-------------------------------------
in the config.php

define('SPIDER_MAX_LIMIT',900);
define('SPIDER_DEFAULT_LIMIT',900);
define('RESPIDER_LIMIT',900);

define('LIMIT_DAYS',0);


// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','/usr/ports/textproc/catdoc');
define('PHPDIG_OPTION_MSWORD','-s 8859-1');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/ports/print/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
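
A note on the two parser paths above: on FreeBSD, /usr/ports/... is normally the ports source tree, and installed binaries usually live elsewhere (often under /usr/local/bin). Since the config comment asks for the full path to an external binary, it may be worth confirming that each configured path is an executable file. The following is only a minimal sketch, assuming it is run on the same server; the helper name check_parser_path is hypothetical and not part of PhpDig.

```php
<?php
// Hypothetical helper (not part of PhpDig): reports whether a configured
// parser path points at an executable file rather than a directory.
function check_parser_path($path) {
    if (!file_exists($path)) {
        return 'missing';
    }
    if (is_dir($path)) {
        return 'directory';
    }
    return is_executable($path) ? 'ok' : 'not executable';
}

// Paths copied from the config.php excerpt above.
$paths = array(
    'PHPDIG_PARSE_MSWORD' => '/usr/ports/textproc/catdoc',
    'PHPDIG_PARSE_PDF'    => '/usr/ports/print/pstotext',
);

foreach ($paths as $name => $path) {
    echo $name . ' (' . $path . '): ' . check_parser_path($path) . "\n";
}
```

Any result other than "ok" would suggest that the DOC or PDF parser cannot be run from that path, which could be one reason those file types are not indexed.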
----------------------------------------------------------
Server information as follows:

Platform: FreeBSD 4.8-RELEASE #0
Web Server version: Apache/1.3.29 (Unix)
PHP 4.3.4
MySQL 4.0.13
PERL v5.8.0 built for i386-freebsd
-----------------------------------------
only 1 tld
----------------------
I have tried to re-create the index more than once and get a very similar result every time.
----------------------------------------------------------
I have created a page that links to all the pages and files I want to index, but I cannot get PhpDig to spider the whole site.
------------------------------------------------------------
It will not spider the whole site. In some directories it indexes only the first 13 files, while in others it indexes the first 27. It also indexes only HTML files, even though it is supposed to index 'doc' and 'pdf' files as well.

What am I doing wrong?

The more urgent problem is that it does not spider the whole site. Some directories contain more than 100 files, and some files are very large (over 1 meg); some of the PDF files contain only graphics and are as big as 40 megs.

Please help and thank you in advance.
mh
Old 03-24-2004, 07:16 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Is it only the DOC and PDF files that don't get indexed, or is it that the process stops after encountering a large file? Perhaps the issue is related to the one in this thread.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 03-24-2004, 10:43 PM   #3
mih
Green Mole
 
Join Date: Mar 2004
Location: New York
Posts: 3
Thank you kindly for your reply.

Some stats:
I have 161 files that are over 1 meg; the biggest is 36 megs.

Again, these are PDF files with nothing but graphics and no text.


In one directory, which has 41 htm files, the largest is 697 KB. In this directory the spider froze (locked up) after completing only 16 files.


I do not know whether it gets stuck on large files or on large files over time. I looked over the link you provided; do you recommend that I make those changes?

I am not sure if I have shell access to the server but I do have ftp access.

Thank you again.
Old 03-24-2004, 11:00 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. If you have access to your error logs, check to see if the "allowed memory size of X bytes exhausted" error is there. With these larger files, it seems that memory may get exhausted so perhaps try the code in that other thread.
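
For anyone with only FTP access, one way to act on this is to download the error log and scan it for the memory message. The sketch below is illustrative only: the helper name find_memory_errors and the log path are assumptions, and the real log location depends on how the server is set up.

```php
<?php
// Hypothetical helper: returns the lines of a log file that report PHP
// memory exhaustion. Reads line by line so a large log is not loaded at once.
function find_memory_errors($logFile) {
    $matches = array();
    $handle = @fopen($logFile, 'r');
    if ($handle === false) {
        return $matches; // unreadable or missing log: nothing to report
    }
    while (($line = fgets($handle)) !== false) {
        // Lowercase the line so the match is case-insensitive.
        if (strpos(strtolower($line), 'allowed memory size of') !== false) {
            $matches[] = rtrim($line);
        }
    }
    fclose($handle);
    return $matches;
}

// Assumed log location; replace with the actual path of the error log.
$logFile = '/var/log/httpd-error.log';
foreach (find_memory_errors($logFile) as $line) {
    echo $line . "\n";
}
```

If the script prints nothing, either the log could not be read or it contains no memory-exhaustion entries, in which case the lock-ups may have a different cause.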
Old 03-25-2004, 12:06 AM   #5
mih
Green Mole
 
Join Date: Mar 2004
Location: New York
Posts: 3
Thank you once more

I have tried both suggestions.

if (memory_get_usage() + 2000000 > 8000000) {
    return array('tempfile'=>0,'tempfilesize'=>0);
}


and


$f_handler = fopen($tempfile1,'wb');
if (is_array($file_content)) {
    fwrite($f_handler,implode('',$file_content));
}
fclose($f_handler);
unset($file_content);
$tempfilesize = filesize($tempfile1);


----------

This only makes the spider lock up faster!

I am at a loss.
Old 03-25-2004, 12:54 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Perhaps try lowering the numbers in the code below, or making a list of smaller files to index.
PHP Code:
if (memory_get_usage() + 2000000 > 8000000) {
    return array('tempfile'=>0,'tempfilesize'=>0);
}
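
To make the two numbers in that check easier to experiment with, the comparison can be pulled into a small function. This is only a sketch: the name should_skip_file and its parameters are hypothetical and not part of PhpDig. On older PHP builds, memory_get_usage() may be available only when PHP was compiled with memory-limit support, so the example checks for it first.

```php
<?php
// Hypothetical helper: true when current usage plus the reserved headroom
// would exceed the ceiling, i.e. when the next file should be skipped.
function should_skip_file($currentUsage, $headroom, $limit) {
    return ($currentUsage + $headroom) > $limit;
}

// Values from the check quoted in this thread, in bytes.
$headroom = 2000000;
$limit    = 8000000;

// Fall back to 0 if memory_get_usage() is not available in this build.
$usage = function_exists('memory_get_usage') ? memory_get_usage() : 0;

if (should_skip_file($usage, $headroom, $limit)) {
    echo "skip this file\n";
} else {
    echo "process this file\n";
}
```

With these values, a file is skipped once the script is already using more than 6,000,000 bytes.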

Also, maybe try changing the three 900 values in config.php (SPIDER_MAX_LIMIT, SPIDER_DEFAULT_LIMIT, RESPIDER_LIMIT) to a much lower number. There is a search depth example in this post.