PhpDig.net (http://www.phpdig.net/forum/index.php)
-   Troubleshooting (http://www.phpdig.net/forum/forumdisplay.php?f=22)
-   -   Incomplete spidering (http://www.phpdig.net/forum/showthread.php?t=2412)

bocephalus 03-28-2006 04:12 PM

Incomplete spidering
 
I have not been able to find any other posts that answer my question.

The spidering stops with no error message after anywhere from roughly 25 seconds to 4 minutes. Each time I spider, I usually end up clicking 'stop spider', and the number of pages in the database has not gone up.

I am indexing via the browser (Firefox) and have set
Code:

network.http.keep-alive.timeout = 600
, and I have a php.ini file in the phpdig main directory with the following settings:
Code:

max_execution_time = 600
max_input_time = 600
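
One thing worth checking: a per-directory php.ini is only read when PHP runs as CGI/FastCGI. When PHP runs as an Apache module (mod_php), those per-directory values are ignored and the server-wide limits (typically a 30-second max_execution_time) still apply. If that is the case here, a sketch of the equivalent override via an .htaccess file in the phpdig directory (assuming AllowOverride permits it) would be:
Code:

php_value max_execution_time 600
php_value max_input_time 600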

What happens is that Firefox stops loading the page, with no error message or anything. At that point, if I check the tempspider table, it has a bunch of data in it. Then I go to the top of the spidering page and click "stop spider". It stops, I go back to the admin interface, and the number of pages has not gone up, even though the number of pages spidered was in the 200s and the page I selected has a bunch of links to pages that still are not indexed. I don't know whether any of those 200 were new pages, but there are definitely new pages linked.

I entered one link with:
Code:

search depth: 1
links per: 0

I have also tried a search depth of 20, with the same results.

Some other settings:
Code:

define('SPIDER_MAX_LIMIT',20);          //max recurse levels in spider
define('RESPIDER_LIMIT',5);            //recurse respider limit for update
define('LINKS_MAX_LIMIT',20);          //max links per each level
define('RELINKS_LIMIT',5);              //recurse links limit for an update

//for limit to directory, URL format must either have file at end or ending slash at end
//e.g., http://www.domain.com/dirs/ (WITH ending slash) or http://www.domain.com/dirs/dirs/index.php
define('LIMIT_TO_DIRECTORY',false);    //limit index to given (sub)directory, no sub dirs of dirs are indexed

define('LIMIT_DAYS',0);                //default days before reindex a page
define('SMALL_WORDS_SIZE',2);          //words to not index - must be 2 or more

If something is killing the spider, what would it be, and how can I avoid it?
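
A silent stop like this is often a PHP fatal error (timeout or memory exhaustion) that is simply not being displayed. One way to find out is to make PHP report it. A sketch of extra settings for the same per-directory php.ini (the error_log path is a placeholder to adjust; the memory_limit value is a guess, since spidering can exhaust a low default):
Code:

display_errors = On
log_errors = On
error_log = /tmp/phpdig_errors.log
memory_limit = 64M

The web server's error log is also worth watching during a run; a fatal error is usually recorded there even when the browser shows nothing.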

