PhpDig.net

Old 03-23-2007, 08:14 AM   #1
marco
Green Mole
 
Join Date: Mar 2007
Posts: 2
Crawler speed improvement (although it affects the page limit)

I had the problem that phpdigExplore() returns too many duplicate links. As a result, the spider checked hundreds of duplicate URLs, which slowed it down and meant the 1000-page limit was hit quite quickly.

In the end, I added the following code at the end of phpdigExplore():
PHP Code:
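// Keep a session-wide list of the links that have already been returned.
if (empty($_SESSION["links"])) $_SESSION["links"] = array();
$resultlinks = array();
foreach ($links as $link) {
    // Compare against false explicitly: array_search() returns key 0 for
    // the first stored link, and a plain !array_search() would miss it.
    if (array_search($link, $_SESSION["links"]) === false) {
        $_SESSION["links"][] = $link;
        $resultlinks[] = $link;
    }
}
return $resultlinks;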
I don't know whether this modification is useful or harms other components. But for the moment, it works.
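
A possible variation, offered only as a rough sketch: array_search() walks the whole session list for every link, so the check gets slower as more links are recorded. Storing the seen links as array keys turns the check into a single isset() lookup. The helper name filterNewLinks() is hypothetical and not part of PhpDig, and the sketch assumes each link is a plain string; if the links are arrays in your version, they would need to be turned into a string key first.

```php
<?php
// Hypothetical helper, not part of PhpDig: returns only links not seen before.
// Assumes each link is a plain string. $seen is passed by reference and
// would be $_SESSION["links"] when used inside the spider.
function filterNewLinks($links, &$seen)
{
    $fresh = array();
    foreach ($links as $link) {
        // isset() on an array key is a hash lookup, so the cost does not
        // grow with the number of links already recorded.
        if (!isset($seen[$link])) {
            $seen[$link] = true;
            $fresh[] = $link;
        }
    }
    return $fresh;
}

// Usage: duplicates are dropped within one call and across calls.
$seen = array();
$first = filterNewLinks(array("a.html", "b.html", "a.html"), $seen);
$second = filterNewLinks(array("a.html", "c.html"), $seen);
// $first holds a.html and b.html; $second holds only c.html.
```

As with the original modification, links filtered out this way never reach the spider again during the session, so the same caveat about side effects on other components applies.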


All times are GMT -8.