Man, I cannot tell you how much I appreciate you taking the time to reply. I know you are busy. Thanks a ton.
I have been working on my problem all morning. I want to get all the travel articles from some really huge sites, and I cannot afford to spider each whole site and then exclude the rest with the "no way". I really need to make this work so that it only traverses "down" when a path is given.
I have been going through your code all morning. I figure the easiest approach is to set a variable (say $parent_path) when the first (level==0) entry is made in the tempspider table. Then in spider.php, right before indexing on line 513, I could run this test when level>0:
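if ($level == 0) {
    $parent_path = $temp_path; // remember the starting path from the level-0 entry
}
if (isset($parent_path)) {
    // preg_quote escapes "/" and any other regex metacharacters in the path,
    // and leaves $parent_path itself unchanged for the next pass
    $parent_match = "/^" . preg_quote($parent_path, "/") . "/";
    if (!preg_match($parent_match, $temp_path)) { // current path is not under the parent path, so do not index
        $ok_for_index = 0;
    }
}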
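The same prefix test could probably be done without a regex at all. Here is a minimal sketch of that idea; is_under_parent() is a hypothetical helper name of mine, not something in the spider code, and the paths are made-up examples.

```php
<?php
// Hypothetical helper (not part of the existing spider code).
// Returns true when $path starts with $parent_path.
function is_under_parent($parent_path, $path) {
    return strncmp($path, $parent_path, strlen($parent_path)) === 0;
}

// Example values, made up for illustration.
$parent_path = "/travel/";
var_dump(is_under_parent($parent_path, "/travel/europe/paris.html")); // bool(true)
var_dump(is_under_parent($parent_path, "/news/today.html"));          // bool(false)
```

With something like this, the check before indexing would set $ok_for_index=0 whenever is_under_parent($parent_path, $temp_path) is false, and there would be nothing to escape.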
You know your code more intimately than I do. Can you give me some feedback on my proposed solution and whether this is a good way to do it?
Thanks.
T
Last edited by td234; 01-18-2005 at 11:28 AM.