02-07-2005, 02:10 AM | #1 |
Green Mole
Join Date: Feb 2005
Posts: 16
Break the depth limit of 20?
Is the depth limit of 20 a script limitation? A resource limitation? Some sort of loop avoidance?
I ask because I tried to spider a directory where each new page of results is considered a new level, and there are categories with more than 20 pages. Can we break this limit somehow? Thanks!
02-07-2005, 03:32 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
Just change it in the config file:
Code:
define('SPIDER_MAX_LIMIT',20); // max (re)index search depth - used for shell and admin panel dropdown
define('RESPIDER_LIMIT',5);    // max update search depth - only used for browser, not used for shell
define('LINKS_MAX_LIMIT',20);  // max (re)index links per - used for shell and admin panel dropdown
define('RELINKS_LIMIT',5);     // max update links per - only used for browser, not used for shell
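For example, to allow the depth of 60 discussed later in this thread, only the first define needs to change; this is just an illustration of the edit, not a recommended value, and the other limits are left at their defaults:
Code:
// Example only: raise the maximum (re)index depth from 20 to 60.
define('SPIDER_MAX_LIMIT',60); // max (re)index search depth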
__________________
Responses are offered on a voluntary, as-time-permits basis, with no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding.
02-07-2005, 07:47 AM | #3 |
Green Mole
Join Date: Feb 2005
Posts: 16
Thanks a bunch, Charter!
Off topic, but are you the only developer behind PhpDig? Do you take donations?
02-07-2005, 01:16 PM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
Antoine was the previous developer, releasing the initial version through v1.6.2, and I have been the developer since then. There have also been contributions posted in the forums and/or listed in the CREDITS, CHANGELOG, and README files. Some history about the change in developers can be found here.
__________________
Responses are offered on a voluntary, as-time-permits basis, with no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding.
02-08-2005, 03:19 PM | #5 |
Green Mole
Join Date: Feb 2005
Posts: 16
Thanks.
I changed the depth limit to 60 and am now trying to rerun the spider over the same domain so it will pick up the links that weren't spidered beyond the initial 20 hops. However, it won't spider anything beyond the very first page, and then it stops. Ideas?
02-08-2005, 07:06 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
Check the values in the update sites table via the admin panel.
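If you want to double-check outside the admin panel, something like the sketch below can dump the stored per-site settings. The connection details are placeholders and the table/column names (sites, spider_depth, links_per) are assumptions, so adjust them to match your actual PhpDig schema before running it.
Code:
<?php
// Hypothetical check of the per-site update settings stored in the database.
// Credentials, table name, and column names are placeholders - verify them first.
$link = mysql_connect('localhost', 'dbuser', 'dbpass');
mysql_select_db('phpdig', $link);
$result = mysql_query("SELECT site_url, spider_depth, links_per FROM sites", $link);
while ($row = mysql_fetch_assoc($result)) {
    echo $row['site_url'], ' depth=', $row['spider_depth'], ' links=', $row['links_per'], "\n";
}
mysql_close($link);
?>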
__________________
Responses are offered on a voluntary, as-time-permits basis, with no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding.
02-09-2005, 12:11 AM | #7 |
Green Mole
Join Date: Feb 2005
Posts: 16
They match what I set: depth 60 and links 0 (i.e., all).
02-09-2005, 12:30 PM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
Some thoughts...
- Try using the textbox with 60, 0, no.
- View the robots.txt file for changes.
- Look for meta revisit-after/robots tags (examples of these are below).
- Enter the site at a different location.
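For the robots.txt and meta tag checks, these are the kinds of directives that would stop or throttle the spider; the values shown are purely illustrative:
Code:
<!-- In the page <head>: either of these meta tags can block or delay indexing -->
<meta name="robots" content="noindex,nofollow">
<meta name="revisit-after" content="30 days">

# In robots.txt at the site root: a disallow rule covering the spidered path
User-agent: *
Disallow: /directory/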
__________________
Responses are offered on a voluntary, as-time-permits basis, with no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding.
02-09-2005, 03:18 PM | #9 |
Green Mole
Join Date: Feb 2005
Posts: 16
- Used both the textbox and the combo box.
- No robots.txt present.
- No revisit-after/robots meta tags in the code.
- Entering the site at a different location is the only thing left to try.
However, does it make sense to index both www.domain.com and domain.com when they're the same thing 99% of the time? Shouldn't this be handled (even as a switch?) in PhpDig's code?
02-09-2005, 03:21 PM | #10 |
Head Mole
Join Date: May 2003
Posts: 2,539
Set PHPDIG_IN_DOMAIN to true in the config file.
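A minimal sketch of that change, assuming the stock config file; as I understand the setting, it lets the spider treat hosts within the same domain (such as domain.com and www.domain.com) as one site:
Code:
// Assumption: when true, links to other hosts in the same domain
// (e.g. www.domain.com vs domain.com) are followed as part of the same site.
define('PHPDIG_IN_DOMAIN',true);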
__________________
Responses are offered on a voluntary, as-time-permits basis, with no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding.