04-16-2004, 08:19 PM | #1 |
Green Mole
Join Date: Jan 2004
Location: Penns Woods
Posts: 17
|
Default Depth
Hi,
What is the default depth of spidering when it's run as a cronjob from the shell? I didn't find any option to set it. Thanks in advance, Randy |
04-17-2004, 06:24 AM | #2 |
Green Mole
Join Date: Apr 2004
Posts: 3
|
And, as an addition to tryangle's question: how can I limit the spider to spidering only the pages that I list in a text file fed to it?
I've already tried changing the default level of spider recursion to 0, but the spider is still very eager to index everything it finds! |
04-17-2004, 03:45 PM | #3 |
Purple Mole
Join Date: Dec 2003
Posts: 106
|
I think the default level when running via cron is the same as in the browser, i.e. SPIDER_DEFAULT_LIMIT. Catchme - are you talking about URLs in the text file? It should just spider those. If you're talking about limiting the spidering of pages during an update, you could set RESPIDER_LIMIT down.
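For reference, here's roughly what those two settings look like in includes/config.php. This is only a sketch: the values shown are examples, and the comments are mine, so check your own config file for the real defaults.
PHP Code:
<?php
// Sketch only -- check your own includes/config.php for the actual values.
// Depth used when no other limit is chosen (e.g. shell/cron runs).
define('SPIDER_DEFAULT_LIMIT', 3);
// Depth used when re-spidering pages that are already in the index.
define('RESPIDER_LIMIT', 4);
?>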
__________________
Foundmyself.com artist community, art galleries |
04-17-2004, 04:56 PM | #4 |
Green Mole
Join Date: Jan 2004
Location: Penns Woods
Posts: 17
|
Thanks for your reply...
As far as I can tell, the default limit in the browser is set to zero, because that's the option that is preselected in the dropdown. But I'm wondering if it's possible to set the depth when running it from the shell, or in crontab.
TIA for your help. |
04-17-2004, 04:59 PM | #5 |
Purple Mole
Join Date: Dec 2003
Posts: 106
|
If you open the includes/config.php file, you'll see the setting for the default depth. That's the one used when run through shell/crontab, I believe.
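If it helps, a crontab entry for a shell run usually looks something like the one below. The paths and the URL are only examples for your install, and as far as I know spider.php doesn't take a depth option on the command line, so the config.php value applies.
Code:
# Example only: run the spider every night at 2:30am; adjust paths to your install.
30 2 * * * /usr/local/bin/php -f /path/to/phpdig/admin/spider.php http://www.example.com/ >> /dev/null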
__________________
Foundmyself.com artist community, art galleries |
04-17-2004, 11:10 PM | #6 |
Green Mole
Join Date: Apr 2004
Posts: 3
|
bloddjelly - At the moment, I have the respider limit and the default limit both set to 0. But when I set the spider off again, I find that it still visits the old pages which have already been spidered.
I think it just passes over them quickly, but it still takes 1 or 2 seconds to look over each of these pages that I don't want to index. The other setting that I've customized is the default reindex period, which I've set to 500 days. Basically, once something has been indexed, I don't want the spider to return to that page again. |
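For reference, the reindex period is also set in includes/config.php, roughly like this. The constant name here is from memory and may differ between PhpDig versions, so treat it as a sketch and double-check against your own config.
PHP Code:
<?php
// Sketch -- the constant name may vary by PhpDig version.
// Number of days before an already-indexed page is respidered.
define('LIMIT_DAYS', 500);
?>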
04-19-2004, 04:54 AM | #7 |
Green Mole
Join Date: Jan 2004
Location: Penns Woods
Posts: 17
|
SPIDER_DEFAULT_LIMIT
Hi,
Thanks for your reply. I guess I missed it because I was looking for Search Depth. |
04-20-2004, 10:48 AM | #8 | |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Quote:
Hi. Try the following. In spider.php find: PHP Code:
PHP Code:
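A rough sketch of the kind of change suggested here, assuming spider.php is modified to take an optional depth from the shell and fall back to SPIDER_DEFAULT_LIMIT otherwise. The variable name and argument position below are guesses for illustration, not necessarily what spider.php really uses.
PHP Code:
<?php
// Sketch only: $limit and the argument position are assumptions,
// not necessarily the names/positions spider.php actually uses.
// Example shell call this would support:
//   php -f spider.php http://www.example.com/ 2
if (isset($argv[2]) && is_numeric($argv[2])) {
    $limit = (int) $argv[2];         // depth passed on the command line
} else {
    $limit = SPIDER_DEFAULT_LIMIT;   // fall back to the config.php default
}
?>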
__________________
Responses are offered on a voluntary, as-time-is-available basis; no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding. |
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Search Depth using cron | hpg4815 | How-to Forum | 1 | 10-02-2006 09:16 AM |
default template | ENTHALPIE | How-to Forum | 3 | 11-02-2005 07:21 AM |
default option should be to subscribe to threads you've created? | rwillmer | Feedback & News | 0 | 08-27-2005 03:28 AM |
Break the depth limit of 20? | WebSpider | How-to Forum | 9 | 02-09-2005 02:21 PM |
Changing the search depth | Brain | How-to Forum | 1 | 03-17-2004 03:34 AM |