PhpDig.net

PhpDig.net > PhpDig Forums > How-to Forum
04-16-2004, 09:19 PM   #1
tryangle
Green Mole
 
Join Date: Jan 2004
Location: Penns Woods
Posts: 17
Default Depth

Hi,

What is the default spidering depth when the spider is run as a cron job from the shell? I couldn't find any option to set it.

Thanks in advance,
Randy
04-17-2004, 07:24 AM   #2
catchme
Green Mole
 
Join Date: Apr 2004
Posts: 3
In addition to tryangle's question: how can I limit the spider to only the pages that I list in a text file fed to the spider?

I've already tried setting the default level of spider recursion to 0, but the spider is still very eager to index everything it finds!
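The file-reading half of what catchme describes can be sketched in PHP as below. This is a hypothetical helper (`read_url_list()` is not a PhpDig function), and the rules it applies (one URL per line, `#` comment lines and blank lines ignored, duplicates dropped) are assumptions for illustration; it says nothing about how deep the spider goes once it has the list.

```php
<?php
// Hypothetical helper, not PhpDig's own parser: reads one URL per line from a
// text file, skipping blank lines, '#' comment lines and anything that does
// not start with http:// or https://, and drops duplicates while keeping order.
function read_url_list(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read URL list: $path");
    }
    $urls = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // blank or comment line
        }
        if (!preg_match('#^https?://#i', $line)) {
            continue; // not an http(s) URL
        }
        $urls[$line] = true; // array keys de-duplicate, insertion order is kept
    }
    return array_keys($urls);
}
```

Using array keys as a set keeps the first occurrence of each URL in file order, so the spider would visit the pages in the order they were listed.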
04-17-2004, 04:45 PM   #3
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
I think the default depth when running via cron is the same as in the browser, i.e. SPIDER_DEFAULT_LIMIT. Catchme, are you talking about URLs in the text file? It should spider just those. If you're talking about limiting how many pages get spidered during an update, you could lower RESPIDER_LIMIT.
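A rough model of the rule described above, written as a hypothetical PHP helper rather than PhpDig's actual code: a first crawl uses the depth chosen in the browser, or SPIDER_DEFAULT_LIMIT when none is given (the cron case), while an update of an already indexed site uses RESPIDER_LIMIT. The two limits are passed in as plain parameters so the sketch runs on its own.

```php
<?php
// Hypothetical helper, not PhpDig code. It models the depth choice described
// in this post: $defaultLimit stands in for SPIDER_DEFAULT_LIMIT and
// $respiderLimit stands in for RESPIDER_LIMIT.
function effective_depth(
    bool $isUpdate,
    ?int $requested,
    int $defaultLimit,
    int $respiderLimit
): int {
    if ($isUpdate) {
        // Re-spidering an already indexed site: use the respider limit.
        return $respiderLimit;
    }
    // New crawl: a depth picked in the browser form wins; a cron run passes
    // none (null), so the configured default applies.
    return $requested ?? $defaultLimit;
}
```

For example, `effective_depth(false, null, 3, 1)` returns 3 (the default), `effective_depth(false, 0, 3, 1)` returns 0, and `effective_depth(true, null, 3, 1)` returns 1.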
04-17-2004, 05:56 PM   #4
tryangle
Green Mole
 
Join Date: Jan 2004
Location: Penns Woods
Posts: 17
Thanks for your reply...

As far as I can tell, the default limit in the browser is zero, because that's the value preselected in the dropdown. But I'm wondering whether it's possible to set the depth when running it from the shell or from crontab.

TIA for your help.
04-17-2004, 05:59 PM   #5
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
If you open the includes/config.php file, you'll see the setting for the default depth. I believe that's the one used when the spider is run from the shell or crontab.
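For reference, the relevant lines in includes/config.php might look like the sketch below. The constant names are the ones mentioned in this thread; the `define()` form and the numbers are assumptions, so compare against your own copy of the file before editing it.

```php
<?php
// Sketch of the depth settings in includes/config.php. Constant names come
// from this thread; the define() form and the values are assumptions.
define('SPIDER_DEFAULT_LIMIT', 2); // depth used when none is chosen (shell/cron runs, per this thread)
define('RESPIDER_LIMIT', 1);       // depth used when updating an already indexed site

// Quick check of the values a shell or cron run would pick up.
printf("SPIDER_DEFAULT_LIMIT=%d RESPIDER_LIMIT=%d\n", SPIDER_DEFAULT_LIMIT, RESPIDER_LIMIT);
```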
04-18-2004, 12:10 AM   #6
catchme
Green Mole
 
Join Date: Apr 2004
Posts: 3
bloodjelly: at the moment, I have the respider limit and the default limit both set to 0. Yet when I start the spider again, I find that it still visits the old pages that have already been spidered.

I think it just passes over them quickly, but it still takes 1 or 2 seconds just to look over these pages, which I don't want to index.

The other setting I've customized is the default reindex period, which I've set to 500 days. Basically, once a page has been indexed, I don't want to return to it again.
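The behaviour catchme wants from a 500-day reindex period comes down to a date comparison. The PHP helper below is a hypothetical illustration of that check, not how PhpDig itself stores or compares dates.

```php
<?php
// Hypothetical helper, not PhpDig code: a page indexed at $lastIndexed is
// left alone until $reindexDays days have passed (catchme uses 500).
function is_due_for_reindex(
    DateTimeImmutable $lastIndexed,
    int $reindexDays,
    DateTimeImmutable $now
): bool {
    if ($reindexDays < 0) {
        throw new InvalidArgumentException('reindex period must be >= 0 days');
    }
    // Date on which the page becomes eligible for reindexing again.
    $due = $lastIndexed->add(new DateInterval('P' . $reindexDays . 'D'));
    return $due <= $now;
}
```

With a page indexed on 2004-04-18 and a 500-day period, the check returns false through 2005-08-30 and first returns true on 2005-08-31.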
04-19-2004, 05:54 AM   #7
tryangle
Green Mole
 
Join Date: Jan 2004
Location: Penns Woods
Posts: 17
SPIDER_DEFAULT_LIMIT

Hi,

Thanks for your reply. I guess I missed it because I was looking for Search Depth.
04-20-2004, 11:48 AM   #8
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539

Quote:
Originally posted by catchme
bloodjelly: at the moment, I have the respider limit and the default limit both set to 0. Yet when I start the spider again, I find that it still visits the old pages that have already been spidered.

I think it just passes over them quickly, but it still takes 1 or 2 seconds just to look over these pages, which I don't want to index.

The other setting I've customized is the default reindex period, which I've set to 500 days. Basically, once a page has been indexed, I don't want to return to it again.

Hi. Try the following. In spider.php find:
PHP Code:
$andmore_tempspider = 'AND upddate < now()';
and replace with:
PHP Code:
$andmore_tempspider = 'AND upddate < DATE_SUB(now(), INTERVAL 500 DAY)';
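If you later want a different period, one option is to build the clause from a single number so it stays in step with the reindex setting. `tempspider_clause()` below is a hypothetical helper, not part of spider.php; `DATE_SUB(now(), INTERVAL n DAY)` is the same MySQL syntax used in the replacement line above.

```php
<?php
// Hypothetical helper: builds the $andmore_tempspider fragment with the
// interval taken from a parameter instead of a hard-coded 500.
function tempspider_clause(int $days): string
{
    if ($days <= 0) {
        // No interval: fall back to the original condition.
        return 'AND upddate < now()';
    }
    // $days is typed as int, so interpolating it with %d cannot inject SQL.
    return sprintf('AND upddate < DATE_SUB(now(), INTERVAL %d DAY)', $days);
}

$andmore_tempspider = tempspider_clause(500);
```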



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Search Depth using cron | hpg4815 | How-to Forum | 1 | 10-02-2006 10:16 AM
default template | ENTHALPIE | How-to Forum | 3 | 11-02-2005 08:21 AM
default option should be to subscribe to threads you've created? | rwillmer | Feedback & News | 0 | 08-27-2005 04:28 AM
Break the depth limit of 20? | WebSpider | How-to Forum | 9 | 02-09-2005 03:21 PM
Changing the search depth | Brain | How-to Forum | 1 | 03-17-2004 04:34 AM


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.