PhpDig.net

Old 03-01-2005, 11:03 PM   #1
new2dev
Green Mole
 
Join Date: Feb 2005
Posts: 4
Newbie on Domains: Yes or No Answer Please :)

Hi!

Would it be possible to force PhpDig to crawl many domains, but start only in a subdirectory of each domain, and never crawl out of that domain to offsite links?

For example, fetch URLs of pages in:

domain1.com/subdir
domain1.com/subdir/subdir...

domain2.com/subdir/subdir/...

but never return results like:

domain1.com
domain1.com/doc.html
domain1.com/unspecified-dir/
domain1.com/unspecified-dir/doc.html


I hope this makes sense. I'm hoping to use PhpDig to spider specific content on many domains, without crawling around too much or jumping to undefined offsite domains.

A yes or no would be appreciated!
Old 03-01-2005, 11:24 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Assuming you are using PhpDig v.1.8.7: apply the code change in this post, then set LIMIT_TO_DIRECTORY to true and PHPDIG_IN_DOMAIN to false in the config file. Next, index http://www.domain.com/subdir/ (with the ending slash) from the textbox in the PhpDig admin panel, using whatever "Search depth" and "Links per" values you prefer. Note that the values present in "Update sites" are used by default, so just choose "no" or edit "Update sites" if you wish to reset the values.
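
As a rough sketch of the two config settings named above: this assumes the constants are set with define() in the PhpDig config file, as in a stock install. The comments are my own reading of the settings and are not taken from this thread.

```php
<?php
// Sketch only: edit the existing lines in your PhpDig config file
// rather than adding new ones, since a PHP constant cannot be redefined.

// Restrict indexing to the directory of the URL you submit for indexing.
define('LIMIT_TO_DIRECTORY', true);

// Assumed meaning: do not let the spider jump to other hosts in the same domain.
define('PHPDIG_IN_DOMAIN', false);
```

These two lines cover only the config settings; the code change referred to in the reply is not reproduced here and still has to be applied separately.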