05-24-2004, 06:59 AM | #1
Green Mole
Join Date: May 2004
Posts: 10
Keeping the spider in the search directory and its subdirs
Hi,
I typically want to spider just one directory and its subdirectories, without the spider going up into the parent directory of the URL I specify. For example, I want to index all of www.myplace.com/searchme, starting from www.myplace.com/searchme/index.html. I want everything else in /searchme to be indexed, but I don't want www.myplace.com/donttouch indexed, EVEN THOUGH there is a link from /searchme/index.html to /donttouch/index.html. Is there a way to tell PHPdig not to 'go up' the directory hierarchy? Thanks a lot!
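The later replies suggest the PHPdig version discussed here has no built-in switch for this. As a rough, language-neutral illustration (Python, not PHPdig's own PHP code), the check being asked for amounts to: resolve each link against the page it was found on, collapse any ".." segments, and follow it only if it is on the same host and its path starts with the start page's directory. The names start_scope and stays_in_scope are hypothetical, and the http:// scheme on the example URLs is an assumption.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def start_scope(start_url):
    """Return (host, directory) of the start page, e.g. ('www.myplace.com', '/searchme/')."""
    parts = urlsplit(start_url)
    path = parts.path or "/"
    if not path.endswith("/"):
        # Drop the file name: '/searchme/index.html' -> '/searchme/'
        path = posixpath.dirname(path).rstrip("/") + "/"
    return parts.netloc.lower(), path


def stays_in_scope(start_url, link, found_on=None):
    """True if `link`, resolved against the page it was found on, is on the
    start host and at or below the start directory (never above it)."""
    host, directory = start_scope(start_url)
    parts = urlsplit(urljoin(found_on or start_url, link))
    if parts.netloc.lower() != host:
        return False
    # normpath collapses '.' and '..', so '/searchme/../donttouch/' cannot slip through
    path = posixpath.normpath(parts.path or "/")
    return path == directory.rstrip("/") or path.startswith(directory)


start = "http://www.myplace.com/searchme/index.html"
print(stays_in_scope(start, "about.html"))               # True
print(stays_in_scope(start, "sub/page.html"))            # True
print(stays_in_scope(start, "/donttouch/index.html"))    # False
print(stays_in_scope(start, "../donttouch/index.html"))  # False
```

Comparing normalized path prefixes, rather than raw link strings, is what keeps a relative link such as ../donttouch/index.html from escaping the start directory.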
05-24-2004, 07:17 AM | #2
Purple Mole
Join Date: Jan 2004
Posts: 694
Welcome to the forum, ciaran@clissman.
In the includes/config.php file, find the following statement:
05-24-2004, 07:42 AM | #3
Green Mole
Join Date: May 2004
Posts: 10
Thanks, Pat,
but it's not doing what I expect. I asked it to index http://www.waterfordcity.ie/library/ballybricken.htm with a search depth of 3 and with define('PHPDIG_IN_DOMAIN',true); and the first few results are:
SITE : http://www.waterfordcity.ie/
Exclude paths :
- @NONE@
1:http://www.waterfordcity.ie/library/ballybricken.htm (time : 00:00:07)
+ + + + + +
level 1...
2:http://www.waterfordcity.ie/library/ (time : 00:00:20)
3:http://www.waterfordcity.ie/gallery.htm (time : 00:00:28)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
4:http://www.waterfordcity.ie/environment/index.htm (time : 00:00:37)
+
5:http://www.waterfordcity.ie/planning/index.htm (time : 00:00:46)
+ +
What I really want is for everything in http://www.waterfordcity.ie/library to be indexed and nothing else. Any thoughts?
Thanks again!
Ciaran
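The log reports "SITE : http://www.waterfordcity.ie/", which suggests PHPdig treated the whole host as the site. Judging by its name and by this log, PHPDIG_IN_DOMAIN appears to govern which hosts count as part of the site, not which paths on a host are followed; that reading is an assumption, not checked against PHPdig's source. The Python sketch below applies a host-only rule and a directory-prefix rule to the five URLs from Ciaran's log to show the difference. The functions same_host and same_directory are hypothetical, and same_directory is simplified (it does not normalize ".." segments).

```python
from urllib.parse import urlsplit

START = "http://www.waterfordcity.ie/library/ballybricken.htm"

# URLs copied from Ciaran's crawl log
LOGGED = [
    "http://www.waterfordcity.ie/library/ballybricken.htm",
    "http://www.waterfordcity.ie/library/",
    "http://www.waterfordcity.ie/gallery.htm",
    "http://www.waterfordcity.ie/environment/index.htm",
    "http://www.waterfordcity.ie/planning/index.htm",
]


def same_host(start, url):
    """Host-level scope: any page on the start page's host is accepted."""
    return urlsplit(start).netloc.lower() == urlsplit(url).netloc.lower()


def same_directory(start, url):
    """Directory-level scope: the path must sit under the start page's directory."""
    start_path = urlsplit(start).path
    directory = start_path[: start_path.rfind("/") + 1]  # '/library/'
    return same_host(start, url) and urlsplit(url).path.startswith(directory)


for url in LOGGED:
    print(url, "host:", same_host(START, url), "dir:", same_directory(START, url))
```

Every logged URL passes the host rule, but only the first two pass the directory rule, which matches what Ciaran wanted indexed.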
05-24-2004, 08:22 AM | #4
Purple Mole
Join Date: Jan 2004
Posts: 694
Sorry, I misunderstood what you were asking. You'd like phpdig to stay within a specific directory when spidering, right? In that case, this thread has what you need.
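Ciaran's follow-up mentions robots.txt, which suggests the linked thread's method relies on it. Assuming that, the idea is that the site owner publishes rules that any crawler honoring robots.txt will obey. Python's standard urllib.robotparser can show how such rules are evaluated. The robots.txt content and the "PhpDig" user-agent string are illustrative assumptions, not taken from the linked thread, and whether PHPdig honors Allow lines is not confirmed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that only the site owner could publish:
# allow /library/ and refuse everything else. Allow is listed first because
# Python's parser applies the first rule whose path prefix matches.
ROBOTS_TXT = """\
User-agent: *
Allow: /library/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.waterfordcity.ie/library/ballybricken.htm",
    "http://www.waterfordcity.ie/library/",
    "http://www.waterfordcity.ie/gallery.htm",
):
    # "PhpDig" is an assumed user-agent string; the '*' group applies to any agent
    print(url, parser.can_fetch("PhpDig", url))
```

Other crawlers may resolve conflicting Allow/Disallow rules by the most specific match rather than by order, so putting the narrower Allow first keeps the intent the same under both readings. The catch, as Ciaran points out next, is that this only works on sites you control.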
05-24-2004, 08:28 AM | #5
Green Mole
Join Date: May 2004
Posts: 10
Hmm, we're not there yet.
The sites I'm crawling aren't mine, so I can't put robots.txt files on them. Isn't there a function somewhere that says 'if the directory of the page you're thinking about indexing is a parent of the directory of the page you started at, leave it alone (or not, depending on a config variable)'? Thanks again,
Ciaran
05-24-2004, 08:40 AM | #6
Purple Mole
Join Date: Jan 2004
Posts: 694
To my knowledge, the method that I gave you is the only way you can have phpdig stay within the directory you specify. I'm not sure what new features may end up in version 1.8.1, but I know Charter is working on that right now. Perhaps he'll consider adding this as a feature. I know it's a subject that comes up fairly often around here.
05-24-2004, 08:44 AM | #7
Green Mole
Join Date: May 2004
Posts: 10
Good enough. Thanks for the tips!
Ciaran [sunny Dublin, Ireland, quarter to five in the afternoon]