PhpDig.net


antun 03-10-2004 08:42 AM

Exclude list?
 
First of all, PhpDig looks like an awesome product. I've been looking for a new search engine for ages, and I think I've found it!!

One question:

In the docs for phpdig 1.8.0, it says:

Quote:

At least, the robot compare the URI with the exclude list.
This is the only mention of an exclude list I could find in any of the files in the distribution, and I searched these forums, but the only thing I turned up was the:
PHP Code:

define('BANNED','^ad\.|banner|doubleclick'); 

... variable in config.php. Is the BANNED variable the recommended way to exclude paths? What I'd like to do is exclude, say:

/developers/community/forums/...

-Antun

Charter 03-10-2004 09:29 AM

Hi antun, and welcome to PhpDig.net!

The BANNED constant is meant to stop the crawler from following certain links in the pages it crawls. To keep certain directories from being crawled at all, place a robots.txt file in your web root. If a directory has already been crawled and you want to exclude it, just click the red-circle "no way" symbol in the admin panel.
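
To make the link-filtering idea concrete, here is a minimal Python sketch of how a pattern like BANNED could be tested against links found in a page. The pattern is copied from the config.php line quoted above; the helper name `is_banned`, the sample links, and the choice of an unanchored, case-insensitive search are assumptions for illustration, and PhpDig's own matching code may behave differently.

```python
import re

# Pattern copied from the BANNED constant quoted earlier in the thread.
BANNED = r'^ad\.|banner|doubleclick'

def is_banned(link, pattern=BANNED):
    """Return True if the pattern matches anywhere in the link.

    Assumption: the match is case-insensitive and unanchored except
    where the pattern itself uses ^. PhpDig may apply it differently.
    """
    return re.search(pattern, link, re.IGNORECASE) is not None

# Hypothetical links, as they might be extracted from a crawled page.
links = [
    "ad.example.com/track",
    "images/banner_top.gif",
    "developers/community/forums/index.php",
    "products/index.html",
]

# Links the crawler would still follow under this sketch.
followed = [link for link in links if not is_banned(link)]
print(followed)
```

In this sketch the forum link is still followed, because nothing in the stock pattern mentions that path. That is consistent with the advice to handle whole directories through robots.txt rather than BANNED.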

antun 03-10-2004 10:11 AM

Thanks, but if I use a robots.txt file to exclude certain directories, won't that prevent those dirs from being indexed by public search engines too (e.g. Google)?

I'm only trying to fine-tune our search - for example, I'd like to exclude our forums from all searches, and I'd like to remove our Developers area from all non-tech-related searches.

Should I be excluding different directories and running separate indexes, or should I be running one large index and (if possible?) excluding parts of the site at search-time?

-Antun

Charter 03-10-2004 10:32 AM

Hi. A robots.txt file with the following should exclude the directories from PhpDig prior to indexing:
Code:

User-agent: PhpDig
Disallow: /developers/
Disallow: /developers/community/forums/
Disallow: /lps/
Disallow: /lps-2.0/docs/lzx-developers-guide/

If a directory that you don't want indexed has already been indexed, just click the red circle to delete and exclude it, making sure that the tempspider table is empty before reindexing.
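
On the earlier worry about public search engines: because the rules above sit under `User-agent: PhpDig`, a standards-following parser applies them only to a crawler that identifies itself by that name. The sketch below checks this with Python's standard `urllib.robotparser`. It shows how one common parser reads the file, not how PhpDig or Google actually parse it; the helper `allowed` and the sample paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the post above, unchanged.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow: /developers/
Disallow: /developers/community/forums/
Disallow: /lps/
Disallow: /lps-2.0/docs/lzx-developers-guide/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """Return True if this parser lets `agent` fetch `path`."""
    return parser.can_fetch(agent, path)

forum = "/developers/community/forums/index.php"
reference = "/lps-2.0/docs/lzx-reference/"

print("PhpDig    forum:    ", allowed("PhpDig", forum))
print("Googlebot forum:    ", allowed("Googlebot", forum))
print("PhpDig    reference:", allowed("PhpDig", reference))
```

With this parser the forum path is refused for the PhpDig agent and permitted for Googlebot, since the file has no `User-agent: *` section. The second Disallow line is also already covered by the first, because `/developers/` is a prefix of `/developers/community/forums/`.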

antun 03-10-2004 11:24 AM

Got it! That will work for the "/developers/community/forums/", which I never want indexed.

However, in my case, I'd like to have separate configurations:

- The entire website (excluding /developers/).
- All of /developers/, but nothing in the rest of the site.
- Just /lps-2.0/docs/lzx-reference/, but nothing else.

I presume the best way would be to have each one as a separate website, right? You see, I want to give people an option as to what to search, most likely using a pull-down. (You can see what I mean here: http://www.laszlosystems.com/developers/).

-Antun
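
The three configurations listed above can be written down as path-prefix rules, whichever way they end up being implemented (separate indexes or filtering at search time). The sketch below is plain Python and does not use any PhpDig API. The scope names, the `SCOPES` table, and the `in_scope` helper are hypothetical; only the path prefixes come from the posts above.

```python
# Hypothetical scope table for a search pull-down.
# Only the path prefixes are taken from the thread; the names are made up.
SCOPES = {
    # The entire website, excluding /developers/.
    "site": {"include": ["/"], "exclude": ["/developers/"]},
    # All of /developers/, minus the forums, which should never be indexed.
    "developers": {
        "include": ["/developers/"],
        "exclude": ["/developers/community/forums/"],
    },
    # Just the reference docs, nothing else.
    "reference": {"include": ["/lps-2.0/docs/lzx-reference/"], "exclude": []},
}

def in_scope(path, scope):
    """Return True if `path` belongs to the named scope.

    A path is in scope when it starts with at least one include prefix
    and with none of the exclude prefixes.
    """
    rules = SCOPES[scope]
    included = any(path.startswith(p) for p in rules["include"])
    excluded = any(path.startswith(p) for p in rules["exclude"])
    return included and not excluded

for scope in SCOPES:
    print(scope, in_scope("/developers/tutorials/intro.html", scope))
```

Keeping the rules in one table makes it easy to see where scopes overlap. Here, for example, the reference docs also fall inside the "site" scope.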

Charter 03-10-2004 11:38 AM

Hi. Perhaps this thread might help.


All times are GMT -8.
