12-18-2004, 02:27 PM | #1 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
Wildcard for banned external links?
I was looking over this part in the config and wondered if there is a way to use a wildcard such as banner* so it works for banners also or other plurals.
Code:
// regular expression to ban useless external links in index
define('BANNED','^ad\.|banner|banners|doubleclick|links|forum|affiliates');
The ^ represents what? The \. represents what? |
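On the wildcard question: a regular expression has no shell-style * wildcard. In a regex, * means "zero or more of the preceding character", so banner* would also match "banne", while an optional plural is written banners?. Because the pattern is not anchored at the end, plain banner already matches "banners" as well. A minimal sketch, assuming PHP 7 or later, where preg_match replaces the eregi call used in this thread; the helper name matches_pattern and the example.com hosts are made up for illustration:

```php
<?php
// Hypothetical helper: case-insensitive regex test, similar in spirit to eregi().
function matches_pattern(string $pattern, string $subject): bool
{
    // '~' is the delimiter, so the pattern itself needs no extra escaping here.
    return preg_match('~' . $pattern . '~i', $subject) === 1;
}

// Unanchored "banner" is found inside "banners", so the plural is already covered.
var_dump(matches_pattern('banner', 'banners.example.com'));   // bool(true)

// "s?" makes the trailing s optional: singular and plural both match.
var_dump(matches_pattern('banners?', 'banner.example.com'));  // bool(true)

// "r*" means zero or more r, so even "banne" matches, which is probably not intended.
var_dump(matches_pattern('banner*', 'banne.example.com'));    // bool(true)

// No alternative appears in this host, so nothing matches.
var_dump(matches_pattern('banners?', 'example.com'));         // bool(false)
```

If this holds for the real config, listing both banner and banners in BANNED is redundant but harmless.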
12-18-2004, 07:53 PM | #2 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
^ anchors the match to the start of the string: the text being tested must begin with the characters that follow it.
\. is escaping the period. Putting this regular expression back together and interpreting it in English, it means:
An expression that begins with the characters "ad." (without the quotes), and is followed by one of the following words:
banner
banners
doubleclick
links
forum
affiliates
Expressed another way, it's looking for one of the following strings of characters:
ad.banner
ad.banners
ad.doubleclick
ad.links
ad.forum
ad.affiliates
Hope this helps. |
12-19-2004, 03:19 AM | #3 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
Thanks vinyl-junkie,
You explained that very well. I want to ban links like "links", as in a links page or links/index.html, or forum directories. Will I have to add a new line with a brand new define and then simply imitate what BANNED is doing? Line 1264 in robot_functions.php is the only reference I found to BANNED:
Code:
if ($regs[5] && $regs[5] != $localdomain && !eregi(BANNED,$regs[5]) && ereg('[a-z]+',$regs[5])) {
Would line 1264 in robot_functions.php then be written this way?
Code:
if ($regs[5] && $regs[5] != $localdomain && !eregi(BANNED,$regs[5]) && !eregi(BANNED2,$regs[5]) && ereg('[a-z]+',$regs[5])) {
A little info: 800 sites at level 1 depth has me at a 20 MB database size. Time to downsize and then decide whether to get more database space if needed.
Thanks again for your reply.
Last edited by Slider; 12-19-2004 at 03:29 AM. |
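A second define with a second eregi test should work, but it may not be needed: the | in a pattern already means "or", so extra terms can simply be appended to the one list. The sketch below compares the two approaches. It assumes PHP 7 or later (preg_match instead of eregi), the two term lists and all function names are invented for illustration, and $candidate only stands in for $regs[5], whose exact contents are not stated in this thread:

```php
<?php
// Two hypothetical term lists: an existing one and a second one (the BANNED2 idea).
const PATTERN_ONE = '^ad\.|banner|doubleclick';
const PATTERN_TWO = 'links|forum';

// Alternative: a single combined list, joined with "|".
const PATTERN_ALL = PATTERN_ONE . '|' . PATTERN_TWO;

// Case-insensitive test, standing in for eregi().
function hits(string $pattern, string $candidate): bool
{
    return preg_match('~' . $pattern . '~i', $candidate) === 1;
}

// Mirrors the proposed condition: reject if either pattern matches.
function allowed_two_checks(string $candidate): bool
{
    return !hits(PATTERN_ONE, $candidate) && !hits(PATTERN_TWO, $candidate);
}

// Same decision with one pattern and one test.
function allowed_one_check(string $candidate): bool
{
    return !hits(PATTERN_ALL, $candidate);
}

foreach (['www.example.com', 'links.example.com', 'ad.example.com'] as $candidate) {
    echo $candidate, ': ',
        allowed_two_checks($candidate) ? 'allowed' : 'banned', ' / ',
        allowed_one_check($candidate) ? 'allowed' : 'banned', "\n";
}
```

Both functions give the same answer for every input, so the choice is mostly about keeping the config readable.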
12-19-2004, 08:50 AM | #5 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
I think I gave you some incorrect information with regard to just what BANNED means. I've been struggling to learn regular expressions. What that is saying is that BANNED is looking for one of the following strings:
"ad." (without the quotes) at the beginning of the string, or
"banner" (without the quotes) anywhere in the string, or
"banners" (without the quotes) anywhere in the string, or
"doubleclick" (without the quotes) anywhere in the string, or
"links" (without the quotes) anywhere in the string, or
"forum" (without the quotes) anywhere in the string, or
"affiliates" (without the quotes) anywhere in the string
Just wanted to set that straight. |
12-19-2004, 09:07 AM | #6 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
That was the way I was reading it. Thank you so much for clarifying it for me. I did go to php.net and saw an example of what you are now saying it reads as.
I will start crawling all over again and see if it ignores links directories now. I'm trying very hard to reduce the size of the MySQL database by getting rid of non-informative links. Thanks again.
P.S. I don't mind getting a response even if only some of the information is correct. A response to a question at all is much appreciated. Thanks for being here. |
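The corrected reading from post #5 can be checked with a few sample strings. The sketch below assumes PHP 7 or later, where eregi is no longer available and preg_match with the i flag is the usual substitute. The constant name BANNED_PATTERN, the function is_banned and the sample hosts are made up for illustration; only the pattern text comes from the config quoted in post #1:

```php
<?php
// The BANNED pattern from the config, wrapped in '~' delimiters for preg_match.
const BANNED_PATTERN = '~^ad\.|banner|banners|doubleclick|links|forum|affiliates~i';

// Returns true when any alternative of the pattern matches the string.
function is_banned(string $host): bool
{
    return preg_match(BANNED_PATTERN, $host) === 1;
}

$samples = [
    'ad.example.com',        // starts with "ad." so the first alternative matches
    'bad.example.com',       // "ad." is not at the start and no other term appears
    'www.mybanners.example', // "banner" appears in the middle of the string
    'forum.example.org',     // "forum" appears
    'www.example.com',       // nothing matches
];

foreach ($samples as $host) {
    echo $host, ' => ', is_banned($host) ? 'banned' : 'allowed', "\n";
}
```

The ^ applies only to the first alternative because | has the lowest precedence in a regular expression, so each term between the bars is tested on its own.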