Exclude links with certain URL variables
jclementson
01-26-2004, 03:35 AM
Hi there,
Every page on my website has a link to a printer-friendly version of the same page, done with [thispage.php?print=y]
I need to exclude these links from the spidering process, but without excluding other url variables such as [news.php?story=11]
Basically I need a way to tell the spidering process not to follow links containing a specific string (in this case '?print=y'). I can't find this feature already there, so can someone guide me to the right function and how to modify it?
Thanks
Every page on my website has a link to a printer-friendly version of the same page, done with [thispage.php?print=y]
Just started figuring out this case also. After line 412 in "search_function.php" add:
$content['file'] = preg_replace("'print=y'si","", $content['file']);
(line before: $url = eregi_replace("([a-z0-9])[/]+... )
This strips "print=y" away. The downside is that you get duplicate results when searching (the pages without "print" and the pages with "print"), because only the URL is filtered. Let's keep looking...
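For anyone trying the strip-it-from-the-URL route, here is a minimal sketch of removing the parameter without leaving a dangling "?" or "&" behind. The function name strip_print_param is made up for illustration and is not part of phpDig; it only cleans the URL string, so it does not solve the duplicate-results problem.

```php
<?php
// Hypothetical helper, not part of phpDig: remove a "print=y" query
// parameter from a URL while keeping any other parameters intact.
function strip_print_param($url)
{
    // Drop "print=y" plus the "&" that follows it (if any), keeping the
    // "?" or "&" that came before it.
    $url = preg_replace('/([?&])print=y(&|$)/i', '$1', $url);
    // Remove a "?" or "&" left dangling at the end of the URL.
    return rtrim($url, '?&');
}

echo strip_print_param('thispage.php?print=y'), "\n";
echo strip_print_param('news.php?story=11&print=y'), "\n";
echo strip_print_param('news.php?print=y&story=11'), "\n";
```

Other variables such as story=11 are left alone, and a value like print=yes is not touched because the pattern requires "&" or the end of the string right after the "y".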
jclementson
01-26-2004, 05:20 AM
Thanks, that's a useful start.
I'm looking at function phpdigExplore in robot_functions.php, but I can't figure it out yet.
jclementson
01-26-2004, 05:54 AM
Got it!
In robot_functions.php, I've added a test at the end of function phpdigDetectDir.
This is how I've done it for the test I need, showing lines 537 onwards. My addition is at line 543:
//test the exclude with robots.txt
if (phpdigReadRobots($exclude,$link['path'].$link['file']) == 1
|| isset($exclude['@ALL@'])
) {
$link['ok'] = 0;
}
//exclude if specific variable set
if (strpos($link['file'],'print=y') !== false) {
$link['ok'] = 0;
}
//print "<pre>"; print_r($link); print "</pre>\n";
return $link;
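The same test can be tried outside phpDig with a small stand-alone sketch. The helper name link_is_excluded and the sample $link array are assumptions for illustration; only the 'file' and 'ok' keys are taken from the code above. Note that strpos() returns 0 when the match is at the very start of the string, so the sketch compares against false explicitly.

```php
<?php
// Hypothetical helper, not part of phpDig: returns true when the file
// part of a link contains any of the substrings to skip.
function link_is_excluded($file, array $needles)
{
    foreach ($needles as $needle) {
        // strpos() returns 0 for a match at position 0, which a plain
        // truth test would treat as "not found", so compare with false.
        if (strpos($file, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Sample link array; only the keys used in the thread are modelled here.
$link = array('file' => 'thispage.php?print=y', 'ok' => 1);
if (link_is_excluded($link['file'], array('print=y'))) {
    $link['ok'] = 0;
}
var_dump($link['ok']);
```

Passing the strings as an array makes it easy to skip more than one variable later without adding another if block.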
I got it too... somehow
I edited "search_function.php" a bit. It is a bit messy, so I won't post it here. Anyway, it works pretty well, though not perfectly. This feature would be a nice addition to future versions.
I have different language versions, so I don't want to strip those pages out of the search results permanently.
JoNtE
02-25-2004, 01:19 AM
Found this in the config.php file:
// regular expression to ban useless external links in index
define('BANNED','^ad\.|banner|doubleclick');
change it to:
define('BANNED','^ad\.|banner|doubleclick|print=y');
I guess this could be used to exclude URLs containing strings that match the regexp.
I have the same problem, but I haven't tested this possible solution yet. I'll be back with the result.
// JoNtE
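A quick way to see what the extended pattern would match is to test it on its own. This is only a sketch: the thread does not show how phpDig applies BANNED, and the config comment mentions external links, so the helper is_banned and the case-insensitive preg_match call are assumptions, not phpDig's actual code.

```php
<?php
// Pattern copied from the post above, with print=y appended.
define('BANNED', '^ad\.|banner|doubleclick|print=y');

// Hypothetical helper: assumes the pattern is applied as a
// case-insensitive regular expression against the link text.
function is_banned($link)
{
    return preg_match('#' . BANNED . '#i', $link) === 1;
}

var_dump(is_banned('thispage.php?print=y'));
var_dump(is_banned('news.php?story=11'));
var_dump(is_banned('blueprint=yes.php'));
```

The last case shows the catch with a plain substring alternative: anything containing "print=y" matches, so a tighter pattern such as [?&]print=y may be safer if other URLs could contain that text.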