08-28-2006, 03:24 AM | #1
Green Mole
Join Date: Apr 2004
Posts: 7
Accents in links
PHPDig is not indexing links that contain accented characters. I'm using Apache on Windows XP (XAMPP from apachefriends) and I've configured PHPDig 1.8.8 for Spanish (es).
Example link: http://localhost/Informatica/Documen...nto/index.html. Note the word código in it and its accent. Microsoft Internet Explorer retrieves this page, but PHPDig does not: the server answers 403 Forbidden. I have noticed that Internet Explorer sends ó (oacute) as %C3%B3, whereas PHP's fputs sends only %B3. I tried adding some code around the HTTP request, like this:

    $pathant=$path;
    // encode the path part (before "?") and keep "/" and ":" literal
    $separados=explode("?",$path,2);
    $separados[0]=str_replace("%3A",":",str_replace("%2F","/",urlencode(utf8_encode($separados[0]))));
    $path=implode("?",$separados);
    //complete get
    $request = "HEAD $path $http_scheme/1.1".END_OF_LINE_MARKER
        ."Host: $host$sport".END_OF_LINE_MARKER
        .$cookiesSendString
        .$auth_string
        ."Accept: */*".END_OF_LINE_MARKER
        ."Accept-Charset: ".PHPDIG_ENCODING.END_OF_LINE_MARKER
        ."Accept-Encoding: identity".END_OF_LINE_MARKER
        ."Connection: close".END_OF_LINE_MARKER
        ."User-Agent: PhpDig/".PHPDIG_VERSION." (+http://www.phpdig.net/robot.php)".END_OF_LINE_MARKER.END_OF_LINE_MARKER;
    // restore the original path for the rest of the function
    $path=$pathant;

With this change I no longer get the 403 Forbidden error, but the spider then stops with "No links in temporary table".

Last edited by pepevilluela; 08-28-2006 at 03:28 AM.
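Where %C3%B3 comes from: in ISO-8859-1 (Latin-1) ó is the single byte 0xF3, while in UTF-8 it is the two bytes 0xC3 0xB3, and the browser percent-encodes the UTF-8 bytes. The sketch below shows the difference. It assumes the path is held as Latin-1; latin1_to_utf8 is a hypothetical helper that performs the same byte mapping as PHP's utf8_encode().

```php
<?php
// Hypothetical helper: converts an ISO-8859-1 (Latin-1) string to UTF-8
// byte by byte, the same mapping PHP's utf8_encode() performs.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            // ASCII bytes are identical in both encodings.
            $out .= chr($b);
        } else {
            // Latin-1 bytes 0x80-0xFF become a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// "código" written with explicit Latin-1 bytes (ó = 0xF3), so the example
// does not depend on the encoding of this source file.
$latin1 = "c\xF3digo";

echo rawurlencode($latin1), "\n";                 // c%F3digo
echo rawurlencode(latin1_to_utf8($latin1)), "\n"; // c%C3%B3digo
```

Percent-encoding the raw Latin-1 byte gives %F3; converting to UTF-8 first gives the %C3%B3 form that Internet Explorer sends.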
08-29-2006, 02:56 AM | #2
Accents solved
I have solved my problem with accents in links. I'm Spanish, and my links contain accented vowels as well as ñ and Ñ. I think I've fixed the problem by replacing line 218 in robot_functions.php:
    $eval = str_replace(" ","%20",$eval);

with

    // encode the path only; keep "/", ":" and existing "%XX" escapes literal
    $separados=explode("?",$eval,2);
    $separados[0]=str_replace("%25","%",str_replace("%3A",":",str_replace("%2F","/",rawurlencode(utf8_encode($separados[0])))));
    $eval=implode("?",$separados);

and by changing the variable $allowed_link_chars in config.php:

    $allowed_link_chars = "[:%/?=&;\\,._a-zA-Z0-9áÁéÉ*ÍóÓúÚüÜñÑ|+ ()~-]*"; // includes space and () - not good with javascript; also accents and hyphens

Please disregard my previous post. The same approach should work for any other special character: just add it to $allowed_link_chars. I hope this helps other Spanish and non-English users.

Last edited by pepevilluela; 08-29-2006 at 02:59 AM.
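The replacement for line 218 can be read as a small function: take only the part before the first "?", convert it to UTF-8, percent-encode it, and then restore the characters that must stay literal. A minimal sketch, assuming (as the post does) that stored links are ISO-8859-1; latin1_to_utf8 and encode_link_path are hypothetical names, and the conversion is written out by hand because utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: ISO-8859-1 -> UTF-8, the mapping utf8_encode() performs.
// Each byte 0x80-0xFF becomes a two-byte UTF-8 sequence; ASCII is untouched.
function latin1_to_utf8(string $s): string
{
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        $b = ord($m[0]);
        return chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }, $s);
}

// Hypothetical wrapper for the robot_functions.php change.
// Assumes $eval holds a Latin-1 URL or path.
function encode_link_path(string $eval): string
{
    // Only the part before the first "?" is encoded; the query string is kept.
    $separados = explode('?', $eval, 2);

    // rawurlencode() escapes every byte outside [A-Za-z0-9-_.~], including
    // "/" and ":", and turns an existing "%" into "%25".
    $encoded = rawurlencode(latin1_to_utf8($separados[0]));

    // Undo those escapes in the same order as the nested str_replace calls:
    // "/" (path separator), ":" (scheme/port), then "%" so that existing
    // escapes such as %20 are not doubled into %2520.
    $separados[0] = str_replace(['%2F', '%3A', '%25'], ['/', ':', '%'], $encoded);

    return implode('?', $separados);
}

echo encode_link_path("http://localhost/Informatica/c\xF3digo/index.html?a=1&b=2"), "\n";
echo encode_link_path("/Informatica/mi c\xF3digo.html"), "\n";
```

rawurlencode() encodes a space as %20, whereas urlencode() (used in the first post) encodes it as "+", which a server reads as a literal plus sign in a path. That is why the rawurlencode() version also covers what the old str_replace(" ","%20",$eval) did.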
08-29-2006, 04:02 AM | #3
Results page
Don't forget to change search_function.php at line 518 as well, or the links on the results page will be wrong:
    $timer->stop('Extracts');
    // encode the path part of $url the same way as in robot_functions.php
    $separados=explode("?",$url,2);
    $separados[0]=str_replace("%25","%",str_replace("%3A",":",str_replace("%2F","/",rawurlencode(utf8_encode($separados[0])))));
    $separados[0]=utf8_decode($separados[0]);
    $url=implode("?",$separados);
    $table_results[$n] = array (
        'weight' => $weight,
        'img_tag' => '<img border="0" src="'.WEIGHT_IMGSRC.'" width="'.ceil(WEIGHT_WIDTH*$weight/100).'" height="'.WEIGHT_HEIGHT.'" alt="" />',
        'page_link' => "<a class=\"phpdig\" href=\"".$url."\" onmousedown=\"return clickit(".$n.",'".$js_url."')\" target=\"".LINK_TARGET."\" >".$title."</a>",
        'limit_links' => phpdigMsg('limit_to')." ".$l_site.$l_path,
        'filesize' => sprintf('%.1f',(ereg_replace('.*_([0-9]+)$','\1',$content['md5']))/1024),
        'update_date' => ereg_replace('^([0-9]{4})[-]?([0-9]{2})[-]?([0-9]{2}).*',PHPDIG_DATE_FORMAT,$content['last_modified']),
        'complete_path' => $url,
        'link_title' => $title
    );
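The results-page change applies the same path encoding to the stored URL before it is written into the href of 'page_link'. A minimal sketch of that idea under the same Latin-1 assumption; phpdig_result_link and the helpers are hypothetical, not PHPDig functions. The sketch leaves out the utf8_decode() call from the snippet above: after rawurlencode() the string is plain ASCII, so utf8_decode() returns it unchanged.

```php
<?php
// Hypothetical helper: ISO-8859-1 -> UTF-8 (the mapping utf8_encode() performs).
function latin1_to_utf8(string $s): string
{
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        $b = ord($m[0]);
        return chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }, $s);
}

// Hypothetical helper: same path encoding as the robot_functions.php change.
function encode_link_path(string $url): string
{
    $separados = explode('?', $url, 2);
    $separados[0] = str_replace(['%2F', '%3A', '%25'], ['/', ':', '%'],
                                rawurlencode(latin1_to_utf8($separados[0])));
    return implode('?', $separados);
}

// Hypothetical: builds a results-page anchor with the encoded URL in href,
// a simplified version of the 'page_link' entry above.
function phpdig_result_link(string $url, string $title): string
{
    return '<a class="phpdig" href="' . encode_link_path($url) . '">' . $title . '</a>';
}

echo phpdig_result_link("http://localhost/Informatica/c\xF3digo/index.html", 'Example'), "\n";
```

Because the spider and the results page share one encoding step, the link a user clicks points at the same URL form the spider requested.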
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
accent ai ai ai | jim | Troubleshooting | 0 | 12-30-2005 05:46 AM |
Nor spaces nor accent | pepevilluela | Troubleshooting | 2 | 05-06-2004 05:55 PM |