PhpDig.net

08-28-2006, 03:24 AM   #1
pepevilluela
Accent in links

PhpDig is not indexing links that contain accents. I'm using Apache on Windows XP (XAMPP from apachefriends), and I've set up PhpDig 1.8.8 in Spanish (es).

Example link: http://localhost/Informatica/Documen...nto/index.html

Pay attention to the word código and the accent.

Internet Explorer retrieves this page, but PhpDig does not: the response is 403 Forbidden.

I have seen that Internet Explorer sends ó (oacute) as %C3%B3, whereas PHP's fputs sends only %B3.
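
For reference, the difference comes from the character encoding applied before percent-encoding. The following is a minimal sketch, not PhpDig code; it assumes the path is held as an ISO-8859-1 string, where ó is the single byte 0xF3, while its UTF-8 form is the two bytes 0xC3 0xB3:

```php
<?php
// ó as a single ISO-8859-1 byte and as a two-byte UTF-8 sequence
$latin1 = "c\xF3digo";
$utf8   = "c\xC3\xB3digo";

// rawurlencode() percent-encodes each non-ASCII byte it is given
echo rawurlencode($latin1), "\n"; // c%F3digo
echo rawurlencode($utf8), "\n";   // c%C3%B3digo
```

A server that stores the directory name as UTF-8 would only be expected to match the second form.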

I tried changing the code that builds the HTTP request, like this:

// keep the original path so it can be restored after the request is built
$pathant=$path;
// encode only the part before the query string, then restore "/" and ":"
$separados=explode("?",$path,2);
$separados[0]=str_replace("%3A",":",str_replace("%2F","/",urlencode(utf8_encode($separados[0]))));
$path=implode("?",$separados);
//complete get
$request =
"HEAD $path $http_scheme/1.1".END_OF_LINE_MARKER
."Host: $host$sport".END_OF_LINE_MARKER
.$cookiesSendString
.$auth_string
."Accept: */*".END_OF_LINE_MARKER
."Accept-Charset: ".PHPDIG_ENCODING.END_OF_LINE_MARKER
."Accept-Encoding: identity".END_OF_LINE_MARKER
."Connection: close".END_OF_LINE_MARKER
."User-Agent: PhpDig/".PHPDIG_VERSION." (+http://www.phpdig.net/robot.php)".END_OF_LINE_MARKER.END_OF_LINE_MARKER;
// restore the original path
$path=$pathant;


With this change I no longer get the 403 Forbidden error, but the spider then stops with "No links in temporary table".

Last edited by pepevilluela; 08-28-2006 at 03:28 AM.
08-29-2006, 02:56 AM   #2
pepevilluela
Accents solved

I have solved my problem with accents in links. I'm Spanish, and I use accents, ñ, and Ñ in links. I think I've fixed the problem by replacing line 218 of robot_functions.php:

$eval = str_replace(" ","%20",$eval);

with

$separados=explode("?",$eval,2);
$separados[0]=str_replace("%25","%",str_replace("%3A",":",str_replace("%2F","/",rawurlencode(utf8_encode($separados[0])))));
$eval=implode("?",$separados);

and by changing the variable $allowed_link_chars in config.php:

$allowed_link_chars = "[:%/?=&;\\,._a-zA-Z0-9áÁéÉíÍóÓúÚüÜñÑ|+ ()~-]*"; // includes space and () - not good with javascript; also accents and hyphens

Please disregard my previous post.
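
The replacement for line 218 can be read as a small standalone function. This is only a sketch: the name encode_link_path is hypothetical and not part of PhpDig, and it assumes the link is already UTF-8, whereas the code in the post first converts from ISO-8859-1 with utf8_encode():

```php
<?php
// Hypothetical helper (not part of PhpDig): percent-encode the path of a
// link while leaving the query string untouched.
// Assumes $link is already UTF-8.
function encode_link_path(string $link): string
{
    // Split into path and optional query string at the first "?"
    $parts = explode('?', $link, 2);

    // Encode every reserved or non-ASCII byte in the path...
    $encoded = rawurlencode($parts[0]);

    // ...then restore "/", ":" and "%" so that separators and
    // already-escaped sequences such as %20 survive
    $parts[0] = str_replace(['%2F', '%3A', '%25'], ['/', ':', '%'], $encoded);

    return implode('?', $parts);
}

echo encode_link_path("http://localhost/c\xC3\xB3digo/index.html?a=1&b=2"), "\n";
// http://localhost/c%C3%B3digo/index.html?a=1&b=2
```

Restoring %25 last means a link that already contains %20 is not double-encoded, which is why the single str_replace for spaces is no longer needed.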

This should work for any special character: just include it in $allowed_link_chars.
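
To illustrate how such a character class accepts or rejects a link, here is a rough sketch. It is not how PhpDig itself applies $allowed_link_chars: the helper link_is_allowed is hypothetical, the class is modelled on the config.php value with the backslash left out for simplicity, and the file and links are assumed to be UTF-8:

```php
<?php
// Illustrative only: a character class modelled on $allowed_link_chars,
// tested with preg_match(). Assumes this file and the links are UTF-8,
// hence the /u modifier.
$allowed = '[:%/?=&;,._a-zA-Z0-9áÁéÉíÍóÓúÚüÜñÑ|+ ()~-]*';

// Hypothetical helper: true when every character of $link is in the class
function link_is_allowed(string $link, string $allowed): bool
{
    return preg_match('#^' . $allowed . '$#u', $link) === 1;
}

var_dump(link_is_allowed('/Documentos/código/index.html', $allowed)); // bool(true)
var_dump(link_is_allowed('/Documentos/façade.html', $allowed));       // bool(false)
```

The second link is rejected only because ç is not listed; adding it to the class would let it through.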

I hope this helps other Spanish speakers and non-English users.

Last edited by pepevilluela; 08-29-2006 at 02:59 AM.
08-29-2006, 04:02 AM   #3
pepevilluela
Results page

Don't forget to change line 518 of search_function.php as well, or the links on the results page will be wrong:

$timer->stop('Extracts');

$separados=explode("?",$url,2);
$separados[0]=str_replace("%25","%",str_replace("%3A",":",str_replace("%2F","/",rawurlencode(utf8_encode($separados[0])))));
$separados[0]=utf8_decode($separados[0]);
$url=implode("?",$separados);

$table_results[$n] = array (
'weight' => $weight,
'img_tag' => '<img border="0" src="'.WEIGHT_IMGSRC.'" width="'.ceil(WEIGHT_WIDTH*$weight/100).'" height="'.WEIGHT_HEIGHT.'" alt="" />',
'page_link' => "<a class=\"phpdig\" href=\"".$url."\" onmousedown=\"return clickit(".$n.",'".$js_url."')\" target=\"".LINK_TARGET."\" >".$title."</a>",
'limit_links' => phpdigMsg('limit_to')." ".$l_site.$l_path,
'filesize' => sprintf('%.1f',(ereg_replace('.*_([0-9]+)$','\1',$content['md5']))/1024),
'update_date' => ereg_replace('^([0-9]{4})[-]?([0-9]{2})[-]?([0-9]{2}).*',PHPDIG_DATE_FORMAT,$content['last_modified']),
'complete_path' => $url,
'link_title' => $title
);