01-09-2005, 08:11 AM   #3
revenazb
Green Mole
 
Join Date: Jan 2005
Posts: 3
Hi,

Thanks for the welcome.
It is not so much the character set, as the site is in English.
The problem is with the URL of the page: it contains a lot of commas, colons, tildes, etc. Looking at the regular expression that I think phpdig uses, it looks like it would not pick up the URL I put above. While the individual characters are in there, I don't think the pattern as a whole would match. I am not very good with regular expressions. I'm putting the line of code that I think is relevant below.

Regular expression:
Code:
"(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|HREF[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*([:%/?=&;\\,._a-zA-Z0-9\|+ ()~-]*))(#[.a-zA-Z0-9-]*)?[\'\" ]?"
Sample URL that was not found by phpdig:
Code:
my.intranet.com/WBSITE/INTRANET/UNITS/INTINFNETWORK/0,,contentMDK:20295425~pagePK:64156298~piPK:64152276~theSitePK:489784,00.html
The spaces in "ht ml" are a typo. The actual URL reads "html".
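
One way to check the suspicion is to run the link pattern against the sample URL directly. The snippet below is only a rough Python approximation of the `href` branch of the pattern quoted above, not phpdig's own code: `[[:blank:]]` is rewritten as `[ \t]` because Python's `re` module has no POSIX classes, and the names `PATH_CHARS`, `LINK_RE`, `SAMPLE` and `extract_link` are made up for this test. PHP's regex engine may behave differently in places, so treat the result as a hint rather than proof.

```python
import re

# Assumption: [[:blank:]] from the PHP pattern is approximated as space or tab.
BLANK = r"[ \t]"

# Character class for the path part, as it reads once PHP has unescaped the string.
PATH_CHARS = r"[:%/?=&;\\,._a-zA-Z0-9|+ ()~-]"

# Optional scheme://host(:port), followed by the path characters.
URL_PART = (
    r"((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*"
    r"(" + PATH_CHARS + r"*))"
)

# Only the href branch of the original alternation is reproduced here.
LINK_RE = re.compile(
    r"href" + BLANK + r"*=" + BLANK + r"*['\"]?"
    + URL_PART
    + r"(#[.a-zA-Z0-9-]*)?['\" ]?"
)

SAMPLE = (
    "my.intranet.com/WBSITE/INTRANET/UNITS/INTINFNETWORK/"
    "0,,contentMDK:20295425~pagePK:64156298~piPK:64152276"
    "~theSitePK:489784,00.html"
)


def extract_link(html):
    """Return the URL captured by the approximated pattern, or None."""
    match = LINK_RE.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Every character of the sample URL is inside the path character class.
    print(re.fullmatch(PATH_CHARS + "*", SAMPLE) is not None)
    # The whole URL is captured from a hypothetical anchor tag.
    print(extract_link('<a href="' + SAMPLE + '">link</a>') == SAMPLE)
```

In this approximation the commas, colons and tildes are all accepted, with or without a leading `http://`. If the real pattern behaves the same way, the URL may be getting dropped somewhere other than this regular expression.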

Last edited by revenazb; 01-09-2005 at 08:47 AM. Reason: Small Typo