PhpDig.net

09-22-2003, 10:53 AM   #1
Rolandks
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Indexing all HTML-Comments PHP > 4.3.2

Win 2003 and IIS 6 don't like PhpDig

If PhpDig runs on a Win 2003 server and indexes a Win 2003 / IIS 6 site, it indexes ALL HTML comments:
Code:
<!--LayoutTable-->
  <tr>
    <td width="10" height="114">&nbsp;</td>
    <td width="10">&nbsp;</td>
    <td width="675">&nbsp;</td>
    </tr>
  <tr>
<!--Layout Empty Cell-->
<td height="164">&nbsp;</td>
    <td colspan="2" valign="top">
As a result, "LayoutTable", "Layout", "Empty" and "Cell" all end up in the keywords table.

If PhpDig runs on a Linux or Win 2000 server and indexes the same Win 2003 / IIS 6 site, it does NOT index the HTML comments!

Any ideas? Where is the PHP code that excludes HTML comments? I haven't found it yet.

-Roland-
PS: This fix is already applied: http://www.phpdig.net/showthread.php?s=&threadid=67

Last edited by Rolandks; 10-02-2003 at 08:28 AM.
09-30-2003, 05:54 AM   #2
Rolandks
Re: Again Win 2003: indexing all HTML-Comments

Quote:

Any ideas? Where is the PHP code that excludes HTML comments? I haven't found it yet.
I think this may be the same faulty \r\n handling as in the earlier thread, but I can't find the PHP code that strips HTML comments in general.

In the text-content (.txt) files, every HTML comment shows up in the form < comment text >. Example from 14.txt:
Code:
< Navigations-Table end > And here is the Real text 
from the Page < Table-Image begin > Real Text from page......
PS: I found it:
robot_functions.php, line 156
Code:
//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);
But why would this not work on Win 2003?
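
For reference, a minimal sketch of what that line appears to do. `eregi_replace` no longer exists as of PHP 7, so the sketch uses `preg_replace` with a pattern I believe is equivalent; the helper name `phpdig_demo_strip_bang` is made up for illustration and is not part of PhpDig. Under that assumption, the line only drops the `!` from tags such as `<!DOCTYPE`, and a real comment opener `<!--` is left untouched, which suggests the comments are removed somewhere else.

```php
<?php
// Hypothetical helper, not part of PhpDig: mirrors the pattern from
// robot_functions.php using preg_replace instead of eregi_replace.
function phpdig_demo_strip_bang(string $text): string
{
    // "<!" followed by any character except "-" loses its "!".
    return preg_replace('/(<)!([^-])/i', '$1$2', $text);
}

echo phpdig_demo_strip_bang('<!DOCTYPE html>'), "\n";    // <DOCTYPE html>
echo phpdig_demo_strip_bang('<!--LayoutTable-->'), "\n"; // <!--LayoutTable-->
```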

Thanks
-Roland-

Last edited by Rolandks; 10-01-2003 at 01:30 AM.
10-02-2003, 06:22 AM   #3
Rolandks
Re: Re: Again Win 2003: indexing all HTML-Comments

Quote:
I found it:
robot_functions.php, line 156
Code:
//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);
I found out that the line above is fine and is the same on all servers and all PHP versions.
The following line is the one that "kills" the comments, but NOT with PHP 4.3.2 on Win 2003:
Code:
//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));
I think "strip_tags" doesn't work for text like this: "< Navigations-Table end > And here is the Real text from the Page < Table-Image begin > Real Text from page.."
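
A small sketch to reproduce this, assuming a reasonably current PHP. The sample strings are taken from the examples in this thread and the variable names are only illustrative. As far as I know, `strip_tags` treats a `<` that is directly followed by whitespace as ordinary text, so the first string should come back unchanged, while a well-formed comment and tag are removed.

```php
<?php
// Input shaped like the content of 14.txt: "<" followed by a space.
$spaced = '< Navigations-Table end > Real text';
// A well-formed comment and tag, for comparison.
$wellFormed = '<!--LayoutTable--><td>Real text</td>';

// Print both results so the difference is visible.
var_dump(strip_tags($spaced));
var_dump(strip_tags($wellFormed));
```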


-Roland-
10-02-2003, 08:13 AM   #4
Rolandks
PHP Bug #25730 : ereg_replace or strip_tags unexpected result:
Quote:
This is quite expected behaviour. The SGML specification doesn't allow
whitespaces to appear right after the less than sign.

see: http://bugs.php.net/bug.php?id=25730
Code:
//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));

Can anyone change this so that it conforms to SGML? It needs a fix for the future, because it will never work with PHP > 4.3.2!

Thanks
-Roland-
PS: I changed the headline of this thread: it affects ALL operating systems!

Last edited by Rolandks; 10-02-2003 at 08:29 AM.
10-04-2003, 07:38 AM   #5
Rolandks
The following should work as a possible solution:

Change this in robot_functions.php, line 160:
Code:
//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));
to
Code:
//remove everything between < and > (ungreedy match)
$text = preg_replace('/<.*>/U', '', $text);
I hope it works on all operating systems and all PHP versions.
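
A possible variant, only as a sketch and not tested inside PhpDig: the replacement above no longer collapses blanks, and `.` does not match a newline by default, so a tag or comment spanning several lines would be missed. The helper below (the name `phpdig_demo_clean` is made up for illustration) uses `[^>]*` instead, which also matches newlines, and then collapses spaces and tabs the way the original line did.

```php
<?php
// Hypothetical helper, not part of PhpDig.
function phpdig_demo_clean(string $text): string
{
    // Remove everything from "<" up to the next ">"; [^>] also matches
    // newlines, so tags or comments spanning several lines are covered.
    $text = preg_replace('/<[^>]*>/', '', $text);
    // Collapse runs of spaces and tabs into one space, as the original line did.
    return preg_replace('/[[:blank:]]+/', ' ', $text);
}

echo phpdig_demo_clean("< Navigations-Table end > And here is\tthe   Real text < Table-Image\nbegin > from page");
```

One caveat with this approach: a literal "<" in the page text (for example "a < b") would also swallow everything up to the next ">".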


