|
09-22-2003, 09:53 AM | #1 |
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
|
Indexing all HTML-Comments PHP > 4.3.2
Win 2003 and IIS 6 don't like PhpDig
If you index a Win 2003 / IIS 6 site and PhpDig itself runs on a Win 2003 server, it indexes ALL HTML comments: Code:
<!--LayoutTable-->
<tr>
<td width="10" height="114"> </td>
<td width="10"> </td>
<td width="675"> </td>
</tr>
<tr>
<!--Layout Empty Cell-->
<td height="164"> </td>
<td colspan="2" valign="top">
If you index the same Win 2003 / IIS 6 site but PhpDig runs on Linux or a Win 2000 server, it does NOT index HTML comments!
Any ideas where the PHP code that excludes HTML comments is? I haven't found it.
-Roland-
PS: This fix is included: http://www.phpdig.net/showthread.php?s=&threadid=67
Last edited by Rolandks; 10-02-2003 at 07:28 AM.
09-30-2003, 04:54 AM | #2 | |
Re: Again Win 2003: indexing all HTML-Comments
Quote:
In the text-content files (TXT), all HTML comments appear wrapped in < >. Example, 14.txt: Code:
< Navigations-Table end > And here is the Real text from the Page < Table-Image begin > Real Text from page......
robot_functions.php, line 156: Code:
//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);
Thanks
-Roland-
Last edited by Rolandks; 10-01-2003 at 12:30 AM.
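To see what the pattern on line 156 actually touches, here is a minimal sketch. It assumes a current PHP, where `eregi_replace()` no longer exists, so the same pattern is passed to `preg_replace()` with the `i` flag; the function name and the sample strings are made up for illustration and are not part of PhpDig.

```php
<?php
// Hypothetical helper: the pattern from robot_functions.php line 156,
// ported from eregi_replace() to preg_replace() (assumption: PHP 7 or later).
function drop_bang_from_declarations(string $text): string
{
    // Removes the "!" from "<!X" only when X is not a hyphen.
    return preg_replace('/(<)!([^-])/i', '$1$2', $text);
}

// A declaration loses its "!" and then looks like an ordinary tag.
echo drop_bang_from_declarations('<!DOCTYPE html>'), "\n";            // <DOCTYPE html>

// A regular comment starts with "<!-" and is left untouched.
echo drop_bang_from_declarations('<!-- Table-Image begin -->'), "\n"; // <!-- Table-Image begin -->
```

On its own, then, this line leaves a well-formed `<!-- ... -->` comment intact, so removing the comment is left to whatever runs after it.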
|
10-02-2003, 05:22 AM | #3 | |
Re: Re: Again Win 2003: indexing all HTML-Comments
Quote:
The following line "kills" the comments, but NOT with PHP 4.3.2 on Win 2003: Code:
//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));
-Roland-
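A quick way to check whether a given PHP build is affected is to run the same expression on a small sample and look for comment text in the result. The sketch below is an assumption-laden stand-in: it uses `preg_replace()` because `ereg_replace()` was removed in PHP 7, and both the function name and the sample string are hypothetical.

```php
<?php
// Hypothetical helper with the same shape as the quoted line:
// strip tags first, then collapse runs of spaces and tabs.
function clean_text(string $html): string
{
    return preg_replace('/[[:blank:]]+/', ' ', strip_tags($html));
}

// Made-up sample in the style of the page from the first post.
$sample = "<td>Real text</td> <!-- Layout Empty Cell -->\t\tfrom page";
$clean  = clean_text($sample);

echo $clean, "\n";

// If the comment text survives, this build shows the problem described here.
echo (strpos($clean, 'Layout Empty Cell') === false)
    ? "comments removed\n"
    : "comments still present\n";
```

On a PHP version where `strip_tags()` removes comments, as the current manual documents, this prints `Real text from page` followed by `comments removed`.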
|
10-02-2003, 07:13 AM | #4 | |
PHP Bug #25730 : ereg_replace or strip_tags unexpected result:
Quote:
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));
Can anyone change this to be SGML-conformant? It must be fixed for the future, because it never works with PHP > 4.3.2!
Thanks
-Roland-
PS: I changed the headline of this thread: it affects ALL operating systems!
Last edited by Rolandks; 10-02-2003 at 07:29 AM.
|
10-04-2003, 06:38 AM | #5 |
The following should work as a possible solution:
Change this in robot_functions.php, line 160: Code:
//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));
to this: Code:
//strip everything between < and > (tags and comments)
$text = preg_replace('/<.*>/U', '', $text);
-Roland-
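Two things are worth noting about the `preg_replace('/<.*>/U', '', $text)` replacement: it no longer collapses whitespace as the original line did, and without the `s` modifier `.` stops at line breaks, so a comment spanning several lines would survive. A hedged variant that covers both could look like this; the function name and sample string are hypothetical and not part of PhpDig.

```php
<?php
// Hypothetical helper, a sketch only: regex-based tag and comment stripping
// followed by the whitespace collapsing from the original line.
function strip_markup(string $text): string
{
    // "/s" lets "." cross line breaks; ".*?" is the lazy form of ".*" with /U.
    $text = preg_replace('/<.*?>/s', '', $text);

    // Collapse any run of spaces and tabs into a single space.
    return preg_replace('/[[:blank:]]+/', ' ', $text);
}

// Made-up sample with a comment that spans two lines.
$html = "<td>Real text</td> <!-- Layout\nEmpty Cell --> from page";
echo strip_markup($html), "\n"; // Real text from page
```

This is still a blunt instrument: a `>` inside a comment ends the match early, and any literal `<` ... `>` pair in the page text is removed as well.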