Extracting H2 tag
Hi,
I have added the code you suggested to robot_functions.php to pull the h2 tag instead of the title tag. It works, but the problem is that it is pulling both the first and second h2 tags. This is the code I pasted in:
Code:
//extracts title
if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is',$text,$regs,PREG_SET_ORDER)) {
    // assumes there are at least three h2 tags
    $title = trim($regs[0][1]." ".$regs[1][1]." ".$regs[2][1]);
} else {
    $title = "";
}
The result shows " Contact UsContact Us". There are 2 h2 tags on this page: http://dobleweb1.doble.com/contactus/ but I only want to show the second one. Any suggestions?
Thanks, -Marc |
If you only want the second H2 tag, try:
Code:
$title = trim($regs[1][1]);
instead of:
Code:
$title = trim($regs[0][1]." ".$regs[1][1]." ".$regs[2][1]); |
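Indexing $regs[1] directly assumes a second H2 was actually matched. A minimal sketch of a guarded version follows; the helper name nth_h2 and the sample markup are hypothetical and not part of robot_functions.php, and only the regular expression is taken from the thread.

```php
<?php
// Hypothetical helper (not part of robot_functions.php): returns the text of
// the H2 at the given 0-based position, or "" when that match does not exist.
function nth_h2($text, $index) {
    if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is', $text, $regs, PREG_SET_ORDER)
        && isset($regs[$index])) {
        return trim($regs[$index][1]);
    }
    return "";
}

// Sample markup, made up for illustration.
$text = "<h2>Site Name</h2><p>intro</p><h2>Contact Us</h2>";
$title = nth_h2($text, 1); // second H2
```

With only one H2 in $text, nth_h2($text, 1) returns an empty string rather than reading an undefined array offset.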
When I try that, it brings up "Untitled" and "search.php" for most of them.
http://doble.phpslave.com/search.php -Marc |
Are you using the following?
Code:
if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is',$text,$regs,PREG_SET_ORDER)) { |
Here is the code:
Code:
//extracts title
if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is',$text,$regs,PREG_SET_ORDER)) {
    $title = trim($regs[1][1]);
} else {
    $title = "";
} |
Keep that code and increase CHUNK_SIZE in the config file; 4096 may be enough. If not, increase it further until both H2 tags fall within the same chunk.
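To illustrate why the chunk size matters, here is a rough sketch. It assumes the pattern is run against one fixed-size chunk of the page at a time; the function name, the sample page, and the use of str_split are stand-ins for illustration, not the indexer's actual code.

```php
<?php
// Made-up page: two H2 tags separated by filler markup.
$page = "<h2>Site Name</h2>" . str_repeat("<p>filler</p>", 20) . "<h2>Contact Us</h2>";

// Hypothetical stand-in for chunked processing: counts the H2 matches
// found in the first chunk only.
function h2_count_in_first_chunk($page, $chunkSize) {
    $chunks = str_split($page, $chunkSize);
    return preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is', $chunks[0], $regs, PREG_SET_ORDER);
}

$small = h2_count_in_first_chunk($page, 64);   // chunk ends before the second H2
$large = h2_count_in_first_chunk($page, 4096); // whole page fits in one chunk
```

With the small chunk only one H2 is matched, so $regs[1] would not exist and the title would come out empty; with the larger chunk both H2 tags are matched.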
|
That seems to have done the trick!
Thanks, -Marc :o |