PhpDig.net

Old 12-14-2004, 03:31 AM   #1
beesman
Green Mole
 
Join Date: Dec 2004
Posts: 2
Limit search to contents of HTML tags?

Hi all,

I'm testing PhpDig for the first time, & while this forum is a great resource, having trawled through all the messages I can't find a solution to my problem, so any help would be greatly appreciated.

Say I have a number of HTML files with the same structure, e.g. articles with a title in <h2></h2> tags, sub-heading in <h3></h3> tags & the main content in <p class="main"></p> paras. Is it possible to set up PhpDig so that, for example, users can query title text only? Or is there an indexing solution to this issue?

Thanks in advance.
Old 12-14-2004, 04:24 AM   #2
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Welcome to the forum, beesman.

Searching by title within a page is not something phpdig was designed to do. I don't know how much interest there would be in it. phpdig could probably be easily modified to search by web page titles, but that's probably the only change of this kind that Charter would be willing to make.

Hope this helps.
Old 12-14-2004, 07:24 AM   #3
beesman
Green Mole
 
Join Date: Dec 2004
Posts: 2
Hi, & thanks for the speedy reply

Just say no if it's a request too far, but could you point me in the direction of the relevant file &/or chunk of code that I'd have to play with?

Many thanks
Old 12-14-2004, 07:02 PM   #4
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Look at admin/spider.php. That is probably what you'd need to modify to make the kind of search index you want.

Hope this helps.
Old 12-15-2004, 03:41 AM   #5
Spider
Green Mole
 
Join Date: Jul 2004
Posts: 13
@beesman: you are one day ahead of me in posting this question. I will look into it, but I'm a PHP newbie. If you or somebody else writes a solution, I'd like to use it too.
Old 12-15-2004, 03:53 AM   #6
Charter
Head Mole
 
Charter's Avatar
 
Join Date: May 2003
Posts: 2,539
Look at the phpdigCleanHtml function in the robot_functions.php file.
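
To give an idea of the kind of filtering such a change involves, here is a minimal sketch that keeps only the text inside <h2> tags, as in beesman's example. It is not PhpDig's code: the function name extractHeadingText and the sample page are made up for illustration, and how this would be wired into phpdigCleanHtml is left open.

```php
<?php
// Hypothetical helper, not part of PhpDig: keep only the text found
// inside <h2>...</h2> so that only titles would reach the indexer.
function extractHeadingText(string $html, string $tag = 'h2'): string
{
    $tag = preg_quote($tag, '/');
    // Non-greedy match, case-insensitive, "." also matches newlines.
    preg_match_all("/<{$tag}\b[^>]*>(.*?)<\/{$tag}>/is", $html, $matches);

    $parts = [];
    foreach ($matches[1] as $inner) {
        // Drop nested tags and collapse runs of whitespace.
        $clean = trim(preg_replace('/\s+/', ' ', strip_tags($inner)));
        if ($clean !== '') {
            $parts[] = $clean;
        }
    }
    return implode(' ', $parts);
}

// Made-up sample page with the structure described in post #1.
$page = '<h2>First title</h2><h3>Sub-heading</h3>'
      . '<p class="main">Body text</p><h2>Second <em>title</em></h2>';

echo extractHeadingText($page), "\n";
```

A regex is enough for pages with a fixed, simple structure like the one described; for messier HTML a real parser such as PHP's DOMDocument would likely be more robust.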
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 12-15-2004, 08:05 AM   #7
Spider
Green Mole
 
Join Date: Jul 2004
Posts: 13
I placed this in robot_functions.php at line 161

Code:
$text = eregi_replace("<td[^>]*>.*</td>"," ",$text);
Because all my content is between td tags, I thought phpdig would then show me nothing. But phpdig still finds everything. Did I make a mistake?
Old 12-15-2004, 08:23 AM   #8
Charter
Head Mole
 
Charter's Avatar
 
Join Date: May 2003
Posts: 2,539
Did you reindex?
Old 12-15-2004, 09:06 AM   #9
Spider
Green Mole
 
Join Date: Jul 2004
Posts: 13
Yes, emptied the database and reindexed.
Old 12-15-2004, 09:17 AM   #10
Charter
Head Mole
 
Charter's Avatar
 
Join Date: May 2003
Posts: 2,539
So you have the following?
PHP Code:
$text = eregi_replace("<td[^>]*>.*</td>"," ",$text);
$text = preg_replace("/<[\/\!]*?[^<>]*?>/is"," ",$text);
The first removes stuff between <td...> and </td> (according to CHUNK_SIZE) and the second removes other tag-like things, so you don't really need the first one. If you want to exclude part of a page, look at this thread or look at how $title is set in the phpdigCleanHtml function in the robot_functions.php file.
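
For anyone trying this on a current PHP version, the sketch below shows what each of the two replacements does to a small made-up string. It assumes preg_replace() with the "is" flags as a stand-in for eregi_replace(), which was removed in PHP 7. The sample markup and variable names are illustrative only, and the sketch does not reproduce PhpDig's chunking.

```php
<?php
// Made-up sample input; in PhpDig the text arrives in chunks instead.
$text = "<p>intro</p>\n<table><tr><td class=\"a\">cell one</td><td>cell two</td></tr></table>";

// 1) Greedy ".*": removes everything from the first <td ...> to the
//    last </td>, including the text between them.
$withoutCells = preg_replace('/<td[^>]*>.*<\/td>/is', ' ', $text);

// 2) Removes tag-like things only; the text between the tags is kept.
$tagsStripped = preg_replace('/<[\/\!]*?[^<>]*?>/is', ' ', $text);

// Both in sequence, in the order shown in the post above.
$both = preg_replace('/<[\/\!]*?[^<>]*?>/is', ' ', $withoutCells);

var_dump($withoutCells, $tagsStripped, $both);
```

Run on its own, the second pattern leaves the cell text in place, while the first one drops it together with the tags.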
Old 12-15-2004, 12:41 PM   #11
Spider
Green Mole
 
Join Date: Jul 2004
Posts: 13
Thanks Charter, the phpdigExclude and phpdigInclude tags do it for me! I hadn't seen that function until now.
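
For readers who have not used them: as far as I understand the PhpDig documentation, phpdigExclude and phpdigInclude are HTML comments placed on their own lines in the page, and the spider skips whatever sits between them. The sketch below shows such a page plus a small filter that imitates the effect. The function dropExcludedLines is hypothetical and is not PhpDig's implementation.

```php
<?php
// Made-up sample page using the comment markers named in this thread.
$page = <<<HTML
<h2>Article title</h2>
<!-- phpdigExclude -->
<div id="menu">Navigation that should not be indexed</div>
<!-- phpdigInclude -->
<p class="main">Main content that should be indexed</p>
HTML;

// Hypothetical filter, NOT PhpDig's own code: drop every line between
// an exclude marker and the next include marker.
function dropExcludedLines(string $html): string
{
    $keep = true;
    $out = [];
    foreach (preg_split('/\R/', $html) as $line) {
        if (stripos($line, '<!-- phpdigExclude -->') !== false) {
            $keep = false;   // start skipping
            continue;
        }
        if (stripos($line, '<!-- phpdigInclude -->') !== false) {
            $keep = true;    // resume keeping lines
            continue;
        }
        if ($keep) {
            $out[] = $line;
        }
    }
    return implode("\n", $out);
}

echo dropExcludedLines($page), "\n";
```

With markers like these, limiting the index to one part of a page means wrapping everything else in an exclude/include pair in the page templates, with no change to PhpDig itself.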

All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.