PhpDig.net

06-04-2004, 10:20 AM   #1
synnalagma
Green Mole
Join Date: Mar 2004
Posts: 22
Optimized search (around 60% faster)

Hi,

I have built a search engine for my job (I couldn't use PhpDig because set_time_limit and allow_url_fopen were not available on the server), but I took a lot of ideas from PhpDig.

I found some significant optimizations. The idea is to do all the work in the database, since it is faster than PHP at handling large amounts of data.
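A minimal sketch of the "let the database do the work" idea, using Python with an in-memory SQLite database as a stand-in for MySQL. The table and column names (keywords, engine, key_id, spider_id, weight) are assumptions loosely modeled on PhpDig's index tables, and this is not the mod's actual query. A single statement filters the keywords, groups the hits per page, keeps only pages that contain every word (an AND search) and sorts by summed weight, so only the final ranking reaches the application.

```python
import sqlite3

# Hypothetical schema, loosely modeled on PhpDig's keyword index tables.
SCHEMA = """
CREATE TABLE keywords (key_id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
CREATE TABLE engine (spider_id INTEGER, key_id INTEGER, weight INTEGER);
"""

def search_and(conn, words):
    """Rank pages containing ALL of `words`.

    Filtering, grouping and sorting happen in one SQL statement instead of
    fetching every (page, keyword) row and merging them in application code.
    """
    words = sorted(set(words))
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    # HAVING keeps only pages where every distinct search word was found.
    sql = f"""
        SELECT e.spider_id, SUM(e.weight) AS score
        FROM engine AS e
        JOIN keywords AS k ON k.key_id = e.key_id
        WHERE k.keyword IN ({placeholders})
        GROUP BY e.spider_id
        HAVING COUNT(DISTINCT k.keyword) = ?
        ORDER BY score DESC, e.spider_id
    """
    return conn.execute(sql, [*words, len(words)]).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO keywords VALUES (?, ?)",
                 [(1, "php"), (2, "search"), (3, "mysql")])
conn.executemany("INSERT INTO engine VALUES (?, ?, ?)", [
    (10, 1, 5), (10, 2, 3),  # page 10 contains both words
    (11, 1, 9),              # page 11 contains only "php"
    (12, 1, 2), (12, 2, 7),  # page 12 contains both words
])
print(search_and(conn, ["php", "search"]))  # [(12, 9), (10, 8)]
```

With the sample rows, pages 12 and 10 match both words and page 11 is dropped because it lacks "search".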

Exact-phrase search is not implemented (yet), so for exact phrases it just falls back to the original PhpDig search function.

Some things also changed, not because of the optimization but because of a different way of thinking about search:

There are three search types (you can specify the type as the last parameter of other_phpdigSearch):
- Exact: only exact (identical) words are counted, each as 100% of a word.
- Normal: as Exact, plus words that have one or two more letters, counted as 80% of a word.
- Fuzzy: as Normal, plus words that sound like the searched one (by soundex), counted as 40% of a word.
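The three search types can be sketched as a per-word weighting function. This is an illustration, not the mod's code: "one or two letters more" is interpreted here (an assumption) as the candidate word starting with the search term and being one or two characters longer, such as plurals, and the soundex used is the classic four-character American Soundex. The names soundex, word_weight and WEIGHTS are hypothetical.

```python
# Classic American Soundex letter codes; vowels, y, h and w have no code.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}

def soundex(word):
    """Four-character American Soundex, e.g. Robert -> R163."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue               # h/w do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels give "" and do separate them
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]

WEIGHTS = {"exact": 1.0, "normal": 0.8, "fuzzy": 0.4}  # percentages from the post

def word_weight(term, candidate, mode="normal"):
    """Fraction of a word that `candidate` counts for when searching `term`.

    mode is "exact", "normal" or "fuzzy"; each mode includes the previous ones.
    """
    term, candidate = term.lower(), candidate.lower()
    if candidate == term:
        return WEIGHTS["exact"]
    extra = len(candidate) - len(term)
    # Assumption: "one or two letters more" means term + 1-2 trailing letters.
    if mode in ("normal", "fuzzy") and candidate.startswith(term) and 1 <= extra <= 2:
        return WEIGHTS["normal"]
    if mode == "fuzzy" and soundex(candidate) == soundex(term):
        return WEIGHTS["fuzzy"]
    return 0.0
```

For example, word_weight("search", "searches") gives 0.8 in Normal mode, while "searching" (three extra letters) gives 0.0. MySQL also has a built-in SOUNDEX() function that would let this comparison run in the database, but its result is not limited to four characters, so it is not a drop-in match for the function above.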


So here it is for PhpDig. All you need to do is:

- Put other_search.php where your search.php file is (and don't forget to change SEARCH_PAGE in your config file to other_search.php). If you have modified search.php or don't use it, just take a look at other_search.php: only a few lines were added.
- Put other_search_function.php and class.queryparser.php in PhpDig's libs directory.

That's all!

Here are some benchmarks for comparison.

Searching for two words on a site containing 3500 documents, both return around 2100 results:
PHPDig 0.55 seconds (average)
Other 0.20 seconds (average)
Here is a more detailed breakdown for PhpDig:
Mark               Value
All                1.330961943 s.
All Backend        0.423772812 s.
Parsing Strings    0.002659082 s.
Spider Queries     0.221775055 s.
Spider Fills       0.108201981 s.
Reorder Results    0.090033054 s.
All Display        0.907027960 s.
Result Table       0.107897997 s.
Display Queries    0.021100283 s.
Extracts           0.069858074 s.
Final Strings      0.000780106 s.
Logs               0.004404068 s.
Template Parsing   0.783947945 s.

And here for the other one:
Mark               Value
All                0.257488012 s.
All Backend        0.104629993 s.
Parsing Strings    0.013200045 s.
Spider Queries     0.091212034 s.
All Display        0.152703047 s.
Result Table       0.089221001 s.
Display Queries    0.010927677 s.
Extracts           0.070713758 s.
Final Strings      0.000638008 s.
Logs               0.000868082 s.
Template Parsing   0.060055017 s.


I don't know why there's a difference in Template Parsing: the template is the same, and I didn't touch the PhpDig search file. Maybe it's a memory issue.

If you have benchmarks on a bigger site, I would be interested.

The big difference you should look at is in All Backend (where the search itself is performed).
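As a quick check, the relative savings can be computed directly from the posted timings (no new measurements). With the averages, 0.20 s instead of 0.55 s is about 64% less time (a 2.75x speedup), consistent with the "around 60%" in the title; in the detailed run the All Backend mark drops by about 75%. The helper name reduction is just for illustration.

```python
# Timings copied from the two benchmark tables in the post (seconds).
phpdig = {"All": 1.330961943, "All Backend": 0.423772812, "All Display": 0.907027960}
other = {"All": 0.257488012, "All Backend": 0.104629993, "All Display": 0.152703047}

def reduction(before, after):
    """Fraction of time saved: 0.25 means the new run takes 25% less time."""
    return 1 - after / before

# Headline averages: 0.55 s (PhpDig) vs 0.20 s (other).
print(f"average: {reduction(0.55, 0.20):.0%} less time, {0.55 / 0.20:.2f}x speedup")

# Per-mark comparison of the detailed run.
for mark in phpdig:
    saved = reduction(phpdig[mark], other[mark])
    speedup = phpdig[mark] / other[mark]
    print(f"{mark}: {saved:.0%} less time, {speedup:.1f}x speedup")
```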

Ask if you run into any trouble or have suggestions or questions.

Note that further optimizations are possible with MySQL versions later than 4.0, which could speed it up much more.
Attached Files
File Type: zip phpdig_search.zip (8.4 KB, 105 views)

Last edited by synnalagma; 06-04-2004 at 10:27 AM.
06-04-2004, 07:16 PM   #2
Rolandks
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Quote:
PHPDig 0.55 seconds
I think that is fast enough.

Tell me, how long do you (or PhpDig) need to spider a site containing 3500 documents? I hope someone spends some time finding great optimizations for that.

-roland-
06-05-2004, 03:53 AM   #3
synnalagma
Green Mole
Join Date: Mar 2004
Posts: 22
Quote:
I think that is fast enough
I don't think so, since these results were measured on my personal computer and not on the server, so there was only one query at a time and only one site. If you're on a shared server, the other users on it will be happier with this.

If you search for more than two words, the difference will be bigger.

But I agree with you: the spidering process is too slow.
06-06-2004, 02:18 AM   #4
synnalagma
Green Mole
Join Date: Mar 2004
Posts: 22
The result counts are wrong; this is just a small mistake in the code.

Just comment out this line (around line 193):
PHP Code:
$n=$n_start;
That will fix the problem.
06-10-2004, 01:05 AM   #5
sktest
Green Mole
Join Date: Feb 2004
Location: Germany - Münster
Posts: 13
Hi,

Thank you, but why didn't you include the NUMBER_OF_RESULTS_PER_SITE function in your new search script?
06-10-2004, 04:40 AM   #6
synnalagma
Green Mole
Join Date: Mar 2004
Posts: 22
New version

Hi,

You're right, I totally forgot about NUMBER_OF_RESULTS_PER_SITE since I don't use it. It's included now.
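A minimal sketch of what a per-site cap does to a ranked result list. It assumes results are already sorted best-first and that a negative limit means "no limit"; that convention, the function name limit_per_site and the (site_id, page_id, score) layout are hypothetical, so check your PhpDig config for the actual meaning of NUMBER_OF_RESULTS_PER_SITE.

```python
from collections import defaultdict

def limit_per_site(results, max_per_site):
    """Keep at most `max_per_site` results for each site.

    `results` is a list of (site_id, page_id, score) tuples already sorted
    best-first; the order is preserved. A negative limit means "no limit"
    (hypothetical convention for this sketch).
    """
    if max_per_site < 0:
        return list(results)
    shown = defaultdict(int)  # site_id -> number of results kept so far
    kept = []
    for site_id, page_id, score in results:
        if shown[site_id] < max_per_site:
            shown[site_id] += 1
            kept.append((site_id, page_id, score))
    return kept

ranked = [(1, "a", 9), (1, "b", 8), (2, "c", 7), (1, "d", 6), (2, "e", 5)]
print(limit_per_site(ranked, 2))  # [(1, 'a', 9), (1, 'b', 8), (2, 'c', 7), (2, 'e', 5)]
```

Because the loop only drops rows, the relative ranking of the remaining results is unchanged.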

I've also made some other changes:
- You can search with Unix-style wildcards (* and ?).
- You can show which words were searched for.
- If a word isn't found, it is now ignored (for AND conditions) and another word is suggested.
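One plausible way to support the * and ? wildcards when the search runs in the database is to translate them into SQL LIKE patterns (* becomes %, ? becomes _), escaping any literal % or _ typed by the user. This is a sketch under that assumption, not the mod's code; SQLite is used so the example runs stand-alone, and the keywords table and the helper names are hypothetical.

```python
import sqlite3

def wildcard_to_like(term, escape="\\"):
    """Translate a Unix-style wildcard term into an SQL LIKE pattern."""
    out = []
    for ch in term:
        if ch == "*":
            out.append("%")          # any run of characters
        elif ch == "?":
            out.append("_")          # exactly one character
        elif ch in ("%", "_", escape):
            out.append(escape + ch)  # literal LIKE metacharacter: escape it
        else:
            out.append(ch)
    return "".join(out)

def find_keywords(conn, term):
    """Return indexed keywords matching a wildcard term (hypothetical table)."""
    sql = "SELECT keyword FROM keywords WHERE keyword LIKE ? ESCAPE '\\' ORDER BY keyword"
    return [row[0] for row in conn.execute(sql, (wildcard_to_like(term),))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (keyword TEXT)")
conn.executemany("INSERT INTO keywords VALUES (?)",
                 [("sea",), ("search",), ("searches",), ("s_arch",), ("php",)])
print(find_keywords(conn, "sea*"))    # ['sea', 'search', 'searches']
print(find_keywords(conn, "s?arch"))  # ['s_arch', 'search']
```

Passing the ESCAPE clause explicitly keeps the escaping rule visible in the query; MySQL also accepts LIKE ... ESCAPE.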

Installation is the same as before, except that you can now set some parameters at the beginning of other_search_function.php.
If you want to show which words were searched for (that's more of a debug feature), change the define to the number of words to show.

You can also enable or disable wildcard search there.

PS: Charter, can you please remove the first version?
Attached Files
File Type: zip phpdig_search2.zip (9.4 KB, 167 views)
06-10-2004, 04:47 AM   #7
sktest
Green Mole
Join Date: Feb 2004
Location: Germany - Münster
Posts: 13
thx synnalagma
All times are GMT -8.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.