|
06-04-2004, 09:20 AM | #1 |
Green Mole
Join Date: Mar 2004
Posts: 22
|
Optimized search (around 60% faster)
Hi,
I built a search engine for my job (I couldn't use PHPDig because set_time_limit and allow_url_fopen were not available on the server), but I took a lot of ideas from PHPDig and found some worthwhile optimizations. The idea is to do all the work in the database, since it is faster than PHP at handling large amounts of data. Exact-phrase search is not implemented yet, so for that it just uses the original PHPDig search function.

Some things also change, not because of the optimization but because of a different way of thinking about search. There are different search types (you can specify one as the last parameter of other_phpdigSearch):
- Exact: only exact (identical) words are searched for; each counts as 100% of a word
- Normal: Exact, plus words that have one or two more letters count as 80% of a word
- Fuzzy: Normal, plus words that sound like the searched one (soundex) count as 40% of a word

So here it is for PHPDig. All you need to do is:
- put other_search.php where you put your search.php file (and don't forget to change SEARCH_PAGE in your config file to other_search.php). If you have changed search.php or don't use the stock one, just take a look at other_search.php; only a few lines were added.
- put other_search_function.php and class.queryparser.php in PHPDig's libs directory.
That's all!

Here are some benchmarks for comparison. Searching for two words on a site containing 3500 documents, both return around 2100 results:
PHPDig | 0.55 seconds (average)
Other | 0.20 seconds (average)

Here's a more complete example for PHPDig:
Mark | Value
All | 1.330961943 s.
All Backend | 0.423772812 s.
Parsing Strings | 0.002659082 s.
Spider Queries | 0.221775055 s.
Spider Fills | 0.108201981 s.
Reorder Results | 0.090033054 s.
All Display | 0.907027960 s.
Result Table | 0.107897997 s.
Display Queries | 0.021100283 s.
Extracts | 0.069858074 s.
Final Strings | 0.000780106 s.
Logs | 0.004404068 s.
Template Parsing | 0.783947945 s.

And here for the other one:
Mark | Value
All | 0.257488012 s.
All Backend | 0.104629993 s.
Parsing Strings | 0.013200045 s.
Spider Queries | 0.091212034 s.
All Display | 0.152703047 s.
Result Table | 0.089221001 s.
Display Queries | 0.010927677 s.
Extracts | 0.070713758 s.
Final Strings | 0.000638008 s.
Logs | 0.000868082 s.
Template Parsing | 0.060055017 s.

I don't know why there's a difference in Template Parsing; it's the same template and I didn't touch the PHPDig search file. Maybe it's a memory issue. If you have benchmarks on a bigger site, I would be interested.
The big difference to look at is in Backend (where the search is performed).
Ask if you run into any trouble or have suggestions or questions.
Note that more optimization can be done with MySQL versions greater than 4.00, which would speed it up much more.
Last edited by synnalagma; 06-04-2004 at 09:27 AM. |
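For readers who want to see how the three search types described above could weight a match, here is a minimal sketch in Python. It is not the mod's PHP code. The function names `soundex` and `match_weight` are hypothetical, the reading of "one or two letters more" as a prefix match is an assumption, and the soundex here is a simplified classic variant that may differ from the database's own SOUNDEX function.

```python
# Hypothetical sketch of the Exact / Normal / Fuzzy weights from the post.
# Assumption: "one or two letters more" means the indexed word starts with
# the searched word and is 1 or 2 characters longer.

_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Simplified classic Soundex: first letter plus three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        # 'h' and 'w' do not separate letters that share a code
        if ch not in "hw":
            prev = code
    return ("".join(out) + "000")[:4]


def match_weight(query, indexed, mode="normal"):
    """Return 1.0, 0.8, 0.4 or 0.0 for one query word against one indexed word."""
    q, w = query.lower(), indexed.lower()
    if q == w:
        return 1.0  # Exact: identical word counts as 100%
    if mode in ("normal", "fuzzy"):
        if w.startswith(q) and 1 <= len(w) - len(q) <= 2:
            return 0.8  # Normal: one or two extra letters count as 80%
    if mode == "fuzzy":
        sq = soundex(q)
        if sq and sq == soundex(w):
            return 0.4  # Fuzzy: sounds alike counts as 40%
    return 0.0


if __name__ == "__main__":
    for mode in ("exact", "normal", "fuzzy"):
        print(mode, match_weight("search", "searches", mode),
              match_weight("robert", "rupert", mode))
```

In the mod itself this weighting is said to happen in the database rather than in application code, so the sketch only illustrates the scoring rule, not where it runs.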
06-04-2004, 06:16 PM | #2 | |
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
|
Quote:
Tell me, how long do you or PHPDig need to spider a site containing 3500 documents? I hope someone spends some time finding a great optimization for this. -roland- |
|
06-05-2004, 02:53 AM | #3 | |
Green Mole
Join Date: Mar 2004
Posts: 22
|
Quote:
If you search for more than two words, the difference will be bigger. But I agree with you, the spidering process is too long. |
|
06-06-2004, 01:18 AM | #4 |
Green Mole
Join Date: Mar 2004
Posts: 22
|
The result counts are wrong; this is just a small mistake in the code
Just comment out this line (around line 193): PHP Code:
|
06-10-2004, 12:05 AM | #5 |
Green Mole
Join Date: Feb 2004
Location: Germany - Münster
Posts: 13
|
Hi,
thank you, but why haven't you included the "NUMBER_OF_RESULTS_PER_SITE" function in your new search script? |
06-10-2004, 03:40 AM | #6 |
Green Mole
Join Date: Mar 2004
Posts: 22
|
New version
Hi,
You're right, I totally forgot about this NUMBER_OF_RESULTS_PER_SITE thing since I don't use it, so now it's included. I've also made some changes:
- You can search with Unix wildcards (* and ?)
- You can show which words were searched for
- If one word isn't found, it is now ignored (for the AND condition) and another word is proposed
Installation is the same as before, except for one thing: you can set some parameters in other_search_function.php (at the beginning of the file). If you want to show which words were searched for (that's more of a debug function), change the define to the number of words to show. You can also allow or disallow wildcard search there.
PS: charter, can you remove the first version please. |
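The post does not say how the wildcard search is implemented. Since the search runs in the database, one plausible approach is to translate Unix wildcards into a SQL LIKE pattern. The Python sketch below shows that translation only; the function name `wildcard_to_like` is hypothetical and this is not taken from other_search_function.php.

```python
# Hypothetical sketch: turn a Unix-style wildcard term into a SQL LIKE pattern.
# '*' (any run of characters) becomes '%', '?' (one character) becomes '_'.
# Literal '%', '_' and the escape character are escaped so they match themselves.

def wildcard_to_like(term, escape="\\"):
    out = []
    for ch in term:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in ("%", "_", escape):
            out.append(escape + ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for term in ("optim*", "s?arch", "100%_done"):
        print(term, "->", wildcard_to_like(term))
```

The resulting pattern would be passed to the query as a bound parameter, not concatenated into the SQL string. MySQL uses the backslash as its default LIKE escape character.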
06-10-2004, 03:47 AM | #7 |
Green Mole
Join Date: Feb 2004
Location: Germany - Münster
Posts: 13
|
thx synnalagma
|