|
02-22-2004, 06:31 PM | #1 |
Green Mole
Join Date: Feb 2004
Posts: 3
|
An easy way to boost PhpDig ?
Hi,
For the past two days I have been trying to dig 20,000 HTML pages into a database that already contains close to 6,000,000 records in the phpdig_engine table. Unfortunately, it takes a very long time: digging a single HTML file can take more than 1 minute, whereas it took around 3 seconds two weeks ago.
After investigating my system (XP), my database limits (MySQL), and my folder size (20,000 HTML files), I found the solution. I hope it helps somebody else.
Using a time-tracking function, I discovered that the time-consuming code is the "Optimizing" phase of the spider.php file (PhpDig v1.8.0). So just comment out these 4 lines, integrate another optimizing process your own way (every 5,000 digs, for example), and enjoy your newly boosted PhpDig.
=== Code to comment out in spider.php
//print "Optimizing tables...".$br;
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."spider",$id_connect);
@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."engine",$id_connect);
@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."keywords",$id_connect);
Remarks: I only use PhpDig to insert new HTML files. There are no updates and no deletes, so the original PhpDig optimization phase matters less for me.
Regards, tibabs. |
02-22-2004, 06:33 PM | #2 |
Green Mole
Join Date: Feb 2004
Posts: 3
|
Oops ... I forgot to comment out 2 lines in my post.
=== Code to comment out in spider.php
//print "Optimizing tables...".$br;
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."spider",$id_connect);
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."engine",$id_connect);
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."keywords",$id_connect);
Regards, tibabs. |
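The "every 5000 digs" idea from the first post could be sketched roughly as follows. This is only an illustration, not PhpDig code: `should_optimize()`, `optimize_statements()` and `$runQuery` are hypothetical names, the `phpdig_` prefix is assumed from the phpdig_engine table name mentioned above, and the database call is replaced by a closure that just records the SQL.

```php
<?php
// Hypothetical helper (not part of PhpDig): true on every $interval-th dig.
function should_optimize(int $digCount, int $interval = 5000): bool
{
    return $interval > 0 && $digCount > 0 && $digCount % $interval === 0;
}

// Build the same three OPTIMIZE TABLE statements that spider.php issues.
function optimize_statements(string $prefix): array
{
    $statements = [];
    foreach (['spider', 'engine', 'keywords'] as $table) {
        $statements[] = 'OPTIMIZE TABLE ' . $prefix . $table;
    }
    return $statements;
}

// $runQuery stands in for the real database call; here it only records the SQL.
$executed = [];
$runQuery = function (string $sql) use (&$executed): void {
    $executed[] = $sql;
};

for ($digCount = 1; $digCount <= 10000; $digCount++) {
    // ... dig one page here ...
    if (should_optimize($digCount)) {
        foreach (optimize_statements('phpdig_') as $sql) {
            $runQuery($sql);
        }
    }
}

echo count($executed), " OPTIMIZE statements issued\n"; // 2 rounds x 3 tables = 6
```

If each spider run digs only one page, the counter would have to be stored somewhere between runs (a file or a small table, for instance); that part is left out here.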
02-25-2004, 06:33 AM | #3 |
Orange Mole
Join Date: Sep 2003
Posts: 40
|
I have the same problem.
I've started indexing many sites and noticed the engine slowing down. My engine table contains half a million records. Indexing each page takes several seconds (up to 1 minute). But your solution is strange, because optimization is done only once, at the end of site spidering. Any suggestions? |
02-25-2004, 05:24 PM | #4 |
Green Mole
Join Date: Feb 2004
Posts: 3
|
Hi,
In my case I am only indexing separate HTML pages that are not necessarily linked together. As you said, it seems the optimization phase is done only once, but here is what I can suggest:
Sol #1) Comment out the optimization phase and look at the result ==> 5 minutes
Sol #2) Use the phpdigTimer class to profile the source, or use other profiling functions such as http://www.pear.php.net/package/Benchmark ==> 1 hour
I think you can find out where the trouble comes from fairly quickly (about 1 hour) and fix it afterwards. Regards, Thierry |
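For Sol #2, a very small stopwatch is often enough to see which phase eats the time. The sketch below uses only PHP's built-in microtime(); it is a stand-in, not the phpdigTimer class or the PEAR Benchmark package, and `time_phase()` and the phase labels are made-up names. The usleep() calls simulate work.

```php
<?php
// Minimal stopwatch built on microtime(); accumulates seconds per label.
function time_phase(string $label, callable $phase, array &$timings)
{
    $start = microtime(true);
    $result = $phase();
    $timings[$label] = ($timings[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$timings = [];

// Simulated phases; in practice these would wrap the real spider steps.
time_phase('parse', function () { usleep(1000); }, $timings);
time_phase('optimize', function () { usleep(5000); }, $timings);

arsort($timings); // slowest phase first
foreach ($timings as $label => $seconds) {
    printf("%-10s %.4f s\n", $label, $seconds);
}
```

Wrapping each phase of a dig this way and comparing the totals should show quickly whether the OPTIMIZE step or something else dominates.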
02-26-2004, 12:41 AM | #5 | |
Orange Mole
Join Date: Sep 2003
Posts: 40
|
Quote:
We have different problems. I have to index large web sites; you have to index many single pages. Optimization runs at the end of each spidering, even when indexing a single page. Maybe it's possible to add a flag in config.php to disable automatic optimization and run it manually from the admin page. If you do that hack, post it here. |
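One way the flag idea could look, as a rough sketch only: `AUTO_OPTIMIZE_TABLES` and `maybe_optimize()` are invented names, nothing here is known to exist in PhpDig 1.8.0, and the query runner is a placeholder closure rather than a real database call.

```php
<?php
// Hypothetical setting that would live in config.php; false disables the automatic step.
if (!defined('AUTO_OPTIMIZE_TABLES')) {
    define('AUTO_OPTIMIZE_TABLES', false);
}

// Runs the three OPTIMIZE statements only when $enabled is true.
// Returns the number of statements issued.
function maybe_optimize(bool $enabled, string $prefix, callable $runQuery): int
{
    if (!$enabled) {
        return 0;
    }
    $issued = 0;
    foreach (['spider', 'engine', 'keywords'] as $table) {
        $runQuery('OPTIMIZE TABLE ' . $prefix . $table);
        $issued++;
    }
    return $issued;
}

$log = [];
$runQuery = function (string $sql) use (&$log): void {
    $log[] = $sql; // placeholder for the real query call
};

// End of spidering: skipped while the flag is off.
$automatic = maybe_optimize(AUTO_OPTIMIZE_TABLES, 'phpdig_', $runQuery);

// Manual run, e.g. triggered from an admin page: always allowed.
$manual = maybe_optimize(true, 'phpdig_', $runQuery);

echo "automatic: $automatic, manual: $manual\n";
```

The same function serves both paths, so the admin page would only need a button that calls it with the flag forced on.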