|
10-09-2003, 07:19 AM | #1 |
Green Mole
Join Date: Oct 2003
Location: Mesa, AZ
Posts: 15
|
Typical run times...
Yes, I know message boards have search features built into them...
Nevertheless, we have been setting up phpdig and, as a test, have had it spider several message boards hosted on our server. One such board has about 9000 posts and, I realize, probably a lot of links that loop round and round... I set recursion at 2 and let phpdig go. 16 hours later it was still at it! Is this typical? What are some runtimes some of you have experienced, and on what size of site? I'm not necessarily looking for other message board crawling times, just anything in general that I can compare against. Since some of these sites are "our" sites and on a local connection, it might be prudent to remove the sleep(2) call in spider.php to speed things up... |
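For a rough sense of how much the sleep(2) call could matter, here is a back-of-the-envelope sketch. It assumes, hypothetically, one fetched page per post and one sleep per fetched page; the real page count depends on how the board links its pages, so treat the figure as an order of magnitude only.

```python
# Rough estimate of the time a crawl spends purely sleeping.
# Assumptions (not taken from phpdig itself): one fetched page per post,
# and exactly one sleep call per fetched page.

def sleep_overhead_hours(pages: int, sleep_seconds: float) -> float:
    """Return the hours spent in sleep for a crawl of `pages` pages."""
    return pages * sleep_seconds / 3600


if __name__ == "__main__":
    # 9000 posts and a 2-second pause, the figures from the post above.
    print(sleep_overhead_hours(9000, 2))
```

Under those assumptions the pauses alone would account for about 5 of the 16 hours, so removing the sleep on a local server would likely help but would not explain the whole runtime.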
10-09-2003, 04:43 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. You might try using the PhpDig include and exclude comments for the header and footer, and if not already done, try running PhpDig from the shell rather than from a browser.
Another idea, off the top of my head, would be to write a quick script to port the post URLs to a file, and then just crawl that file at level one.
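A minimal sketch of such a quick script, assuming the board exposes each post at a predictable URL. The base URL, the `p` query parameter, and the ID range are all hypothetical placeholders; in practice the IDs would more likely come from the board's database. Whether PhpDig wants a plain list like this or an HTML page of links is something to check against its documentation.

```python
# Build one URL per post and write them to a text file, one per line.
# The URL pattern below is a made-up example, not the real board's layout.

def build_post_urls(base_url, post_ids):
    """Return a list of post URLs for the given IDs."""
    return [f"{base_url}?p={post_id}" for post_id in post_ids]


def write_url_file(urls, path):
    """Write the URLs to `path`, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    urls = build_post_urls("http://example.com/forum/showthread.php", range(1, 9001))
    write_url_file(urls, "post_urls.txt")
```

Crawling a flat list at level one sidesteps the looping links entirely, since the spider never has to follow navigation to discover the posts.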
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
11-19-2003, 02:17 PM | #3 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
For those sites that take a looong time to index, it might be nice to have interruptible indexing (stop for a while, I'll tell you when to continue) - but that's a mod request and should be placed elsewhere I guess?
Anyway, if I can influence the design of the site to be indexed, what advice can I get from the gurus? What are the DOs and the DON'Ts for quick indexing? Where does PhpDig lose a lot of time when indexing sites? ...
__________________
René Haentjens, Ghent University |
11-19-2003, 02:20 PM | #4 |
Green Mole
Join Date: Oct 2003
Location: Mesa, AZ
Posts: 15
|
What I ended up doing was breaking my list of URLs into 7, 8, 9 or 10 sublists and then starting a crawler for each of them.
Pseudo-threading! Doesn't make any individual site crawl faster, but the whole gets completed quicker. |
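The splitting step might look something like this. It is only a sketch: the function name and the round-robin scheme are illustrative choices, and how each sublist is then handed to its own crawler process is left out because it depends on the local setup.

```python
# Split a list of URLs into n sublists of near-equal size (round-robin),
# so that each sublist can be given to a separate crawler process.

def split_round_robin(urls, n):
    """Return n sublists; item i of `urls` goes to sublist i % n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [urls[i::n] for i in range(n)]


if __name__ == "__main__":
    sites = ["a", "b", "c", "d", "e", "f", "g"]
    for index, sublist in enumerate(split_round_robin(sites, 3)):
        print(index, sublist)
```

Round-robin keeps the sublists within one item of each other in length, though it does not balance by site size, so one large board can still dominate the total time.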