09-18-2006, 05:33 PM | #1 |
Purple Mole
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
Forward thinking??
Well, I started out with the basic PHPDIG a few years ago (three years) and I have made a heap of changes to most of it and added a few extras along the way. So far we have an index of 110 million pages running through five servers in a cluster. Each has quad P4 processors with 16 GB of RAM and around 2048 GB of drive space in an array.
Our original dedicated server is still running with around a gig of RAM, a few hundred gig of drives and a single P4. Nowadays it does most of our indexing and just sits in the datacentre eating a heap of bandwidth. We have also just started hosting (using the spare space) and we have an authoritative DNS running so that we can host the sites and do all the usual email redirections and stuff.
It started out as 60 MB of space on a shared host, until the volume of calls to our host's database started to stretch the system a bit (GRIN) and they guided us over to a dedicated server, and we have sort of expanded a heap since then. Charter in the early days was brilliant with the support he offered; he would often drop in via FTP and play with our code for me. The guy was brilliant.
I would imagine that indexing 23,000 sites would be easy even for PHPDIG out of the box, but make sure the database is kept optimised and set it up on a decent dedicated server. You would need at least a couple of gig of RAM so that the database can run from memory; otherwise calls to the disks would slow it down to a crawl.
A while ago we split the database and developed an algo to rank webpages using a basic content-based algorithm, which we have expanded on a heap. When a search is done now, it is passed through two databases and the results are then pooled to form the results that we give. One server does all the indexing, otherwise it slows the system down, and when cron jobs run to back up the systems, search times do slow down a bit.
I would love to see the PHPDIG software developed further. A few months back I set up a small country-specific search engine with it in Romania for a friend and he thinks it's brilliant. That has indexed around 200,000 pages and is straight out of the box with no mods to it.
When he asked the price I said, well, you go and download it because it's free under the GPL, and he thought I was joking! The work started all those years ago by Antoine Bajolet and then added to by Charter has been brilliant, and I would hate to see the forum ruined by spammers leaving crazy messages on it.
I now have a web directory located at www.linkoz.co.nz and I have set up a small forum on the side of it so that users can ask questions about the search engine and the web directory. If I can be of any help with the software then don't hesitate to contact me.
PHPDIG can never be a money-making venture and we must respect the fact that Charter has to earn a living, so he can't spend all his time helping everyone out with problems.
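For anyone wondering what "passing a search through two databases and pooling the results" might look like, here is a rough Python sketch. It is not our actual code: the function name `pool_results`, the sample URLs and the idea of simply adding the two scores together are all assumptions made up for illustration, and a real setup would pull the two lists from the databases rather than hard-code them.

```python
from collections import defaultdict


def pool_results(*result_sets, limit=10):
    """Merge several lists of (url, score) pairs into one ranked list.

    Assumption for this sketch: a page found by more than one database
    gets the sum of its scores. Other pooling rules are equally possible.
    """
    combined = defaultdict(float)
    for results in result_sets:
        for url, score in results:
            combined[url] += score
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical hits from the two databases (made-up data).
index_hits = [("example.org/a", 2.0), ("example.org/b", 1.0)]
content_hits = [("example.org/b", 1.5), ("example.org/c", 0.5)]

pooled = pool_results(index_hits, content_hits)
for url, score in pooled:
    print(f"{score:.1f}  {url}")
```

Summing the scores is only the simplest choice; the point is that each database answers on its own and the merge happens afterwards, so the indexing box never has to be touched at search time.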