|
06-06-2004, 09:39 AM | #1 |
Green Mole
Join Date: Jun 2004
Posts: 4
|
Indexing the Internet
I'm a bit concerned about Google's domination of search, so like many others I signed up for Grub.org.
Unfortunately, their server software is not open source, so I looked around for another similar project. ***** is a good search engine and it's open source, but they're not interested in implementing a distributed crawler like Grub. And I don't know Java. So I looked for a good PHP-based search engine and found PhpDig. I just installed it, and it looks quite good.
I'm very interested in starting a project to index the Internet using PhpDig. I think we can scale PhpDig for this. For example, we can split the various components (indexer, search front-end, database, etc.) across separate physical servers, MySQL has a clustering feature now, and so on. If anyone else is interested, feel free to join in.
This is the to-do list for the project:
# Purchase a dedicated server for the project.
# Get lists of domain names by signing up [ here ] and [ here ] (read [ this ] and [ this ] for details).
# Code a job allocator, which will allocate job packages to users. It will assign several domain names (from the lists above) to be deep-crawled by each user.
# Code a job manager, which will receive submissions from users and merge them into the main index.
# Modify spider.php (running as PHP CGI) so it can request job packages (with user authentication), crawl the domains, and submit the results back securely.
# Create a simple website with basic stats, user management, and a search front-end.
That should be enough to get this project off the ground. The project will be fully open and strictly non-profit.
Thanks to the PhpDig developers, and here's hoping that this will be useful for everyone as well.
Thanks, Harry |
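The job allocator and job manager from the to-do list could be prototyped roughly as below. This is only a minimal in-memory sketch in Python, not PhpDig code: the names `JobPackage`, `JobAllocator`, `JobManager`, and `domain_of` are hypothetical, the "index" is a plain keyword-to-URL dictionary rather than PhpDig's MySQL tables, and user authentication is reduced to checking that the submitting user is the one the package was issued to.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count
from urllib.parse import urlsplit


def domain_of(url):
    """Return the host part of a URL (hypothetical helper for this sketch)."""
    return urlsplit(url).hostname


@dataclass
class JobPackage:
    job_id: int
    user: str
    domains: list


class JobAllocator:
    """Hands out batches of domains to users and tracks outstanding packages."""

    def __init__(self, domains, batch_size=3):
        self._pending = deque(domains)  # domains not yet assigned
        self._outstanding = {}          # job_id -> JobPackage awaiting results
        self._ids = count(1)
        self._batch_size = batch_size

    def request_job(self, user):
        """Return the next package for `user`, or None when no domains are left."""
        if not self._pending:
            return None
        n = min(self._batch_size, len(self._pending))
        batch = [self._pending.popleft() for _ in range(n)]
        job = JobPackage(next(self._ids), user, batch)
        self._outstanding[job.job_id] = job
        return job

    def complete(self, job_id, user):
        """Check that a submission matches an outstanding package and close it."""
        job = self._outstanding.get(job_id)
        if job is None or job.user != user:
            raise ValueError("unknown job or wrong user")
        del self._outstanding[job_id]
        return job


class JobManager:
    """Receives crawl results and merges them into one shared index."""

    def __init__(self, allocator):
        self._allocator = allocator
        self.index = {}  # keyword -> set of URLs

    def submit(self, job_id, user, results):
        """`results` maps each crawled URL to the keywords found on it."""
        job = self._allocator.complete(job_id, user)
        for url, keywords in results.items():
            # Ignore pages outside the domains assigned in this package.
            if domain_of(url) not in job.domains:
                continue
            for kw in keywords:
                self.index.setdefault(kw.lower(), set()).add(url)
```

A modified spider would call `request_job`, crawl the returned domains, and hand its results to `submit`. Rejecting URLs outside the assigned domains is one cheap guard against junk submissions, though a real deployment would need much stronger validation.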
06-06-2004, 09:58 AM | #2 |
Purple Mole
Join Date: Dec 2003
Posts: 106
|
Sounds like an ambitious task, sufehmi, to put it mildly--especially considering that as of now phpDig can only spider one site at a time per database. Also, and no offense, but why would you want to do this? Google's "domination" on search, most people agree, provides relevant information quickly and easily, giving useful results in a fair manner. Do you plan on out-Googling Google? Everywhere you look some search engine is trying to top them, and they're spending millions and millions of dollars to do it. If they do, great, just as long as we still get relevant information. That's the only thing people want. So I guess my question is why would you want to compete with these big businesses, and why would you want to do something that many many other people have already almost done? I'm personally happy with my search results thus far.
__________________
Foundmyself.com artist community, art galleries Last edited by bloodjelly; 06-06-2004 at 10:03 AM. |
06-06-2004, 01:59 PM | #3 | ||||
Green Mole
Join Date: Jun 2004
Posts: 4
|
Quote:
But if I can generate interest in this project, I think it has a good chance of succeeding. Making it very easy to contribute is one of the tricks (by letting people run the spider themselves). And I think I can get the project off the ground on my own, at which point hopefully it will be interesting enough for others to join in.
Quote:
Quote:
# At the moment they're doing a great job of playing fair (for most people), but there's no guarantee for the future.
# Google is excellent, but there are a few things that I (and no doubt others) would like to improve. (Link farms, anyone? Google spammers? etc.)
# It will be one mighty interesting project.
Quote:
But when people are working together, I think nothing is impossible.
Thanks, Harry |
06-06-2004, 05:28 PM | #4 |
Purple Mole
Join Date: Dec 2003
Posts: 106
|
Well good luck, let's hear how your project progresses.
__________________
Foundmyself.com artist community, art galleries |
08-03-2004, 05:18 PM | #5 |
Green Mole
Join Date: Aug 2004
Posts: 1
|
I like how you're thinking. Sounds like a great project. Best of luck. Any progress report?
|
08-04-2004, 01:56 PM | #6 |
Green Mole
Join Date: Jun 2004
Posts: 4
|
Nope, unfortunately I'm still busy coding for phpBB and phpOpenChat, among other things.
Well, anyway, this gives me the opportunity to look for a better server within my budget. I can't believe how cheap dedicated servers are nowadays (as long as you don't host anything business-critical). In the meantime, if anyone is interested in joining, just drop me an email or post in this thread. cheers, HS |