05-22-2004, 07:00 PM | #1
Green Mole
Join Date: Apr 2004
Posts: 3
QUESTION: How to spider multiple URLs, not just one at a time
Is there any way to spider multiple URLs at once, instead of spidering all the pages under a single URL?
I need to index subject-specific web sites, but only the top (index) page of each site. If phpDig can do this, please tell me how, or better yet, point me to an existing thread with this info. THANKS!
05-23-2004, 06:07 PM | #2
Purple Mole
Join Date: Dec 2003
Posts: 106
Multiple spidering seems to be one of the most requested features. As of right now, to do what you want, you could try the wrapper posted on this board (search "wrapper") together with the limit-total-links-per-site mod that I posted (search "limit"). Alternatively, Charter's current 1.8.1 alpha version apparently can do this as well, although multiple simultaneous spiders aren't supported yet.
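For example, here's a rough wrapper sketch in Python. The spider.php path and the idea of passing one URL per run on the command line are assumptions about your install (check the admin scripts for the actual invocation), and you'd still want the crawl depth in your config set to 0 so only the top page of each site gets indexed:

```python
# Hypothetical wrapper: feed the spider one URL per run, so each site
# is indexed separately. Paths and CLI usage are assumptions, not a
# documented phpDig interface -- verify against your installation.
import subprocess

SPIDER_SCRIPT = "admin/spider.php"  # assumed path; adjust to your install

def build_spider_commands(urls, spider_script=SPIDER_SCRIPT):
    """Return one command per URL; each run spiders a single site."""
    return [["php", "-f", spider_script, url] for url in urls]

def run_all(urls):
    for cmd in build_spider_commands(urls):
        subprocess.run(cmd, check=True)  # crawl sites one after another
```

You'd call `run_all()` with your list of top-page URLs; since each invocation gets exactly one URL, the "multiple URLs" part is handled by the loop rather than by phpDig itself.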
__________________
Foundmyself.com artist community, art galleries
05-24-2004, 07:53 AM | #3
Green Mole
Join Date: Apr 2004
Posts: 3
Thanks
Thanks bloodjelly!
I'll check it out.
06-13-2004, 09:31 PM | #4
Green Mole
Join Date: Apr 2004
Location: Cali
Posts: 10
Would this work?
I was wondering if you could simply create several copies of phpDig and run them simultaneously as separate spiders, but before you start digging, assign the site_ids and spider_ids to ranges that won't overlap each other.

So, database one would have site_id starting at 1 and spider_id starting at 1. Database two would have site_id starting at 10,000,000 and spider_id starting at 10,000,000. And so on, until you've made as many spiders as you want. Then run them all simultaneously. Afterwards, dump the data from all of these databases into one final phpDig database and move all the txt_content .txt files into the final site folder.

Would that work? I haven't tried it myself, so I suppose I should test it before suggesting it. In any case, hopefully future modifications to phpDig will remove this issue. (My apologies if I sound ignorant; I'm probably the first to admit that I am.)
06-13-2004, 11:42 PM | #5
Purple Mole
Join Date: Dec 2003
Posts: 106
Hi misterbear -
We definitely need a way to run multiple spiders, but I think running separate copies on separate databases is a bit like hiring two slow typists and paying them double instead of hiring one fast typist at the regular wage: it doesn't really solve the problem. I think the solution will involve one database and multiple running spider processes that are smart enough to know which sites are already being spidered. That way we limit redundant data and tables, and simplify the process for people whose hosts allow only one database (freeservers, for example). But thanks for trying to work out solutions!
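A rough sketch of what I mean, with sqlite3 standing in for MySQL: each spider process atomically claims an unlocked site before crawling it. The `locked` column is an assumed addition to the sites table, not part of phpDig's real schema.

```python
# Sketch: multiple spider processes share one database and coordinate
# by atomically flipping a "locked" flag per site. The schema here is
# illustrative, not phpDig's actual one.
import sqlite3

def claim_next_site(conn):
    """Claim one unlocked site and return its id, or None if none left.
    The 'AND locked = 0' guard plus rowcount check makes the claim
    atomic even if two processes pick the same candidate row."""
    while True:
        row = conn.execute(
            "SELECT site_id FROM sites WHERE locked = 0 LIMIT 1").fetchone()
        if row is None:
            return None  # nothing left to spider
        cur = conn.execute(
            "UPDATE sites SET locked = 1 WHERE site_id = ? AND locked = 0",
            (row[0],))
        conn.commit()
        if cur.rowcount == 1:  # we won the race; otherwise retry
            return row[0]

def release_site(conn, site_id):
    """Mark a site as finished so other spiders may pick it up later."""
    conn.execute("UPDATE sites SET locked = 0 WHERE site_id = ?", (site_id,))
    conn.commit()
```

In MySQL you'd get the same effect with the conditional UPDATE (or SELECT ... FOR UPDATE inside a transaction), but the idea is the same: the database itself, not the spiders, decides who owns which site.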
__________________
Foundmyself.com artist community, art galleries
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Infos about multiple spider change | noel | How-to Forum | 2 | 11-11-2005 05:16 PM |
Spiders site first time, but update doesn't spider | Ensim | Troubleshooting | 6 | 11-30-2004 02:17 PM |
Spider doesn't work for first time | sbrinkmann | Script Installation | 1 | 09-07-2004 04:34 PM |
Spidering multiple URL's | 2wheelin | Mod Requests | 0 | 05-22-2004 06:51 PM |
Spider & time limit | onlytrue | How-to Forum | 1 | 04-16-2004 06:03 AM |