PhpDig.net

PhpDig Forums > How-to Forum

05-22-2004, 07:00 PM #1
2wheelin
Green Mole
 
Join Date: Apr 2004
Posts: 3
QUESTION: How to spider multiple URLs, not just one at a time

Is there any way to spider multiple URLs, instead of all the pages under a single URL?

I need to index subject-specific websites, but only the top (index) page of each site. If PhpDig can do this, please tell me how, or better yet, point me to an existing thread with this info.
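A minimal sketch of the preparation step described above: reduce a list of site URLs to each site's top page, so every site appears once in the work list. It is written in Python rather than PhpDig's PHP, and `top_pages` is a hypothetical helper, not part of PhpDig; how the resulting list is handed to the spider is not shown.

```python
from urllib.parse import urlsplit

def top_pages(urls):
    """Reduce each URL to its site's top page (scheme://host/).

    Keeps first-seen order, drops duplicate sites, and skips entries
    that are not absolute URLs.
    """
    seen = set()
    result = []
    for url in urls:
        parts = urlsplit(url.strip())
        if not parts.scheme or not parts.netloc:
            continue  # not an absolute URL, nothing to spider
        root = f"{parts.scheme.lower()}://{parts.netloc.lower()}/"
        if root not in seen:
            seen.add(root)
            result.append(root)
    return result

if __name__ == "__main__":
    sites = [
        "http://www.example.com/index.html",
        "http://www.example.com/about/",
        "https://example.org",
    ]
    for page in top_pages(sites):
        print(page)
```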

THANKS!
05-23-2004, 06:07 PM #2
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Spidering multiple sites seems to be one of the most requested features. Right now, to do what you want, you could try the wrapper posted on this board (search for "wrapper") together with the mod I posted that limits the total number of links per site (search for "limit"). Alternatively, Charter's current 1.8.1 alpha version apparently can do this as well, although running multiple spiders at once isn't supported yet.
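The idea behind a per-site link limit can be sketched as a small gate that admits at most N URLs per host. This is not the mod mentioned above (its code isn't in this thread); `PerSiteLimit` is a hypothetical Python class used only to illustrate the concept. With `limit=1`, only the first URL seen for each host gets through, which matches the "top page only" case as long as each site's top page is queued first.

```python
from collections import defaultdict
from urllib.parse import urlsplit

class PerSiteLimit:
    """Hypothetical gate: admit at most `limit` URLs per host."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.counts = defaultdict(int)  # host -> URLs admitted so far

    def admit(self, url):
        """Return True and count the URL if its host is still under the limit."""
        host = urlsplit(url).netloc.lower()
        if self.counts[host] >= self.limit:
            return False
        self.counts[host] += 1
        return True

if __name__ == "__main__":
    gate = PerSiteLimit(limit=1)  # keep only the first URL per site
    queue = [
        "http://a.example/", "http://a.example/page2",
        "http://b.example/", "http://b.example/x",
    ]
    print([u for u in queue if gate.admit(u)])
```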
05-24-2004, 07:53 AM #3
2wheelin
Green Mole
 
Join Date: Apr 2004
Posts: 3
Thanks

Thanks bloodjelly!

I'll check it out.
06-13-2004, 09:31 PM #4
misterbearcom
Green Mole
 
Join Date: Apr 2004
Location: Cali
Posts: 10
Would this work?

I was wondering if you could simply create several copies of PhpDig and run them simultaneously as separate spiders, but before you start digging, set the site_ids and spider_ids in each copy's database to start at numbers that will not overlap with the others.

So, database 1 would have:

site_id starts at 1
spider_id starts at 1

Then database 2:

site_id starts at 10,000,000
spider_id starts at 10,000,000

And so on, until you've created as many spiders as you want.

Then run them all simultaneously.
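The non-overlapping ID scheme above can be sketched in Python as fixed-size ID blocks per spider copy, plus the MySQL `ALTER TABLE ... AUTO_INCREMENT = N` statement that sets where each copy starts counting. The table names `sites` and `spider` (with no table prefix) and the 10,000,000 block size are assumptions taken from the example numbers; any other auto-increment columns the merged data depends on would need the same treatment.

```python
BLOCK_SIZE = 10_000_000  # assumed spacing between copies, from the example above

def id_ranges(num_spiders, block_size=BLOCK_SIZE):
    """Return an inclusive (first_id, last_id) range for each spider copy.

    Copy 0 starts at 1; copy k (k >= 1) starts at k * block_size.
    """
    ranges = []
    for k in range(num_spiders):
        first = 1 if k == 0 else k * block_size
        last = (k + 1) * block_size - 1
        ranges.append((first, last))
    return ranges

def auto_increment_sql(first_id, tables=("sites", "spider")):
    """MySQL statements so the next row inserted into each (empty) table gets first_id.

    The table names are assumptions about the PhpDig schema.
    """
    return [f"ALTER TABLE {table} AUTO_INCREMENT = {first_id};" for table in tables]

if __name__ == "__main__":
    for copy, (first, last) in enumerate(id_ranges(3)):
        print(f"-- copy {copy}: IDs {first}..{last}")
        for stmt in auto_increment_sql(first):
            print(stmt)
```

Nothing here stops a copy from running past its last ID, so a range check before merging would still be worthwhile.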

Afterwards, dump the data from all of these databases into one final PhpDig database, and copy all the .txt files from each copy's txt_content folder into the final site's folder.
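The merge step can be sketched with Python's built-in `sqlite3` module standing in for MySQL. It copies each source database's rows into the final one and reports any ID that already exists there, which is exactly what the offset scheme is meant to prevent. The single `sites (site_id, site_url)` table is a simplified, assumed schema; the real PhpDig tables have more columns, and related tables would need copying too.

```python
import sqlite3

# Simplified, assumed schema used only for this sketch.
SCHEMA = "CREATE TABLE IF NOT EXISTS sites (site_id INTEGER PRIMARY KEY, site_url TEXT NOT NULL)"

def merge_sites(target, sources):
    """Copy every sites row from each source connection into target.

    Returns the site_ids that collided with a row already in target;
    those rows are skipped rather than overwritten.
    """
    collisions = []
    for src in sources:
        rows = src.execute("SELECT site_id, site_url FROM sites ORDER BY site_id").fetchall()
        for site_id, site_url in rows:
            try:
                target.execute(
                    "INSERT INTO sites (site_id, site_url) VALUES (?, ?)",
                    (site_id, site_url),
                )
            except sqlite3.IntegrityError:
                collisions.append(site_id)  # primary key already taken
    target.commit()
    return collisions

def make_db(rows):
    """In-memory database holding the simplified sites table and the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO sites (site_id, site_url) VALUES (?, ?)", rows)
    conn.commit()
    return conn

if __name__ == "__main__":
    db1 = make_db([(1, "http://a.example/"), (2, "http://b.example/")])
    db2 = make_db([(10_000_000, "http://c.example/")])
    final = make_db([])
    print(merge_sites(final, [db1, db2]))  # non-overlapping IDs, so no collisions
```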

Would that work? I haven't tried it myself, so I should probably test it before suggesting it. In any case, hopefully future versions of PhpDig will remove this limitation.

(My apologies if I sound ignorant. I'm probably the first to admit that I am.)
06-13-2004, 11:42 PM #5
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Hi misterbear -

We definitely need a way to run multiple spiders, but I think running separate copies on separate databases is a bit like hiring two slow typists and paying them double instead of hiring one fast typist at the regular wage. In other words, it doesn't really solve the problem. I think the solution will involve one database and multiple spider processes that are smart enough to know which sites are already being spidered. That way, we can avoid redundant data and tables, and keep things simple for people whose host allows only one database (for example, Freeservers). But thanks for trying to work out solutions!
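The "one database, several spider processes that know which sites are taken" idea can be sketched as a shared queue table where each process claims a site with a conditional `UPDATE`: only the process whose update actually changes the row (rowcount 1) gets that site. Python's `sqlite3` keeps the example self-contained; the `site_queue` table, `claimed_by` column and `claim_next_site` function are hypothetical, and in a real setup each process would hold its own connection to the shared MySQL database.

```python
import sqlite3

def create_queue(conn, urls):
    """Create the hypothetical site_queue table and add one unclaimed row per URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS site_queue ("
        "site_id INTEGER PRIMARY KEY, site_url TEXT NOT NULL, claimed_by TEXT)"
    )
    conn.executemany("INSERT INTO site_queue (site_url) VALUES (?)", [(u,) for u in urls])
    conn.commit()

def claim_next_site(conn, worker):
    """Claim one unclaimed site for `worker`; return (site_id, site_url) or None."""
    while True:
        row = conn.execute(
            "SELECT site_id, site_url FROM site_queue "
            "WHERE claimed_by IS NULL ORDER BY site_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # every site is already claimed
        # Compare-and-set: succeeds only if nobody claimed the row since the SELECT.
        cur = conn.execute(
            "UPDATE site_queue SET claimed_by = ? WHERE site_id = ? AND claimed_by IS NULL",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
        # Another process won the race for this row; loop and try the next one.

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_queue(conn, ["http://a.example/", "http://b.example/"])
    print(claim_next_site(conn, "spider-1"))
    print(claim_next_site(conn, "spider-2"))
    print(claim_next_site(conn, "spider-1"))
```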


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.