02-23-2005, 03:56 PM | #1 |
Green Mole
Join Date: Feb 2005
Posts: 1
Problem Spidering...
Hello. I am fairly new to PhpDig and am very happy with it on the whole, but I have a couple of questions.
I am currently trying to spider wireworld.com for a client. I have set up a robots.txt file and started the spider at the root directory. The problem is this: there are about 1500 files in the root, and the spider gets about 200-400 files in before I start getting all sorts of 404 errors. Could this be a browser issue? I am running it in Mozilla. I have no experience with the shell at all, and I don't even know if I can get shell access, since the server is an hour and a half away. Any suggestions? Thanks.
02-26-2005, 07:38 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
Are the 1500 files linked from other pages? PhpDig indexes by following links, so pages that nothing links to (orphan pages) will not be found; if you have orphan pages, check this thread. Also, try setting search depth to a large number and links per to zero, select no, set LIMIT_TO_DIRECTORY to false, and set PHPDIG_IN_DOMAIN to true. I'm not sure about the 404s, as I cannot see the requests. As for the shell, you do not need to be at the server: it can be accessed remotely via SSH/Telnet, cPanel, etc., assuming you have permission.
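As a rough sketch of the two config changes mentioned above: this assumes both constants are set with define() in PhpDig's config file (includes/config.php in a typical install; check your own copy, as the location and the surrounding lines may differ). The comments describe the intended effect as I understand it, not the exact wording in the shipped file.

```php
<?php
// Hypothetical excerpt; only the two constants named in this post are shown.
// Edit the existing define() lines in your config rather than adding duplicates.

// Do not confine the spider to the directory of the starting URL.
define('LIMIT_TO_DIRECTORY', false);

// Let the spider follow links to other hosts within the same domain.
define('PHPDIG_IN_DOMAIN', true);
```

Search depth, links per, and the yes/no choice are set from the admin spidering form rather than in this file.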
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.