10-30-2005, 07:01 AM | #1
Green Mole
Join Date: Oct 2005
Posts: 3
Problem Spidering
I cannot index any sites with my install of phpDig. I have v1.8.8 RC1 running on a Windows box with Apache. Directory permissions are already set correctly, and I verified that allow_url_fopen is enabled.
I am trying to index http://www.noland.com/noland/index.php. When the spider starts, it seems to fetch the parent directory, www.noland.com, which is not available on the web because it redirects to www.noland.com/noland. Spidering an external site such as www.mtslink.com does not work either. Here is the output I get:
Spidering in progress... [Stop spider]
--------------------------------------------------------------------------------
SITE : http://www.noland.com/
Exclude paths :
- Admin/
- auctiondata/
- calendar/
- cgi-local/
- enoland/
- itemmaint/
- mail/
- msds/
- nol****nline/
- nolandtest/
- obis/
- Orders/
- phpinc/
- squidalizer/
- Stylesheets/
- test/
- webmail/
- webalizer/
- squidalizer-detail/
Wait...
1:http://www.noland.com/noland/ (time : 00:00:05)
No link in temporary table
--------------------------------------------------------------------------------
links found : 1
http://www.noland.com/noland/
Optimizing tables...
Indexing complete !
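"No link in temporary table" means the spider found nothing it was willing to follow from the start page. One way to check what it has to work with is to list the links on that page yourself. The sketch below is a hypothetical Python helper, not part of phpDig: it collects `href` values from `<a>` tags and resolves them against the page URL. The base URL matters here because of the redirect: a relative link such as `about.php` (a made-up example) resolves to a different address under http://www.noland.com/ than under http://www.noland.com/noland/. It ignores `<base href>` tags and JavaScript-generated links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect absolute http(s) link targets from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any #fragment.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript: and similar non-web targets, and duplicates.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the unique absolute links found in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="about.php">About</a>', "http://www.noland.com/noland/index.php")` returns `['http://www.noland.com/noland/about.php']`, while the same HTML with base `http://www.noland.com/` yields `['http://www.noland.com/about.php']`.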
11-01-2005, 03:52 PM | #2
Head Mole
Join Date: May 2003
Posts: 2,539
Is your site and the PhpDig install on a server that uses load balancing?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
11-01-2005, 06:25 PM | #3
Green Mole
Join Date: Oct 2005
Posts: 3
No. By adjusting the config I was able to get it to spider individual pages just fine, but it won't follow any links no matter what I try.
11-01-2005, 06:40 PM | #4
Head Mole
Join Date: May 2003
Posts: 2,539
Try setting PHPDIG_IN_DOMAIN to true and LIMIT_TO_DIRECTORY to false, both in the config file. Then, in the admin panel, use a large search depth, set "links per" to zero, and choose the "no" option. You can raise the search depth beyond twenty by editing SPIDER_MAX_LIMIT in the config file.
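To reason about how these settings interact, here is a simplified, hypothetical model in Python. It is not phpDig's code, and its semantics are assumptions: PHPDIG_IN_DOMAIN=true is taken to mean "also follow other hosts in the same domain" (approximated as sharing the last two host labels), LIMIT_TO_DIRECTORY=true to mean "only follow URLs under the start page's directory", and search depth to mean the number of link hops from the start page. An in-memory dict stands in for the fetched pages.

```python
from collections import deque
from urllib.parse import urlsplit


def domain_of(host: str) -> str:
    # Simplifying assumption: the last two labels are the domain (www.noland.com -> noland.com).
    return ".".join(host.lower().split(".")[-2:])


def allowed(url: str, start: str, in_domain: bool, limit_to_directory: bool) -> bool:
    """Decide whether a link may be followed (hypothetical model of the two settings)."""
    u, s = urlsplit(url), urlsplit(start)
    if u.hostname != s.hostname:
        same_domain = (
            in_domain and u.hostname and s.hostname
            and domain_of(u.hostname) == domain_of(s.hostname)
        )
        if not same_domain:
            return False
    if limit_to_directory:
        # Directory of the start page, e.g. "/noland/index.php" -> "/noland/".
        start_dir = s.path[: s.path.rfind("/") + 1] or "/"
        return (u.path or "/").startswith(start_dir)
    return True


def crawl(start, links, depth, in_domain=True, limit_to_directory=False):
    """Breadth-first crawl over links (url -> list of urls), at most `depth` hops deep."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, level = queue.popleft()
        if level == depth:
            continue  # depth limit reached: do not expand this page's links
        for nxt in links.get(url, []):
            if nxt not in seen and allowed(nxt, start, in_domain, limit_to_directory):
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, level + 1))
    return order
```

In this model, starting at http://www.noland.com/noland/index.php with limit_to_directory=True, a link to http://www.noland.com/b.php (hypothetical) is dropped because it lies outside /noland/, and with depth=0 only the start page is ever visited, matching the "links found : 1" symptom.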
11-01-2005, 06:46 PM | #5
Green Mole
Join Date: Oct 2005
Posts: 3
Done
OK, I verified those two settings. I can still get a single page indexed, but the spider will not follow any of its links. I'd be happy to send you the login information by e-mail if you think that would help diagnose the problem.
Thanks for your help. John
11-01-2005, 06:49 PM | #6
Head Mole
Join Date: May 2003
Posts: 2,539
Your install gives:
Code:
Spidering in progress... [Stop spider]
SITE : http://www.mtslink.com/
Exclude paths :
- @NONE@
Wait...
1:http://www.mtslink.com/ (time : 00:00:36)
+ + + + + + + + + +
No link in temporary table
links found : 1
http://www.mtslink.com/
Optimizing tables...
Indexing complete !
[Back] to admin interface.
Spidering the same site from another install gives:
Code:
Spidering in progress... [Stop spider]
SITE : http://www.mtslink.com/
Exclude paths :
- @NONE@
Wait...
1:http://www.mtslink.com/ (time : 00:00:12)
+ + + + + + + + + +
level 1...
Wait...
2:http://www.mtslink.com/pricing.php (time : 00:00:29)
+ + + + + +
Wait...
3:http://www.mtslink.com/medicalintranet.php (time : 00:00:39)
+
Wait...
4:http://www.mtslink.com/contact.php (time : 00:00:47)
Wait...
5:http://www.mtslink.com/ann.php (time : 00:00:56)
+
And so forth...
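Comparing two spider logs by eye works for a short run; for longer runs, a small script can summarize them. The hypothetical Python helper below relies only on the line formats visible in these logs (`N:URL (time : HH:MM:SS)`, `level N...`, and `No link in temporary table`); it is not a phpDig tool. It reports which URLs were fetched, the deepest level reached, and whether the "no link" message appeared, which is exactly what distinguishes the failing run from the working one.

```python
import re
from dataclasses import dataclass, field

# "1:http://www.mtslink.com/ (time : 00:00:36)" -> sequence number, URL, elapsed time
FETCH_RE = re.compile(r"(\d+):(https?://\S+) \(time : (\d{2}:\d{2}:\d{2})\)")
# "level 1..." marks the spider descending one more link level
LEVEL_RE = re.compile(r"level (\d+)\.\.\.")


@dataclass
class SpiderSummary:
    urls: list = field(default_factory=list)
    max_level: int = 0
    no_links: bool = False


def summarize(log: str) -> SpiderSummary:
    """Summarize a phpDig-style spider log, whether flattened to one line or not."""
    summary = SpiderSummary()
    summary.urls = [m.group(2) for m in FETCH_RE.finditer(log)]
    summary.max_level = max((int(m.group(1)) for m in LEVEL_RE.finditer(log)), default=0)
    summary.no_links = "No link in temporary table" in log
    return summary
```

On the first log above this gives one URL, `max_level == 0` and `no_links == True`; on the second it gives several URLs and `max_level == 1`.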
11-02-2005, 08:58 AM | #7
Head Mole
Join Date: May 2003
Posts: 2,539
Set PHP's display_errors to On and keep error_reporting(E_ALL); in the config file. With display_errors off, error_reporting shows nothing on screen. If you don't want to change this in php.ini directly, try putting the following in an .htaccess file in the main PhpDig directory and then run an index:
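Code:
php_flag display_errors on
This .htaccess setting only takes effect when PHP runs as an Apache module. Also, please run this MySQL query and post the result:
Code:
SELECT VERSION();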