PhpDig.net

Old 09-30-2004, 09:41 PM   #1
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Shell Spidering Quits After Indexing A Few Pages

I'm spidering from a shell command for the first time and having some problems. The process ran for around 5 minutes yesterday, and indexed about 50 pages, then said it was complete. I launched the shell command again, phpdig indexed a few more pages, then quit again. Same thing on a third try.

I have 1,500+ pages on my site, which indexes just fine when I run it from the secure web page, so why won't it do the same when I run it from a shell command?
Old 09-30-2004, 10:58 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
What are these set to in the config file?
PHP Code:
define('SPIDER_MAX_LIMIT',20);          //max recurse levels in spider
define('RESPIDER_LIMIT',5);             //recurse respider limit for update
define('LINKS_MAX_LIMIT',20);           //max links per each level
define('RELINKS_LIMIT',5);              //recurse links limit for an update 
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 10-01-2004, 05:25 AM   #3
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Here's what I have:
Code:
define('SPIDER_MAX_LIMIT',40);          //max recurse levels in spider
define('RESPIDER_LIMIT',40);             //recurse respider limit for update
define('LINKS_MAX_LIMIT',30);           //max links per each level
define('RELINKS_LIMIT',40);              //recurse links limit for an update
Old 10-02-2004, 12:04 AM   #4
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Here's a little more information, for whatever it's worth. I've just moved my website to a new host, and am trying to rebuild my phpdig search engine from scratch. The same performance issues are happening when I run phpdig as a secured web page as when I run via shell. Any idea what the problem could be?
Old 10-02-2004, 12:36 AM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Does it actually complete the index, or does it just stop after five minutes?
Old 10-02-2004, 06:25 AM   #6
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
It says indexing is complete, then stops.

I noticed when I ran phpdig one more time last night, the process ran to completion. Very strange. I sense this has been some sort of timeout issue. I wrote my web host to confirm, and will let you know if that was the problem after all.
Old 10-02-2004, 06:50 PM   #7
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Well, here is my web host's reply:
Quote:
There haven't been any configuration changes to the server over the last couple days. I'm not familiar with the script you're using, but perhaps it relies on the domain name being fully resolved to function properly. What exactly does it return when it 'quits'? My suggestion would be to try again next time you need to re-index the site and let me know at that point if the same issues return. Other than that, any additional info (e.g. when you started the failed attempts) would be helpful in looking for any clues on the server side.
I tried phpdig several times before I got it to properly index my site, so now I'm really confused as to why it did work the one time and not the other dozen or so times. The problem in this situation is that you guys know the software but not my server. My web host is the other way around.

Any suggestions as to where I should go from here with this?
Old 10-03-2004, 08:06 AM   #8
Wayne McBryde
Orange Mole
 
Join Date: Oct 2003
Location: NC, USA
Posts: 34
I don’t know if this will help but:

I have had the spidering stop when testing from my test server but never from my production server. The two servers are almost identical, except that the test server is in my house on a DSL line and the production server is co-located about 100 feet from where Level 3 comes into Charlotte, NC. The test server has at most 2 websites with VERY little traffic and no e-mail running. The production server has over 100 websites with lots of e-mail and traffic (but the server load is light). It looks to me like my problem is related to the slower internet connection, not the server. I would expect an overloaded server (or at least one with a heavy load) could have the same problem.
__________________
Wayne Mcbryde
http://LakeNormansWeb.com
We search all of Lake Norman!
Old 10-03-2004, 09:18 AM   #9
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
I have no idea what kind of server load there might be, don't know how one measures that. I've just moved my site to a new hosting company, and server response time in terms of page loads seems to be pretty quick. Don't know if that would be an indicator of server load necessarily.

I did discover one thing last night which might possibly be related to this issue. I hadn't been able to get my phpdig search page to work properly since the move. I'd enter a search term and click Go, but then I'd get the same page back. I went through my site to make sure there weren't any missing files of any kind, and after doing that, my search page worked properly. All this, too, when none of the files that I had to upload (or re-upload) had anything to do with phpdig searches.

Fixing all the little stuff that was wrong seems to also have fixed spidering from a secured web page, but the spidering process still just hung 9:53 into a run around midnight last night. Go figure.

I guess I'll work with it a little more and see how it goes.
Old 10-04-2004, 11:01 AM   #10
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
>> I have no idea what kind of server load there might be, don't know how one measures that.

From the shell prompt, type top or uptime and hit return. You should see the load average with three numbers showing the average load over the last 1, 5, and 15 minutes.

>> It says indexing is complete, then stops.
>> I still had the spidering process just hang...

Sometimes it hangs and sometimes it completes? Does anything unusual show in your raw access or error logs?
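
If the logs show nothing, one option is to capture the run yourself. The following is only a rough sketch, not anything that ships with PhpDig: run_logged is a made-up helper name, the log path is arbitrary, and the sh -c command is a stand-in for whatever spider command line you actually use. It appends the start time, everything the command prints (stdout and stderr), the exit status, and the end time to a log file, which is the kind of detail a web host tends to ask for.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not part of PhpDig): appends the start time, the
# command's combined stdout/stderr, its exit status and the end time to LOGFILE.
run_logged() {
    log=$1
    shift
    echo "=== started: $(date) ===" >> "$log"
    status=0
    "$@" >> "$log" 2>&1 || status=$?
    echo "=== finished: $(date), exit status $status ===" >> "$log"
    return "$status"
}

# Stand-in command for illustration only; substitute the real spider command.
run_logged "${TMPDIR:-/tmp}/spider-run.log" sh -c 'echo "spidering..."; exit 3' \
    || echo "command exited with status $?"
```

A non-zero exit status, or output that ends abruptly between the "started" and "finished" markers, would at least narrow down whether the process is ending on its own or being stopped from outside.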
Old 10-04-2004, 08:48 PM   #11
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Quote:
Originally Posted by Charter
>> I have no idea what kind of server load there might be, don't know how one measures that.

From the shell prompt, type top or uptime and hit return. You should see the load average with three numbers showing the average load over the last 1, 5, and 15 minutes.
Thanks. I'll try that next time and let you know the results.

Quote:
>> It says indexing is complete, then stops.
>> I still had the spidering process just hang...

Sometimes it hangs and sometimes it completes? Does anything unusual show in your raw access or error logs?
When spidering completes, it doesn't index very many pages at all through shell. However, I've spidered now a couple of times through the secure web page since moving my site, and it functioned just as I would have expected.

This whole thing really bugs me, because I would eventually like to have this run as a cron job and dispense with the other two methods. However, I don't have a lot of confidence at this point that a cron job would do any different than shell. This is all so strange.

Nothing unusual at all in the server log. My provider is at a loss to explain why it would just hang, too.

I've been chewing up lots of bandwidth messing with this. Still have plenty to play with for now, but need to watch it.
Old 10-09-2004, 07:08 PM   #12
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Just a follow-up on this thread. I created a test phpdig database so I could mess with this a little more without clobbering production. Tried to populate the test database initially from shell, and phpdig wouldn't index any pages at all. My saved spider log was totally empty, too. I checked my server log, and nothing shows up for phpdig at all there.

I went ahead and populated my test database using the secure web page, and although it took about 3.5 hours to spider, it still indexed everything I would have expected. So why doesn't it work from shell?

Just now, I tried to update the index via shell, and the same thing happened as initially - nothing indexed, empty spider log, nothing in the server log.

When I type uptime from the shell, here's what I get:
up 18 days, 38 minutes, 2 users, load average: 0.11, 0.31, 0.33
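
For anyone who wants to record those numbers alongside each spider run, here is a minimal sketch that pulls the 1-, 5- and 15-minute load averages out of a line of uptime output. parse_load is a hypothetical helper name, and the sketch assumes the usual "load average:" wording with a period as the decimal separator.

```shell
#!/bin/sh
# parse_load LINE
# Hypothetical helper: prints the 1-, 5- and 15-minute load averages
# found in one line of `uptime` output, separated by spaces.
parse_load() {
    # Drop everything up to "load average:" (or "load averages:"),
    # then strip the commas between the three numbers.
    printf '%s\n' "$1" | sed -e 's/.*load averages\{0,1\}: *//' -e 's/,//g'
}

# Sample line copied from the uptime output above.
parse_load 'up 18 days, 38 minutes, 2 users, load average: 0.11, 0.31, 0.33'
# prints: 0.11 0.31 0.33

# On a live server (forcing the C locale so decimals use a period):
#   parse_load "$(LC_ALL=C uptime)"
```

Values well below 1, like the ones above, generally indicate a lightly loaded machine.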

Any suggestions as to where I go from here? I've hit a brick wall...
All times are GMT -8.