OK, here's my take on what you're saying, and it's not based on my knowledge of the phpDig code itself. Rather, it's based on what I see in the spider logs. My assumptions may or may not be correct.
phpDig obeys robots.txt - that much we know - but unless a page is already excluded in the robots.txt file, it still has to fetch the page to find out whether there is a robots exclusion on the page itself. If a page has some kind of problem, like the one I pointed out, that could send phpDig into some kind of loop. Exactly how or why that happens, I couldn't say.
I hope you understand where I'm coming from with this. What I'm saying is basically this: if phpDig has to visit a page, there had better not be any errors in it. If there are, they could throw phpDig into a tailspin and cause it not to spider everything you think it should.
My suggestion would be to either fix the page or use the include/exclude comments in the page(s) that link to the problem document, so that phpDig will not attempt to spider it.
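For the second option, a sketch of what I mean - assuming the `phpdigExclude`/`phpdigInclude` comment markers from the phpDig documentation, with a made-up filename for the problem page:

```html
<!-- phpdigExclude -->
<!-- phpDig should skip everything between these markers,
     so it never follows the link to the problem page -->
<a href="broken-page.html">Problem page</a>
<!-- phpdigInclude -->
```

Wrapping just the offending link this way lets the rest of the page be spidered normally.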