OK, here's my take on what you're saying, and it's not based on my knowledge of the phpDig code itself. Rather, it's based on what I see in the spider logs. My assumptions may or may not be correct.
phpDig obeys robots.txt - that much we know - but unless a page is already excluded in the robots.txt file, it still has to fetch the page to find out whether there is a robots exclusion on the page itself. If a page has some kind of problem, like the one I pointed out, that could send phpDig into some kind of loop. Exactly how or why that happens, I couldn't say.
I hope you understand where I'm coming from with this. What I'm saying is basically this: if phpDig has to visit a page, there had better not be any errors in it. If there are, they could throw phpDig into a tailspin and cause it not to spider everything you think it should.
My suggestion would be to either fix the page or use the include/exclude comments in the page(s) that link to the problem document, so that phpDig will not attempt to spider it.
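For the second option, a sketch of what I mean - assuming the `phpdigExclude`/`phpdigInclude` comment markers from the phpDig documentation, with a made-up filename for the problem page:

```html
<!-- phpdigExclude -->
<!-- phpDig should skip everything between these markers,
     so it never follows the link to the problem page -->
<a href="broken-page.html">Problem page</a>
<!-- phpdigInclude -->
```

Wrapping just the offending link this way lets the rest of the page be spidered normally.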