PhpDig.net

Old 02-10-2005, 02:23 PM   #1
hendrix
Green Mole
 
Join Date: Feb 2005
Posts: 10
404 error although page exists

Hi,

I've had a problem indexing a particular site (please note that all other sites have been indexed without any problem).

PhpDig v1.8.7 is located at http://www.santeestrie.qc.ca/recherche

I've tried to index http://www.iugs.ca but it always returned a 404 error. So then I tried indexing a file I knew existed (http://www.iugs.ca/FR/100/RH_Recrutement.asp) but it also returned a 404 error:
------------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/RH_Recrutement.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexation terminée !
------------------------------------------------

It doesn't matter which page I try to index on this site; it never works. There's no robots.txt, so that's not the problem.

Here are a few of my settings:

- Tried indexing with a depth of 10 and links per set to zero.

define('PHPDIG_IN_DOMAIN',true);
define('SPIDER_MAX_LIMIT',20);
define('RESPIDER_LIMIT',5);
define('LINKS_MAX_LIMIT',20);
define('RELINKS_LIMIT',5);
define('LIMIT_TO_DIRECTORY',false);
define('LIMIT_DAYS',0);

and from phpinfo():

allow_url_fopen = 1
safe_mode = off

Any help would be appreciated

Regards,
Stéphane Brault
eComDEV.com

Last edited by hendrix; 02-10-2005 at 02:26 PM.
Old 02-10-2005, 03:28 PM   #2
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
What if you try http://www.iugs.ca/FR/100/default.asp in the textbox?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 02-11-2005, 05:56 AM   #3
hendrix
Yes, I've also tried that...
Old 02-11-2005, 05:57 AM   #4
hendrix
In fact, I've tried to index all the links found at http://www.iugs.ca/FR

and I also tried adding "default.asp" at the end.
Old 02-11-2005, 06:46 AM   #5
Charter
The only other thing I can think of is that maybe the site dislikes HEAD requests so it returns a 404 Not Found even though GET requests return content.
Old 02-11-2005, 10:39 AM   #6
hendrix
Is there a way to generate a "HEAD" request manually so I can see the server's response? I could open a connection to the web server (telnet www.iugs.ca 80) and issue whatever command it takes.
Old 02-11-2005, 01:21 PM   #7
Charter
Code:
telnet www.iugs.ca 80
HEAD /FR/100/default.asp HTTP/1.1
Host: www.iugs.ca
However, I no longer think it's a HEAD request issue, as a one-page index produced the following:

Spidering in progress... [Stop spider]
SITE : http://www.iugs.ca/
Exclude paths :
- @NONE@

Wait...
1:http://www.iugs.ca/FR/100/default.asp
(time : 00:00:19)
No link in temporary table
links found : 1
http://www.iugs.ca/FR/100/default.asp
Optimizing tables...
Indexing complete !

So, I don't really know why you'd be getting 404s.
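
For anyone who would rather script the manual check than type it into telnet, here is a minimal Python sketch. It is not part of PhpDig, and the function names `build_head_request` and `compare_head_and_get` are made up for illustration. The first function builds the same HEAD request as the telnet session above; note that the request has to end with a blank line, and the `Connection: close` header is an added assumption so that the server hangs up after replying. The second function sends HEAD and then GET with the standard `http.client` module and returns both status codes, which would show whether a server answers the two methods differently.

```python
import http.client


def build_head_request(host, path):
    """Return the raw bytes of a HEAD request as typed into telnet."""
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close afterwards
        "",                   # the blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def compare_head_and_get(host, path, port=80, timeout=10):
    """Send HEAD, then GET, and return {'HEAD': status, 'GET': status}."""
    statuses = {}
    for method in ("HEAD", "GET"):
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request(method, path)
            response = conn.getresponse()
            response.read()  # drain the body (empty for HEAD)
            statuses[method] = response.status
        finally:
            conn.close()
    return statuses
```

Running `compare_head_and_get("www.iugs.ca", "/FR/100/default.asp")` on the machine that hosts the spider gives the two status codes side by side. If both come back as 404, the HEAD theory is ruled out and the cause lies elsewhere.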
Old 02-11-2005, 04:49 PM   #8
hendrix
By the way, thank you for your time, it's really appreciated.

Here's what I get when I try to index the same page as you:

---------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/robots.txt
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/default.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexation terminée !
---------------------------------------------

I forgot to mention: I am using IIS on Win 2000 Server. Here's my phpinfo page if that is of any help for you:

http://www.santeestrie.qc.ca/phpinfo.php

I know IIS isn't the best web server, but I don't have a choice. Of course, I prefer Apache on Linux over IIS, but...
Old 02-11-2005, 04:50 PM   #9
hendrix
By the way, how come your PhpDig installation didn't crawl the rest of www.iugs.ca and stopped after the first page?

Sorry for my English, I usually speak French.
Old 02-11-2005, 04:55 PM   #10
hendrix
I've made a small donation through 2Checkout, I think PhpDig is great (and IIS is crap). I have it installed over a few other sites (running Apache on Linux) and never had any problems... except on IIS, as with most php scripts out there.

I have PhpDig running on these sites:

http://www.emusicmag.com
http://www.soundfontdepot.com
http://www.mididepot.com

and it should soon be on this one:

http://www.homemusician.net

Keep up the good work.
Old 02-11-2005, 05:16 PM   #11
Charter
Thanks! I want to say that it almost smells like an FP issue, but I didn't see any FP reference in your phpinfo. If you do a manual HEAD request, does it give a clue? To index just one page, use zero, zero, no in the admin panel.
Old 02-14-2005, 09:52 AM   #12
hendrix
I don't think it's an FP issue, since PhpDig and www.iugs.ca are not hosted on the same server.
Old 02-14-2005, 08:07 PM   #13
hendrix
404 error although page exists: SOLVED!

Hey, when I try to index using the server's IP address, it works!

Could it be that my host can't resolve www.iugs.ca's IP address correctly?
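
A quick way to test that suspicion from the spidering host is to ask its resolver which address it has for the name and compare that with the address you know is correct. Here is a minimal Python sketch, assuming Python is available on that host. `check_resolution` is a hypothetical helper, and the resolver is a parameter only so the logic can be tried without touching the network. The address in the usage line is a placeholder from the documentation range, not the real address of www.iugs.ca.

```python
import socket


def check_resolution(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return (matches, resolved_ip) for hostname as seen from this machine.

    resolved_ip is None when the name cannot be resolved at all.
    """
    try:
        resolved_ip = resolver(hostname)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError
        return (False, None)
    return (resolved_ip == expected_ip, resolved_ip)


if __name__ == "__main__":
    # Placeholder address: substitute the IP that worked when indexing directly.
    print(check_resolution("www.iugs.ca", "192.0.2.10"))
```

A mismatch would mean the machine running PhpDig sends its requests to a different server than your browser does, which would explain a 404 for a page that exists.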
Old 02-15-2005, 06:58 AM   #14
hendrix
Ha! I was right.

I called my provider this morning, and it turns out www.iugs.ca used to be hosted on their server. They still had an entry in their "hosts" file pointing to the wrong IP. They removed it and it worked instantly.

So I guess it's a good idea to try spidering the IP address instead of the hostname when troubleshooting a problem of this nature.
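
Since the culprit here was a stale line in a hosts file, it can also help to look for such overrides directly. Below is a minimal Python sketch under stated assumptions: `find_hosts_entries` is a hypothetical helper, the file follows the usual hosts format (an address, then one or more names, with `#` starting a comment), and the sample content and addresses are invented for illustration.

```python
def find_hosts_entries(hosts_text, hostname):
    """Return the addresses that a hosts file maps to hostname."""
    wanted = hostname.lower()
    addresses = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        # Host names are compared case-insensitively
        if wanted in (name.lower() for name in names):
            addresses.append(address)
    return addresses


# Hypothetical hosts file content, for illustration only
sample = """\
# sample hosts file
127.0.0.1    localhost
192.0.2.10   www.iugs.ca iugs.ca   # leftover entry from an old hosting setup
"""

print(find_hosts_entries(sample, "www.iugs.ca"))  # prints ['192.0.2.10']
```

On a real machine you would pass in the contents of the hosts file itself (commonly /etc/hosts on Linux, or the hosts file under system32\drivers\etc on Windows). Any address it returns normally takes precedence over DNS for that name on that machine.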
All times are GMT -8.