|
02-10-2005, 02:23 PM | #1 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
404 error although page exists
Hi,
I've had a problem indexing one particular site (all my other sites have been indexed without any problem). PhpDig v1.8.7 is located at http://www.santeestrie.qc.ca/recherche
I tried to index http://www.iugs.ca but it always returned a 404 error. I then tried indexing a file I knew existed (http://www.iugs.ca/FR/100/RH_Recrutement.asp), but it also returned a 404 error:
------------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/RH_Recrutement.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexation terminée !
------------------------------------------------
It doesn't matter which page on this site I try to index; it never works. There's no robots.txt, so that's not the problem.
Here are a few of my settings:
- Tried indexing with a depth of 10 and links per set to zero.
define('PHPDIG_IN_DOMAIN',true);
define('SPIDER_MAX_LIMIT',20);
define('RESPIDER_LIMIT',5);
define('LINKS_MAX_LIMIT',20);
define('RELINKS_LIMIT',5);
define('LIMIT_TO_DIRECTORY',false);
define('LIMIT_DAYS',0);
and from phpinfo():
allow_url_fopen = 1
safe_mode = off
Any help would be appreciated.
Regards,
Stéphane Brault
eComDEV.com
Last edited by hendrix; 02-10-2005 at 02:26 PM. |
02-10-2005, 03:28 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
What if you try http://www.iugs.ca/FR/100/default.asp in the textbox?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
02-11-2005, 05:56 AM | #3 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
yes, I've also tried that...
|
02-11-2005, 05:57 AM | #4 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
in fact, I've tried to index all the links found at http://www.iugs.ca/FR
and tried to add "default.asp" at the end also. |
02-11-2005, 06:46 AM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
The only other thing I can think of is that maybe the site dislikes HEAD requests so it returns a 404 Not Found even though GET requests return content.
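One way to test that hypothesis is to request the same URL with both methods and compare the status codes. The following is a minimal sketch using Python's standard `http.client`; the helper names `fetch_status` and `diagnose` are made up for this illustration and have nothing to do with PhpDig's own code.

```python
import http.client


def fetch_status(host, path, method, port=80, timeout=10):
    """Send one request with the given method and return the numeric status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path)  # http.client adds the Host header itself
        response = conn.getresponse()
        response.read()  # drain any body before closing
        return response.status
    finally:
        conn.close()


def diagnose(head_status, get_status):
    """Interpret a pair of status codes obtained for the same URL."""
    if head_status == get_status:
        return "HEAD and GET agree (%d)" % head_status
    if head_status >= 400 and get_status < 400:
        return "server rejects HEAD (%d) but serves GET (%d)" % (
            head_status,
            get_status,
        )
    return "HEAD %d and GET %d differ" % (head_status, get_status)
```

To try it against the page from this thread, something like `diagnose(fetch_status("www.iugs.ca", "/FR/100/default.asp", "HEAD"), fetch_status("www.iugs.ca", "/FR/100/default.asp", "GET"))` should do. If both methods return 404 from the crawling machine, the HEAD theory can probably be ruled out.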
|
02-11-2005, 10:39 AM | #6 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
Is there a way to generate a "HEAD" request manually so I can see the server's response? I could open a connection to the web server (telnet www.iugs.ca 80) and issue whatever command it takes.
|
02-11-2005, 01:21 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Code:
telnet www.iugs.ca 80
HEAD /FR/100/default.asp HTTP/1.1
Host: www.iugs.ca

Spidering in progress...
[Stop spider]
SITE : http://www.iugs.ca/
Exclude paths :
- @NONE@
Wait...
1:http://www.iugs.ca/FR/100/default.asp
(time : 00:00:19)
No link in temporary table
links found : 1
http://www.iugs.ca/FR/100/default.asp
Optimizing tables...
Indexing complete !

So, I don't really know why you'd be getting 404s.
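If telnet is not handy, the same manual HEAD request can be scripted. This is only a sketch, assuming Python is available on the machine doing the crawl; `build_head_request` and `send_head` are hypothetical helper names, and the `Connection: close` header is an addition (not part of the telnet session above) so the server closes the socket after answering. Note that the request must end with an empty line, which in telnet means pressing Enter twice after the `Host:` line.

```python
import socket


def build_head_request(host, path):
    """Return the raw bytes of a HEAD request, ending with the required blank line."""
    lines = [
        "HEAD %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: close",  # added for scripting; not typed in the telnet session
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_head(host, path, port=80, timeout=10):
    """Open a TCP connection, send the HEAD request, and return the status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host, path))
        data = b""
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `send_head("www.iugs.ca", "/FR/100/default.asp")` from the server that runs PhpDig would show the status line that server actually receives, which may differ from what another machine sees.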
|
02-11-2005, 04:49 PM | #8 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
By the way, thank you for your time, it's really appreciated.
Here's what I get when I try to index the same page as you:
---------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/robots.txt
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/default.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexation terminée !
---------------------------------------------
I forgot to mention: I am using IIS on Win 2000 Server. Here's my phpinfo page, if that is of any help to you: http://www.santeestrie.qc.ca/phpinfo.php
I know IIS isn't the best web server, but I don't have a choice. Of course, I prefer Apache on Linux over IIS but... |
02-11-2005, 04:50 PM | #9 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
By the way, how come your PhpDig installation didn't crawl www.iugs.ca and stopped after the first page?
Sorry for my English, I usually speak French. |
02-11-2005, 04:55 PM | #10 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
I've made a small donation through 2Checkout, I think PhpDig is great (and IIS is crap). I have it installed over a few other sites (running Apache on Linux) and never had any problems... except on IIS, as with most php scripts out there.
I have PhpDig running on:
http://www.emusicmag.com
http://www.soundfontdepot.com
http://www.mididepot.com
and it should soon be on:
http://www.homemusician.net
Keep up the good work. |
02-11-2005, 05:16 PM | #11 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Thanks! I want to say that it almost smells like an FP issue, but I didn't see any FP reference in your phpinfo. If you do a manual HEAD request, does it give a clue? To index just one page, use zero, zero, no in the admin panel.
|
02-14-2005, 09:52 AM | #12 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
I don't think it's a FP issue since PhpDig and www.iugs.ca are not hosted on the same server.
|
02-14-2005, 08:07 PM | #13 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
SOLVED!
Hey, when I try to index using the server's IP address, it works!
Could it be that my host can't resolve www.iugs.ca's IP? |
02-15-2005, 06:58 AM | #14 |
Green Mole
Join Date: Feb 2005
Posts: 10
|
Ha! I was right.
I called my provider this morning: www.iugs.ca used to be hosted on their server, and they still had an entry in their "hosts" file pointing to the wrong IP. They removed it and indexing worked instantly. So I guess it's a good idea to try spidering the IP address instead of the hostname when troubleshooting a problem of this nature. |
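A stale hosts-file entry like this one can be spotted by looking up which IPs the file maps to the hostname and comparing them with what the machine actually resolves. The sketch below is only an illustration: `hosts_entries` and `resolved_ip` are hypothetical helper names, and the addresses in the usage note are placeholders from the documentation range, not the real ones from this thread. The hosts file usually lives at `/etc/hosts` on Unix-like systems and at `%SystemRoot%\system32\drivers\etc\hosts` on Windows.

```python
import socket


def hosts_entries(hosts_text, hostname):
    """Return every IP that the given hosts-file text maps to `hostname`."""
    matches = []
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            matches.append(ip)
    return matches


def resolved_ip(hostname):
    """Return the IPv4 address the local resolver gives for `hostname`.

    On most systems the resolver consults the hosts file before DNS,
    so a stale entry there silently wins.
    """
    return socket.gethostbyname(hostname)
```

For example, with a hosts file containing the placeholder line `192.0.2.10 www.iugs.ca`, `hosts_entries(text, "www.iugs.ca")` returns `["192.0.2.10"]`. Any hit for a site that is no longer hosted on that machine is worth questioning, and comparing `resolved_ip("www.iugs.ca")` on the crawling server against the result from an unrelated machine would show the mismatch.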
|
|