PhpDig.net

09-19-2003, 04:44 AM   #1
Rolandks
Purple Mole
 
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
No indexing of IIS 6 sites on Windows Server 2003

I have spent a lot of time trying to find out what the problems are with the NEW IIS 6 on Windows Server 2003.

PhpDig doesn't index IIS 6 websites at the moment.

I also tried to index an IIS 6 site from a Linux system, with the same result. (Email me and I will send you the web page so you can test it.)

Results of indexing:

### IIS 6 - Log file ####
#Fields: date time c-ip c-session cs(Referer) sc-Protocol sc-uri sc-status
2003-09-18 19:41:27 62.142.48.115 1033 217.160.xx.xx 80 HTTP/1.1 HEAD /robots.txt 400 - BadRequest
2003-09-18 19:41:27 62.141.48.115 1034 217.160.xx.xx 80 HTTP/1.1 HEAD // 400 - BadRequest
2003-09-18 19:41:27 62.141.48.115 1035 217.160.xx.xx 80 HTTP/1.1 HEAD / 400 - BadRequest
2003-09-18 19:41:27 62.141.48.115 1036 1217.160.xx.xx 80 HTTP/1.1 HEAD /robots.txt 400 - BadRequest
op=HEAD arg=http://www.my-domain.de/ result="400 Bad Request"

## Windows 2003 Monitoring ###
<-> Filter: http
----------------------------------
HTTP: HEAD Request from Client
HTTP: Request Method =HEAD
HTTP: Uniform Resource Identifier =//
HTTP: Protocol Version =HTTP/1.1
HTTP: Host =www.my-domain.de
HTTP: Accept = */*
HTTP: Accept-Charset = iso-8859-1
HTTP: Accept-Encoding =identity
HTTP: User-Agent =PhpDig/1.6.2 (PHP; MySql)
------
HTTP: Response to Client; HTTP/1.1; Status Code = 400 - Bad Request
HTTP: Protocol Version =HTTP/1.1
HTTP: Status Code = Bad Request
HTTP: Reason =Bad Request
HTTP: Content-Length =20
HTTP: Content-Type =text/html
HTTP: Connection =close

I will also ask in a Windows newsgroup to find out the reason for this.

I have read about some other problems with error 400: does PhpDig only use HTTP commands allowed by the RFC? See RFC 2616.

-Roland-
09-19-2003, 10:47 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Sending HEAD [your_site]/robots.txt HTTP/1.1 produces the following:

Content-Length: 24

The robots.txt file contains the following:
Code:
User-agent: *
Disallow:
What happens if you just delete the robots.txt file?

What do you get?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
09-19-2003, 11:13 AM   #3
Rolandks
Purple Mole
 
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
OK, it's deleted. You can try again. It's just the same in my tests.

-Roland-
09-19-2003, 11:22 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Please can you post the results like you did above? Maybe there will be something in there, or are the results just like those above?
09-19-2003, 01:24 PM   #5
Rolandks
Purple Mole
 
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Hmm, the monitor log only works if I start it 2 seconds before I dig.

This is wrong, IMHO!
robot_functions.php, line 286:

Code:
  // Current code: every request/header line ends with a bare LF ("\n").
  $request =
  "HEAD $path HTTP/1.1\n"
  ."Host: $host$sport\n"
  .$cookiesSendString
  .$auth_string
  ."Accept: */*\n"
  ."Accept-Charset: ".PHPDIG_ENCODING."\n"
  ."Accept-Encoding: identity\n"
  ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\n\n";
The header lines of the HEAD request are NOT terminated by CRLF, only by LF ('\n'). LF alone is wrong per the RFC: each header line must end with CRLF!

See:
http://www.w3.org/Protocols/rfc2616/...c2.html#sec2.2

Quote:
HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications). The end-of-line marker within an entity-body
is defined by its associated media type, as described in section 3.7.

CRLF = CR LF
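To check whether a request string follows this rule, one can scan it for line feeds that are not preceded by a carriage return. A minimal sketch, assuming a current PHP version (the type declarations need PHP 7+); `find_bare_lf()` is a hypothetical helper, not part of PhpDig, and the sample strings only imitate the HEAD request in the logs above.

```php
<?php
// Hypothetical helper (not part of PhpDig): returns the byte offsets of every
// LF ("\n") that is not immediately preceded by a CR ("\r"). RFC 2616,
// section 2.2, requires CR LF as the end-of-line marker for the request line
// and header lines, so a compliant request should yield an empty array.
function find_bare_lf(string $request): array
{
    $offsets = [];
    $len = strlen($request);
    for ($i = 0; $i < $len; $i++) {
        if ($request[$i] === "\n" && ($i === 0 || $request[$i - 1] !== "\r")) {
            $offsets[] = $i;
        }
    }
    return $offsets;
}

// Sample requests modelled on the HEAD request in the logs above.
$lfOnly = "HEAD / HTTP/1.1\nHost: www.my-domain.de\n\n";
$crlf   = "HEAD / HTTP/1.1\r\nHost: www.my-domain.de\r\n\r\n";

printf("LF-only request: %d bare LF(s)\n", count(find_bare_lf($lfOnly)));
printf("CRLF request:    %d bare LF(s)\n", count(find_bare_lf($crlf)));
```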
-Roland-

Last edited by Rolandks; 09-19-2003 at 01:58 PM.
09-19-2003, 01:43 PM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I believe the problem is that the script uses \n and your machine needs \r\n.

Please try this to fix the problem: First make a backup of the robot_functions.php file. Then in robot_functions.php, do the following:
  1. find:

    PHP Code:
    $auth_string = 'Authorization: Basic '.base64_encode($components['user'].':'.$components['pass'])."\n";
    and replace with:

    PHP Code:
    $auth_string = 'Authorization: Basic '.base64_encode($components['user'].':'.$components['pass'])."\r\n";
  2. find:

    PHP Code:
    $cookiesSendString .= "Cookie: ".$cookieString['string']."\n";
    and replace with:

    PHP Code:
    $cookiesSendString .= "Cookie: ".$cookieString['string']."\r\n";
  3. find:

    PHP Code:
    @ini_set('user_agent','PhpDig/'.PHPDIG_VERSION.' (PHP; MySql)'."\n".phpDigMakeCookies($cookiesToSend,$path)); 
    and replace with:

    PHP Code:
    @ini_set('user_agent','PhpDig/'.PHPDIG_VERSION.' (PHP; MySql)'."\r\n".phpDigMakeCookies($cookiesToSend,$path)); 
  4. find:

    PHP Code:
    $request =
      "HEAD $path HTTP/1.1\n"
      ."Host: $host$sport\n"
      .$cookiesSendString
      .$auth_string
      ."Accept: */*\n"
      ."Accept-Charset: ".PHPDIG_ENCODING."\n"
      ."Accept-Encoding: identity\n"
      ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\n\n";
    and replace with:

    PHP Code:
    $request =
      "HEAD $path HTTP/1.1\r\n"
      ."Host: $host$sport\r\n"
      .$cookiesSendString
      .$auth_string
      ."Accept: */*\r\n"
      ."Accept-Charset: ".PHPDIG_ENCODING."\r\n"
      ."Accept-Encoding: identity\r\n"
      ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\r\n\r\n";
  5. find:

    PHP Code:
    $req1 = "HEAD $path HTTP/1.1\n"
    ."Host: $host$sport\n"
    .$cookiesSendString
    .$auth_string
    ."Accept: */*\n"
    ."Accept-Charset: ".PHPDIG_ENCODING."\n"
    ."Accept-Encoding: identity\n"
    ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\n\n";
    and replace with:

    PHP Code:
    $req1 = "HEAD $path HTTP/1.1\r\n"
    ."Host: $host$sport\r\n"
    .$cookiesSendString
    .$auth_string
    ."Accept: */*\r\n"
    ."Accept-Charset: ".PHPDIG_ENCODING."\r\n"
    ."Accept-Encoding: identity\r\n"
    ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\r\n\r\n";

I think those are all the places that absolutely need to be changed. I also think you could just do a search and replace, changing all \n to \r\n in the files.
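Because the same header block is assembled twice (`$request` and `$req1`), one option is to build it from an array and join the lines with a single CRLF constant, so the two copies cannot drift apart. A minimal sketch, assuming PHP 7+: `build_head_request()` and `HTTP_EOL` are hypothetical, not part of PhpDig, and the encoding and version are passed in as parameters instead of being read from `PHPDIG_ENCODING` and `PHPDIG_VERSION` so the example stands alone.

```php
<?php
// Hypothetical constant and helper (not part of PhpDig).
const HTTP_EOL = "\r\n";

// Builds a HEAD request whose request line and header lines all end in CRLF,
// followed by the empty line (one more CRLF) that ends the header section.
function build_head_request(
    string $path,
    string $hostHeader,      // e.g. "www.my-domain.de" or "host:8080"
    string $encoding,        // stands in for PHPDIG_ENCODING
    string $version,         // stands in for PHPDIG_VERSION
    array $extraHeaders = [] // e.g. cookie/auth lines, WITHOUT line endings
): string {
    $lines = array_merge(
        ["HEAD $path HTTP/1.1", "Host: $hostHeader"],
        $extraHeaders,
        [
            'Accept: */*',
            "Accept-Charset: $encoding",
            'Accept-Encoding: identity',
            "User-Agent: PhpDig/$version (PHP; MySql)",
        ]
    );
    // Join with CRLF, then add CRLF + CRLF: one ends the last header line,
    // the other is the blank line that terminates the headers.
    return implode(HTTP_EOL, $lines) . HTTP_EOL . HTTP_EOL;
}
```

Usage: `build_head_request('/', 'www.my-domain.de', 'iso-8859-1', '1.6.2')` yields the same header set as the request shown in the monitor log, but with every line terminated by CRLF.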

As a general rule of thumb, I believe the line endings for different OSes are:

Windows uses \r\n
Classic Mac OS uses \r (Mac OS X uses \n)
*nix uses \n
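HTTP itself does not follow these OS conventions: RFC 2616 requires CR LF on the wire no matter which system the client or the server runs on. In PHP that is one reason not to use the platform constant `PHP_EOL` when writing protocol lines. A small sketch, assuming PHP 7+; `CRLF` and `http_header_line()` are hypothetical names used only for illustration.

```php
<?php
// PHP_EOL is the end-of-line sequence of the platform PHP runs on:
// "\n" on Unix-like systems, "\r\n" on Windows. HTTP/1.1 (RFC 2616,
// section 2.2) always requires CR LF, so protocol code uses a fixed value.
const CRLF = "\r\n";

// Hypothetical helper: formats one HTTP header line with the fixed terminator.
function http_header_line(string $name, string $value): string
{
    return $name . ': ' . $value . CRLF;
}

printf("PHP_EOL on this platform: %s\n", bin2hex(PHP_EOL)); // "0a" or "0d0a"
printf("HTTP line terminator:     %s\n", bin2hex(CRLF));    // always "0d0a"
echo http_header_line('Accept', '*/*');
```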
09-19-2003, 01:47 PM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Quote:
Originally posted by Rolandks
Hmm, the monitor log only works if I start it 2 seconds before I dig.

Question: where is the relevant line in Spider.php?

Are the header lines of the HEAD requests terminated by CRLF, or only by LF ('\n')? LF alone is wrong per the RFC: each header line must end with CRLF!

-Roland-
Ah, I see you were already thinking that. To test it, I wrote a script that sends a HEAD request to your server. With only \n I received 400 Bad Request, but with \r\n it worked fine.
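The test script itself was not posted. A probe of that kind could send the same HEAD request once with LF and once with CRLF and compare the status lines. A minimal sketch, assuming plain HTTP on port 80 and PHP 7+; `build_probe_request()` and `probe_status_line()` are hypothetical names, and the User-Agent string is copied from the monitor log above.

```php
<?php
// Hypothetical probe (the original test script was not posted): builds a
// HEAD request using the given line terminator ($eol) for every line.
function build_probe_request(string $host, string $path, string $eol): string
{
    return "HEAD $path HTTP/1.1" . $eol
        . "Host: $host" . $eol
        . "User-Agent: PhpDig/1.6.2 (PHP; MySql)" . $eol
        . "Connection: close" . $eol
        . $eol; // blank line ends the header section
}

// Sends the request over a raw socket and returns the server's status line.
function probe_status_line(string $host, string $path, string $eol, int $port = 80): string
{
    $fp = @fsockopen($host, $port, $errno, $errstr, 10); // 10 s connect timeout
    if ($fp === false) {
        return "connection failed: $errstr ($errno)";
    }
    fwrite($fp, build_probe_request($host, $path, $eol));
    $status = fgets($fp); // first response line, e.g. "HTTP/1.1 200 OK"
    fclose($fp);
    return $status === false ? 'no response' : rtrim($status);
}

// Usage (performs network I/O, so it is left commented out here):
// echo probe_status_line('www.my-domain.de', '/', "\n"), "\n";   // LF only
// echo probe_status_line('www.my-domain.de', '/', "\r\n"), "\n"; // CRLF
```

In the test described in this thread, the LF-only variant got 400 Bad Request from the IIS 6 server, while the CRLF variant worked.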
09-19-2003, 02:40 PM   #8
Rolandks
Purple Mole
 
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Thanks!

I think this should be changed in the next version so that it conforms to the RFC; otherwise users will have to apply this fix again every time they update.

I wrote above:

See:
http://www.w3.org/Protocols/rfc2616/...c2.html#sec2.2

Quote:
HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications). The end-of-line marker within an entity-body
is defined by its associated media type, as described in section 3.7.
CRLF = CR LF
Microsoft IIS 6 was designed with NEW security in mind: it follows the RFC STRICTLY and does not behave as a tolerant application.
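To illustrate the difference that appendix 19.3 describes: a tolerant reader also accepts a bare LF as a line terminator, while a strict one splits only on CRLF, so an LF-only request collapses into one malformed line. The two functions below are hypothetical sketches for illustration only (PHP 7+); they are not how IIS 6 actually parses requests.

```php
<?php
// Illustration only: two hypothetical ways a receiver might split the header
// section of a request into lines. Neither is how IIS 6 parses requests.

// Strict reader: only CR LF ends a line, as RFC 2616 section 2.2 specifies.
function split_lines_strict(string $head): array
{
    return explode("\r\n", $head);
}

// Tolerant reader (RFC 2616 appendix 19.3): also accepts a bare LF.
function split_lines_tolerant(string $head): array
{
    return preg_split('/\r?\n/', $head);
}

$lfOnly = "HEAD / HTTP/1.1\nHost: www.my-domain.de";
printf("strict:   %d line(s)\n", count(split_lines_strict($lfOnly)));
printf("tolerant: %d line(s)\n", count(split_lines_tolerant($lfOnly)));
```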

-Roland-
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
ISAPI or CGI with IIS on win 2003 server | shaders | Troubleshooting | 0 | 11-15-2004 03:29 AM
auto re-indexing on shared hosting server | mental cube | How-to Forum | 1 | 09-07-2004 05:10 PM
Indexing problems - IIS on XP | darrenm | Script Installation | 1 | 05-07-2004 04:30 AM
installing on IIS Server.... | ronyotz | Script Installation | 3 | 03-03-2004 07:22 PM
Spider cron Job with WIN in V1.8 | Rolandks | Troubleshooting | 4 | 02-09-2004 01:08 AM


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.