|
01-12-2005, 12:29 PM | #1 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
PhpDig not identifying itself on every page access
I'm in the process of setting up PhpDig and it works quite well.
But, there is one minor problem. PhpDig is not identifying itself when it accesses every page. Here's a sample from my logs on a test site...
Code:
69.64.40.48 - - [12/Jan/2005:11:32:58 -0500] "HEAD /robots.txt HTTP/1.1" 200 - "-" "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"
69.64.40.48 - - [12/Jan/2005:11:32:58 -0500] "GET /robots.txt HTTP/1.0" 200 321 "-" "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"
...
...
69.64.40.48 - - [12/Jan/2005:11:33:00 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"
As you can see, it does fine when requesting robots.txt, but when it requests an actual page it doesn't identify itself. |
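For anyone who wants to check their own access log for the same symptom, here is a minimal Python sketch. It assumes the log is in Apache's "combined" format, as the sample lines above appear to be. The names `LOG_PATTERN` and `requests_without_user_agent` are made up for this illustration and are not part of PhpDig or Apache.

```python
import re

# Apache "combined" format:
# host ident user [time] "request line" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def requests_without_user_agent(lines):
    """Return (method, path) for each entry whose user-agent field is "-" or empty."""
    missing = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group("agent") in ("-", ""):
            missing.append((match.group("method"), match.group("path")))
    return missing

# Two of the log lines quoted in the post above.
sample = [
    '69.64.40.48 - - [12/Jan/2005:11:32:58 -0500] "HEAD /robots.txt HTTP/1.1" 200 - "-" '
    '"PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"',
    '69.64.40.48 - - [12/Jan/2005:11:33:00 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"',
]

print(requests_without_user_agent(sample))
```

Run against a whole log file (for example `requests_without_user_agent(open(path))`, with `path` pointing at your own log), it lists every request that arrived without a user-agent, which makes it easier to see exactly where the identification stops.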
01-12-2005, 01:08 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Check that the user agent you are using is set to not block referring information.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
01-13-2005, 02:46 AM | #3 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
Nope, that's not the problem. Google, MSN, and others show up with no problem. As do browser user agents Mozilla, IE, Firefox, etc.
Interesting that the user agent shows up for the robots.txt query, but for the head and get queries for actual HTML pages it vanishes. Last edited by CBJim; 01-13-2005 at 02:48 AM. |
01-13-2005, 03:21 AM | #4 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
Interesting addendum...
PhpDig identifies itself without a problem on the root "/" HEAD request, then loses it on the root "/" GET request. All HEAD and GET requests after the first root HEAD request lose the user-agent. |
01-13-2005, 03:24 AM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
PhpDig passes its user-agent on every request and does nothing to block referrer, so I wouldn't think this issue is related to PhpDig. Note that, in the following line, not even the page size is given. Perhaps send an email to server4you and see if they have an idea.
Code:
69.64.40.48 - - [12/Jan/2005:11:33:00 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"
01-13-2005, 04:03 AM | #6 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
One last possibility/question:
This is the root IP on the server that is hosting the site being spidered. Could it be excluding the user-agent because of that? |
01-13-2005, 05:14 AM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Yes, the IP is from the machine running the spider, but I don't see why that would cause PhpDig to block out information.
01-13-2005, 05:23 AM | #8 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
I'm puzzled myself, but I have been experimenting a little...
If you remove .$cookiesSendString .$auth_string from the function phpdigTestUrl, its identity is revealed again for every request. Change the HEAD to GET and the file size is correctly shown in the logs again, which also allows PhpDig to be excluded by the spider killer in OSCommerce. Unfortunately, put the cookie info back in and you lose the file size and PhpDig's user-agent again.
And it's not PhpDig that's blocking the info: I made it echo every request and the info is being sent, it's just not being shown in the server logs.
If you could help me rest better, could you spider www.candlerock.com (search depth 1, links 0)? This way I can compare the log entries and see whether they are different/correct from an external IP. Last edited by CBJim; 01-13-2005 at 05:32 AM. |
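The thread never establishes why adding the cookie string makes the user-agent disappear. One mechanism that would fit the symptoms, offered purely as an assumption, is a stray blank line: an HTTP server treats the first empty line as the end of the headers, so any header placed after it is never parsed. The Python sketch below only illustrates that general rule. `build_request` is a hypothetical stand-in and is not PhpDig's actual code.

```python
def parse_request_headers(raw_request):
    """Parse headers as an HTTP server would: everything up to the first empty line."""
    head, _, _ = raw_request.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the request line, e.g. "HEAD / HTTP/1.1"
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def build_request(path, host, extra_headers, user_agent):
    """Hypothetical request builder: extra_headers is spliced in before User-Agent."""
    return (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"{extra_headers}"
        f"User-Agent: {user_agent}\r\n"
        "Connection: close\r\n\r\n"
    )

UA = "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"

# A well-formed cookie header ends with exactly one CRLF.
clean = build_request("/contact_us.php", "www.candlerock.com",
                      "Cookie: lang=english\r\n", UA)
# An assumed malformed cookie string carrying one CRLF too many.
broken = build_request("/contact_us.php", "www.candlerock.com",
                       "Cookie: lang=english\r\n\r\n", UA)

print(parse_request_headers(clean).get("user-agent"))
print(parse_request_headers(broken).get("user-agent"))
```

In the second case the client really does send the User-Agent line, yet the server never sees it as a header. That would be consistent with the observation that the info is being sent but does not show up in the logs. Whether that is what happened here is not confirmed anywhere in the thread.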
01-13-2005, 08:49 AM | #9 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Okay, but don't post my IP. Here's the admin log for 30 odd links before I stopped the spider. Check your access log a few minutes before the time of this post. Do you see correct UA and referrer info?
Spidering in progress... [Stop spider]
SITE : http://www.candlerock.com/
Exclude paths :
- _fpclass
- _private
- _themes
- _vti_cnf
- _vti_log
- _vti_pvt
- _vti_script
- _vti_txt
- download
- wmail
- CVS
- cgi-bin
- candles/admin
1:http://www.candlerock.com/ (time : 00:00:09)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://www.candlerock.com/ordering.php (time : 00:00:34)
3:http://www.candlerock.com/privacy.php (time : 00:00:41)
4:http://www.candlerock.com/advanced_search.php (time : 00:00:48)
5:http://www.candlerock.com/contact_us.php (time : 00:00:54)
Meta Robots = NoIndex, or already indexed : No content indexed
6:http://www.candlerock.com/shopping_cart.php (time : 00:01:01)
7:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Accessories&cPath=1 (time : 00:01:07)
8:http://www.candlerock.com/candle_making_supplies.php?c=Decorative_Candle_Making_Accessories&cPath=1_37 (time : 00:01:15)
9:http://www.candlerock.com/candle_making_supplies.php?c=Wax_Melting_Pots&cPath=1_39 (time : 00:01:22)
10:http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Candle_Making_Accessories&cPath=1_42 (time : 00:01:30)
11:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Mold_Accessories&cPath=1_38 (time : 00:01:37)
12:http://www.candlerock.com/candle_making_supplies.php?c=Scales_for_Candle_Making&cPath=1_40 (time : 00:01:44)
13:http://www.candlerock.com/candle_making_supplies.php?c=Wick_Tabs&cPath=1_36 (time : 00:01:51)
14:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Additives&cPath=2 (time : 00:01:58)
15:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Coloring&cPath=21 (time : 00:02:05)
16:http://www.candlerock.com/candle_making_supplies.php?c=Color_Chips&cPath=21_28 (time : 00:02:12)
17:http://www.candlerock.com/candle_making_supplies.php?c=Liquid_Candle_Coloring&cPath=21_29 (time : 00:02:20)
18:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Kits&cPath=22 (time : 00:02:28)
19:http://www.candlerock.com/candle_making_supplies.php?c=Metal_Candle_Molds&cPath=31 (time : 00:02:35)
20:http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Metal_Candle_Molds&cPath=31_57 (time : 00:02:42)
21:http://www.candlerock.com/candle_making_supplies.php?c=Oval_Metal_Candle_Molds&cPath=31_55 (time : 00:02:50)
22:http://www.candlerock.com/candle_making_supplies.php?c=Pyramid_Metal_Candle_Molds&cPath=31_56 (time : 00:02:58)
23:http://www.candlerock.com/candle_making_supplies.php?c=Round_Metal_Candle_Molds&cPath=31_52 (time : 00:03:06)
24:http://www.candlerock.com/candle_making_supplies.php?c=Square_Metal_Candle_Molds&cPath=31_53 (time : 00:03:13)
25:http://www.candlerock.com/candle_making_supplies.php?c=Star_Metal_Candle_Molds&cPath=31_54 (time : 00:03:22)
26:http://www.candlerock.com/candle_making_supplies.php?c=2_Piece_Plastic_Candle_Molds&cPath=30 (time : 00:03:29)
27:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Animal_Candle_Molds&cPath=30_44 (time : 00:03:36)
28:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Christmas_Candle_Molds&cPath=30_49 (time : 00:03:44)
29:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Column_and_Taper_Candle_Molds&cPath=30_50 (time : 00:03:52)
30:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Floating_Candle_Molds&cPath=30_43 (time : 00:03:59)
31:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Food_and_Fruit_Candle_Molds&cPath=30_47 (time : 00:04:08)
32:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Halloween_Candle_Molds&cPath=30_48 (time : 00:04:15)
No link in temporary table
links found : 31
http://www.candlerock.com/
http://www.candlerock.com/ordering.php
http://www.candlerock.com/privacy.php
http://www.candlerock.com/advanced_search.php
http://www.candlerock.com/contact_us.php
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Accessories&cPath=1
http://www.candlerock.com/candle_making_supplies.php?c=Decorative_Candle_Making_Accessories&cPath=1_37
http://www.candlerock.com/candle_making_supplies.php?c=Wax_Melting_Pots&cPath=1_39
http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Candle_Making_Accessories&cPath=1_42
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Mold_Accessories&cPath=1_38
http://www.candlerock.com/candle_making_supplies.php?c=Scales_for_Candle_Making&cPath=1_40
http://www.candlerock.com/candle_making_supplies.php?c=Wick_Tabs&cPath=1_36
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Additives&cPath=2
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Coloring&cPath=21
http://www.candlerock.com/candle_making_supplies.php?c=Color_Chips&cPath=21_28
http://www.candlerock.com/candle_making_supplies.php?c=Liquid_Candle_Coloring&cPath=21_29
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Kits&cPath=22
http://www.candlerock.com/candle_making_supplies.php?c=Metal_Candle_Molds&cPath=31
http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Metal_Candle_Molds&cPath=31_57
http://www.candlerock.com/candle_making_supplies.php?c=Oval_Metal_Candle_Molds&cPath=31_55
http://www.candlerock.com/candle_making_supplies.php?c=Pyramid_Metal_Candle_Molds&cPath=31_56
http://www.candlerock.com/candle_making_supplies.php?c=Round_Metal_Candle_Molds&cPath=31_52
http://www.candlerock.com/candle_making_supplies.php?c=Square_Metal_Candle_Molds&cPath=31_53
http://www.candlerock.com/candle_making_supplies.php?c=Star_Metal_Candle_Molds&cPath=31_54
http://www.candlerock.com/candle_making_supplies.php?c=2_Piece_Plastic_Candle_Molds&cPath=30
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Animal_Candle_Molds&cPath=30_44
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Christmas_Candle_Molds&cPath=30_49
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Column_and_Taper_Candle_Molds&cPath=30_50
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Floating_Candle_Molds&cPath=30_43
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Food_and_Fruit_Candle_Molds&cPath=30_47
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Halloween_Candle_Molds&cPath=30_48
Optimizing tables...
Indexing complete !
[Back] to admin interface.
01-13-2005, 10:53 AM | #10 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
Well, nope...still the same situation. Here's a partial of the log...
Code:
xx.xx.xxx.15 - - [13/Jan/2005:12:37:33 -0500] "HEAD / HTTP/1.1" 200 - "-" "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:33 -0500] "GET / HTTP/1.1" 200 19200 "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:34 -0500] "HEAD /nstylesheet.css HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /ordering.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /privacy.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /advanced_search.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:36 -0500] "HEAD /shopping_cart.php HTTP/1.1" 200 - "-" "-"
Also, since you clicked the link before you spidered that site... nice to see that someone is using Firefox. |
01-13-2005, 03:23 PM | #11 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
When I test on my server, the UA and referrer come through okay. Maybe there is some protocol issue and/or your server is not able to understand the headers. What type of OS/setup are you using? Anyway, if you don't need cookies or authentication sent with the requests, just remove $cookiesSendString and/or $auth_string from the two HEAD and one GET requests in the robot_functions.php file. It's not an ideal solution, but I can't figure out what's going on, especially since I can't reproduce it. BTW, I do like Firefox.
01-14-2005, 02:20 AM | #12 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
Linux Fedora - Core 2
Apache - 2.57
HTTPD - 2.0.51
I'll do some research on this; something is amiss. All other IDs are recognized. I like Firefox myself, I just need to get out of the habit of clicking the IE icon. |
01-14-2005, 03:29 AM | #13 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
Well, I tried spidering another site on the server and phpDig was recognized as the user-agent in the logs.
Being a little perplexed, I wrote a quick script with apache_request_headers() and ran it on the site that hasn't been recognizing phpDig. There appears to be a "ghost" cookie being sent that isn't being picked up and echoed back to the server. It's a phpbb2mysql_data cookie. I'm not even sure why it's being set on that site since phpbb hasn't been on that site...ever. Now I don't even know where to start looking for that to fix it.
Code:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Cookie: lang=english; phpbb2mysql_data=s%3A0%3A%22%22%3B; osCsid=%cookie deleted%
The osCsid is being echoed back by phpDig, the phpbb2 is not. Last edited by CBJim; 01-14-2005 at 03:36 AM. |
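To compare what two clients send, it can help to split the Cookie header into names and decoded values. The Python sketch below does that for the header quoted above. The `osCsid` value is a placeholder because the original was deleted from the post, and the spider's cookie string is invented for illustration, since the thread does not show it. `parse_cookie_header` and `missing_cookies` are hypothetical helper names.

```python
from urllib.parse import unquote

def parse_cookie_header(header_value):
    """Split a Cookie header value into a {name: url-decoded value} dict."""
    cookies = {}
    for pair in header_value.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = unquote(value)
    return cookies

def missing_cookies(reference, candidate):
    """Cookie names present in reference but absent from candidate."""
    return sorted(set(reference) - set(candidate))

# Cookie header as the browser sent it (osCsid value replaced by a placeholder).
browser = parse_cookie_header(
    "lang=english; phpbb2mysql_data=s%3A0%3A%22%22%3B; osCsid=PLACEHOLDER"
)
# Assumed spider header, invented for this example only.
spider = parse_cookie_header("lang=english; osCsid=PLACEHOLDER")

print(browser["phpbb2mysql_data"])
print(missing_cookies(browser, spider))
```

Decoded, the "ghost" cookie's value is `s:0:"";`, which is how PHP serializes an empty string, so the cookie carries no data at all.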
01-14-2005, 08:34 AM | #14 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
WRT phpbb2mysql_data:
PHP Code:
Remove any phpbb2mysql_* type cookie. Still see the ghost cookie? PhpDig tip: add osCsid to PHPDIG_SESSID_VAR in config file to remove session from links when indexing.
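To show what removing the session from links amounts to, here is a minimal Python sketch that drops one query parameter from a URL. It is not how PhpDig implements PHPDIG_SESSID_VAR. The function name `strip_session_param` and the `abc123` session value are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_session_param(url, session_var="osCsid"):
    """Return url with every occurrence of the session_var query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != session_var
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Made-up session ID appended to one of the URLs from the spider log.
url = ("http://www.candlerock.com/candle_making_supplies.php"
       "?c=Wick_Tabs&cPath=1_36&osCsid=abc123")

print(strip_session_param(url))
```

Without the session ID, the same page always maps to the same URL, so it is indexed once rather than once per session.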
01-14-2005, 08:56 AM | #15 |
Green Mole
Join Date: Jan 2005
Posts: 9
|
LOL, never thought to remove the cookie from my system. Still doesn't explain how it got there though. But, that's a different problem.
So, that means I've run out of ideas for why the user-agent isn't showing. Thanks for the tip! |