problem indexing password-protected directories
Hi all,
I am not able to spider a directory protected by .htaccess. I have set up a test here:
http://testt:testt@www.php-web-devel...hpdig/main.php
...but the script just shows:
Code:
SITE : http://www.php-web-development.com/
I'm using 1.6.2 vanilla and have tried setting PHPDIG_DEFAULT_INDEX to both true and false.
I'd be grateful for any suggestions.
TIA,
Alex |
Hi. Hmm. Perhaps instead of passing the username and password via the URL, it might work to look at the sites table in say phpMyAdmin, and for the protected site, add the username and password to that row of the table.
|
Hi Charter,
Thanks for your reply. I checked in phpMyAdmin and the user:pass combination had already been correctly parsed by the script and entered into the DB.
Any other ideas? If you try to index:
Code:
http://testt:testt@www.php-web-development.com/testphpdig/main.php
Thanks again,
Alex |
Hi. Perhaps this is related to the problem posted here. Can you try and set self.parent.location to the absolute URL instead of the relative URL and see if that works?
|
Hi Charter,
I think I've worked out the problem. It's not related to relative META and JS links: the same "site" spiders fine without the .htaccess.
In fact, this is now spidering fine:
http://testt:testt@www.php-web-development.com/testphpdig/main.php
BUT this isn't:
http://test%40domain.com:test@www.php-web-development.com/testphpdig/main.php
and nor is this:
http://test@domain.com:test@www.php-web-development.com/testphpdig/main.php
The first version sends the escaped "%40" as part of the username, so it gets "access denied" as the incorrect user. The second is parsed with "domain.com" as the host.
So is there no way of sending an @ sign as part of a username?
Thanks for all your help...
Alex |
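For readers following along, here is a minimal sketch of the two failure modes described above. It is written in Python purely for illustration and is not PhpDig's code. The `NAIVE` pattern is a hypothetical regex, chosen only because it reproduces the symptom (username cut at the first "@"); the function names are likewise made up. The sketch assumes a fix along the lines of: split the credentials from the host at the last "@", percent-decode them, and only then build the Basic auth header.

```python
import base64
import re
from urllib.parse import urlsplit, unquote

# Hypothetical naive pattern (NOT PhpDig's actual regex): the username stops
# at the first "@", so whatever follows it is taken to be the host.
NAIVE = re.compile(r"([^:@/]+)(?::([^@/]*))?@([^:/@]+)")


def naive_parse(netloc):
    """Return (user, password, host) as the naive pattern sees them."""
    match = NAIVE.match(netloc)
    return match.groups() if match else None


def parse_credentials(url):
    """Split at the LAST "@" (urlsplit does this) and percent-decode."""
    parts = urlsplit(url)
    user = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return user, password, parts.hostname


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic "Authorization" header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    escaped = ("http://test%40domain.com:test@"
               "www.php-web-development.com/testphpdig/main.php")
    # The naive split mistakes "domain.com" for the host.
    print(naive_parse("test@domain.com:test@www.php-web-development.com"))
    # Decoding "%40" before use recovers the real username.
    print(parse_credentials(escaped))
```

Under these assumptions, both the escaped and the unescaped form of the URL yield the same username, password and host, so the header sent to the server carries the literal "@" rather than "%40".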
Ooh, do tell what you did to get it to work. :)
I can understand the %40 not working, but the @ sounds like a regex issue. With http://test@domain.com:test@www.php-web-development.com/testphpdig/main.php are the username and password in the sites table now blank? |
Hi,
Quote:
Trying this in the spider box:
http://test@domain.com:test@php-web-development.com/testphpdig/main.php
...attempts to spider http://domain.com/ :)
phpMyAdmin says this:
15 http://www.php-web-development.com/ 20031022121122 test%40domain.com test 0 0
16 http://domain.com/ 20031022132030 test 0 0
(so "test" is interpreted as the username, with the pw blank)
Cheers,
Alex |
Ah, okay, that'll help me track it down. I'll keep you posted.
|
I appreciate it...
You're welcome to use my test site to test with if you want.
http://www.php-web-development.com/testphpdig/main.php
The two valid user/pass combos are:
testt:testt
and
test@domain.com:test
The site is identical to the root domain, except for the .htaccess.
Thanks again,
Alex |
Hi,
I have a problem with protected sites too. Maybe I don't understand the tips above (my English is not the best).
phpdig 1.6.4 says:
Warning: file( http://...@www.vdoh.de/robots.txt): failed to open stream: No such file or directory in /is/htdocs/xyz/www.vdoh.de/inc/search/admin/robot_functions.php on line 553
Warning: Variable passed to each() is not an array or object in /is/htdocs/30981/www.vdoh.de/inc/search/admin/robot_functions.php on line 554
SITE : http://www.vdoh.de/
Exclude paths :
- @NONE@
(time : 00:00:00)
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
[Back] to admin interface.
The URL to index is like: http://username:password@www.vdoh.de/index.php
The username and password are in the .htaccess |
Hi. What do the .htaccess and .htpasswd files look like?
The .htaccess file should have something in it like so:
Code:
AuthUserFile /full/path/to/.htpasswd
and the .htpasswd file something like so:
Code:
Username:a1b2c3d4e5f6g |
Quote:
Okay, what can be the problem? |
Hi. What HTML source output do you get when you run the following script?
PHP Code:
Code:
User-agent:* |
Yes, this was a test today.
The real content of robots.txt is:
User-agent: *
Disallow: |
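As a side note on the robots.txt quoted above: an empty `Disallow:` line means nothing is excluded, so the file's content should not by itself stop a spider. The sketch below checks this with Python's standard `urllib.robotparser`. It is an illustration only, not PhpDig's parser; the agent name "PhpDig" and the helper name `allowed` are assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="PhpDig"):
    """Return True if robots_txt lets the given agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The file as quoted in the thread: an empty Disallow excludes nothing.
OPEN_RULES = "User-agent: *\nDisallow:\n"
# For contrast, a file that excludes the whole site.
CLOSED_RULES = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    print(allowed(OPEN_RULES, "http://www.vdoh.de/index.php"))
    print(allowed(CLOSED_RULES, "http://www.vdoh.de/index.php"))
```

If the rules themselves allow everything, the warnings are more likely to come from the failed fetch of robots.txt than from what the file says.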
Hi. Please run the following. It may help me determine the problem.
PHP Code:
Code:
User-agent: * |