PhpDig.net

02-19-2004, 06:20 PM   #1
mlerch@mac.com
Green Mole
 
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
Can't get PHPDig to index an htaccess protected site

Hello,

I installed PhpDig. I am getting the admin/index.php page. I set the ../admin/temp directory, the ../includes directory and the ../text_content directory to chmod 777 (don't know if this is how it is supposed to be).

I am not getting any database errors.

So I go ahead and enter a URL to be spidered into the text box. For example:

http://username:password@www.mydomain.com

Nothing happens. It just hangs.. doesn't even go to the next page.

However, if I write:
http://www.mydomain.com

It goes to the next page, but doesn't find any links on that page. Totally weird.

When I go ahead and remove the htaccess username/password protection from that website and try it again with:
http://www.mydomain.com

It does find the links and seems to spider it correctly.

So this is the first of my problems.

The second one is completely different:

I would like to split that admin directory out of the phpdig directory. I would like to stick that admin directory into an existing admin directory on an https server and rename that phpdig admin directory to "search_tools" or something like that. So, I want to access the phpdig admin directory with:

https://secure.mydomain.com/admin/se...ools/index.php

and the regular site search form with:

http://www.mydomain.com/search.php

Is that possible? Or would I have to change pretty much every single link in the admin directory and so forth?

Please advise.

Thank you. Great software, and the price is right. Better than the commercial license for Atomz for $15K per year !!!! Hope that I can get it working for us.

Mr. L

Last edited by mlerch@mac.com; 02-19-2004 at 06:52 PM.
02-20-2004, 09:54 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
>> I set the ../admin/temp directory, the ../includes directory and the ../text_content directory to chmod 777 (don't know if this is how it is supposed to be).

Hi. Yes, those are the correct directories to set to 777 permissions.

>> So I go ahead and enter a URL to be spidered into the text box. For example: http://username:password@www.mydomain.com
Nothing happens. It just hangs.. doesn't even go to the next page.

Are you able to access http://username:password@www.mydomain.com from the browser window without using PhpDig? What OS/setup are you using?

>> However, if I write: http://www.mydomain.com
It goes to the next page, but doesn't find any links on that page. Totally weird.

As the directory is username/password protected, PhpDig doesn't have access so it doesn't find any links.

>> When I go ahead and remove the htaccess username/password protection from that website and try it again with: http://www.mydomain.com
It does find the links and seems to spider it correctly.

Without the username/password protection, PhpDig has access and can find links.

>> I would like to split that admin directory out of the phpdig directory...

Try installing everything in the secure search_tools directory, then move the search.php file to wherever you want it and make the following edits:
  • In search.php edit $relative_script_path = '.'; to reflect the directory of the PhpDig install, something like $relative_script_path = '../search_tools'; or $relative_script_path = './secure/admin/search_tools'; depending on your setup.
  • In config.php edit the first line of code (the code checking the $relative_script_path variable) so that it contains && ($relative_script_path != "fill_in") where fill_in matches what $relative_script_path = '.'; gets set to in the search.php file.
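
The two edits above might look roughly like the sketch below. It is an illustration only: the guard is wrapped in a hypothetical function, scriptPathIsBlocked(), so that it can run on its own, and the exact wording of the first line of config.php (including the '.' and '..' comparisons) is an assumption that may differ between PhpDig versions. '../search_tools' stands in for whatever path your setup needs.

```php
<?php
// search.php, after being moved out of the install directory:
// point at the PhpDig install (example value from the post above).
$relative_script_path = '../search_tools';

// Hypothetical stand-in for the check at the top of config.php.
// Assumption: the real line is a chain of "!=" comparisons that stops
// the script when $relative_script_path is not an expected value.
function scriptPathIsBlocked($path)
{
    return isset($path)
        && ($path != '.')
        && ($path != '..')
        && ($path != '../search_tools'); // the added "fill_in" comparison
}

if (scriptPathIsBlocked($relative_script_path)) {
    exit('Unexpected $relative_script_path');
}
```

The value compared in the guard has to match the value set in search.php character for character, otherwise the script stops before the search runs.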
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
02-20-2004, 11:07 AM   #3
mlerch@mac.com
Hi Charter,

Thanks for your pointers. Here is what I did. I opened a new browser and typed:

http://username:password@www.mydomain.com

Worked like a charm. Tried it again in PhpDig and it hung itself.

My Server configuration is as follows:

Mac OS X 10.2.8
iTools Apache 2
PHP - latest (safe_mode disabled)
MySQL - latest

I have not tried your instructions regarding moving the admin portion onto a secure https server and leaving the search.php outside. I'll check it out this weekend.

Also, I need to integrate the search box in a very simplified version (just a text field and a button) into the site template system, and have an "Advanced Search" link that will go to its own search.php (or better, advanced_search.php) page. I also need all of the results to come up customized on the "search_results.php" page on my site (I don't want them to pop up in a _blank page).

Are there any instructions on how to customize PhpDig that way? Please let me know, and thank you so much for your help. It's a wonderful tool.

Sincerely,

Mr. L
02-20-2004, 06:21 PM   #4
Charter
>> ...PhpDig and it hung itself...

Hi. How long before it hangs? What happens if you wait, say, ten minutes or so? Does it still seem to hang?

>> ...any instructions on how to customize PhpDig...

Most of your customization questions have been answered in one way or another somewhere in the forums.
02-20-2004, 06:53 PM   #5
mlerch@mac.com
Hi Charter,

Thanks for your answer. I will look for the customization stuff in the forum. Not a problem.

Regarding the spidering of the htaccess protected site: I created a demo user and password for the spidering. This user can access the site perfectly through a browser. When I let it run in PhpDig, it won't even go to the spider.php page. The browser says "... loading page" but that's it; it just hangs there. I let it run for 20-30 minutes, but the spider.php page never loaded.

Don't know if this helps in any way. It's really strange.

Mr. L
02-20-2004, 06:56 PM   #6
Charter
Hi. It is strange. Perhaps it's an OS/setup issue? Not sure. What happens if you go to the admin panel, click the site, click the update button, set the username and password there, and then try a reindex?
02-20-2004, 07:16 PM   #7
mlerch@mac.com
Tried that already. It simply doesn't want to do it. I also tried a different htaccess protected site on my server. Same deal. I even tried different username:password combinations that I created. They all work in the web browser, but they don't work in the PhpDig spider.

Mr. L
02-20-2004, 07:23 PM   #8
Charter
Hi. Basically the username:password combo is split by parse_url so I'm wondering if there is something that is making the parse_url username:password combo not match what is in the .htaccess file.
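
To see what parse_url produces for a URL of this form, a small standalone check can help. This is a sketch, not PhpDig's own code: the helper name basicAuthHeader() is hypothetical, and it assumes the pair would be sent as an HTTP Basic Authorization header, which only helps if the server really uses Basic authentication.

```php
<?php
// Hypothetical helper (not part of PhpDig): build the HTTP Basic
// Authorization header from the user/pass parts that parse_url() extracts.
function basicAuthHeader($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['user'])) {
        return null; // no credentials embedded in the URL
    }
    // Credentials may be percent-encoded inside a URL (e.g. "@" as %40).
    $user = rawurldecode($parts['user']);
    $pass = isset($parts['pass']) ? rawurldecode($parts['pass']) : '';
    return 'Authorization: Basic ' . base64_encode($user . ':' . $pass);
}

$url = 'http://username:password@www.mydomain.com';
print_r(parse_url($url));          // lists the scheme, host, user and pass parts
echo basicAuthHeader($url), "\n";  // the header a client would send
```

If the user or pass printed here differs from what the server expects, for example because of special characters in the password, that would be one explanation for a mismatch.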
02-20-2004, 09:09 PM   #9
mlerch@mac.com
Hello Charter,

My server is not using an .htaccess file. It's done with the user/pass database authentication method. (not MySQL though.) I forget the name of it, but all the username:password combos are stored in a database.

I am sorry if I sound ignorant. I just can't think of the name of that database file right now.

Maybe that's the problem. But then again, I read some of your other answers to other .htaccess related posts, and I think I will go ahead and turn .htaccess off, spider it, then turn it back on. If this works all is well.

Thank you so much for all your help.

Mr. L
02-20-2004, 10:25 PM   #10
Charter
>> ...server is not using an .htaccess file. It's done with the user/pass database authentication method...

Hi. Unmodified PhpDig cannot validate against a username/password DB or a cookie/browser authentication method, and AFAIK no published mods exist that do. I assumed from your thread title that you were trying to index a .htaccess protected site.
02-21-2004, 07:33 AM   #11
mlerch@mac.com
Ok... good that we have figured that one out. Still, there is a problem, as I have just found out. I really don't know what I am supposed to expect when spidering a site. What exactly should happen when I click the Dig This ! button? Is it supposed to hang there at the page indicating that a new page (spider.php) is loading, or is it supposed to load spider.php and print the spidering process back into the page, entry by entry? For me, I click the button and it just hangs. I am going to let it run now for an hour or two and see if it is going to pop over to the spider.php page and start the indexing process. Is there any way to change the code so that I can watch the progress of the indexing? Please advise. Thank you.

Mr. L

Maybe it has nothing to do with the htaccess protection at all, but with the vHost itself? Could that be it? Have there been any other reports that the PhpDig form page simply "hangs" and does not proceed to the spider.php page after pressing the Dig This ! button? Please advise.
02-21-2004, 08:46 AM   #12
mlerch@mac.com
Charter... do you think that this line of code on top of all pages of the site that is not indexing is causing PhpDig to croak?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

I checked all the other sites and I don't think they have it.

Mr. L

It's not indexing the site. I really don't know anymore what to do. The other sites are indexing perfectly, but this one isn't. When I try to test index another site on my server it works. The spider.php page loads and I can see the progress. Just when I do the site that formerly was htaccess protected it doesn't work. It stalls. I removed the access controls and all. So I can access it now without a username and password.
02-21-2004, 12:20 PM   #13
mlerch@mac.com
Hi Charter,

So I did some more detailed looking into the problem. Here is what I found.

When spidering the URL that doesn't work (stalls), I have traced it to:

In robot_functions.php

1. function phpdigDetectDir

In this function it parses the URL into the variable $test, then goes through an if { ... } else { ... } statement. In my case it takes the else path because apparently $test['query'] is set.

Since it takes the else path, the very first line there in robot_functions.php tries to define the following variable:

$status = phpdigTestUrl($link['url'].$link['path'].$link['file'],'date',$cookies);

This is where it seems to stall, so I checked into this function.

2. function phpdigTestUrl

It runs all the way through the "while" routine and ends up where:
$status = "NOFILE";

At the very end of that function $mode does not seem to be 'date', so it is supposed to:

return $status;

I guess that is where it hangs.


Here are some details about the URL/website that I am trying to spider:

http://www.mydomain.com/index.php

index.php actually has in the very beginning a piece of script that checks if there is a variable string appended to index.php, and if it is formatted correctly.

If the script finds a formatting problem, or finds no variable string at the end of .../index.php, it will grab the correct string and redirect to a URL like this:

http://www.mydomain.com/index.php?na...,1,1,1,1,1,0,0

Essentially, if you type in the URL http://www.mydomain.com, or http://www.mydomain.com/index.php, it will redirect you to:

http://www.mydomain.com/index.php?na...,1,1,1,1,1,0,0

Do you think that this is causing the problem? Please advise.
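
For context on how a redirect like this looks to a crawler: if the script redirects with an HTTP 3xx status (an assumption; nothing above says whether it uses a header or some other mechanism), the crawler receives a Location header and has to open a new request for that target. A minimal sketch of reading that header from a raw response, using a hypothetical helper redirectTarget() that is not part of PhpDig, and a placeholder query string:

```php
<?php
// Hypothetical helper (not part of PhpDig): return the Location target
// of a raw HTTP response header block if it is a 3xx redirect, else null.
function redirectTarget($rawHeaders)
{
    $lines = preg_split('/\r\n|\n/', trim($rawHeaders));
    if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $lines[0], $m)) {
        return null; // not an HTTP status line
    }
    $status = (int) $m[1];
    if ($status < 300 || $status > 399) {
        return null; // not a redirect
    }
    foreach (array_slice($lines, 1) as $line) {
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null; // redirect status without a Location header
}

// Placeholder response; the real query string is not shown here.
$response = "HTTP/1.1 302 Found\r\nLocation: /index.php?placeholder=1\r\n\r\n";
var_dump(redirectTarget($response));
```

A target that comes back as a path only, with no host, would have to be combined with the original host before a new connection is opened.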

Oh yes, I actually tried to enter the URL into the PhpDig interface just like it would redirect it, but it still hangs with a NOFILE status.

Oh yes, why is $path always /robots.txt? I guess I don't understand it well enough.

Thank you very much,

Mr. L

Last edited by mlerch@mac.com; 02-21-2004 at 12:38 PM.
02-21-2004, 12:27 PM   #14
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Quote:
Originally posted by mlerch@mac.com
Have there been any other reports that the PhpDig form pages simply "hangs" and does not proceed to the spider.php page after pressing the Dig This ! button? Please advise.
I don't remember what I may have posted (and I'm too lazy to go look), but I was having some problems spidering my site that is on a Windows server. Mine was going to the spider.php page though. Perhaps what you're experiencing is a server related issue similar to mine? Just something you might want to explore.
02-21-2004, 06:42 PM   #15
mlerch@mac.com
Ok... something very interesting happened. I let it run and run and run and finally I got this:

Spidering in progress...

Warning: fsockopen(): php_network_getaddresses: getaddrinfo failed: No address associated with nodename (is your IPV6 configuration correct? If this error happens all the time, try reconfiguring PHP using --disable-ipv6 option to configure) in /Library/.../admin/robot_functions.php on line 337

Warning: fsockopen(): unable to connect to www.mydomain.com:80 in /Library/.../admin/robot_functions.php on line 337
SITE : http://www.mydomain.com/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !


Line 337 is the following:
// this is part of function phpdigTestUrl($url,$mode='simple',$cookies=array()) {

if (isset($req1) && $req1) {
//close, and open a new connection
//on the new location
fclose($fp);
$fp = fsockopen($host,$port); // this is line 337


Any idea what this is supposed to mean? As I mentioned before, I am dealing with a script that checks and redirects if necessary with the correct string appended to the URL. See prior post.
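
The first warning comes from name resolution (getaddrinfo), before any HTTP request is made, so one thing worth checking is whether the web server itself can resolve the host name. A rough diagnostic sketch, to be run from the command line on that server; resolveHost() is a hypothetical helper, not PhpDig code, and the 10-second timeout is an arbitrary choice:

```php
<?php
// Hypothetical diagnostic helper (not part of PhpDig).
// gethostbyname() returns its input unchanged when the lookup fails,
// so an unchanged, non-IP result means the name did not resolve.
function resolveHost($host)
{
    $ip = gethostbyname($host);
    if ($ip === $host && filter_var($host, FILTER_VALIDATE_IP) === false) {
        return null;
    }
    return $ip;
}

// Usage: php check_host.php www.mydomain.com
$host = isset($argv[1]) ? $argv[1] : null;
if ($host !== null) {
    $ip = resolveHost($host);
    if ($ip === null) {
        echo "Cannot resolve $host from this server\n";
    } else {
        // Explicit timeout so a dead connection fails instead of hanging.
        $fp = @fsockopen($host, 80, $errno, $errstr, 10);
        echo $fp ? "Connected to $host ($ip)\n" : "Connect failed: $errstr\n";
        if ($fp) {
            fclose($fp);
        }
    }
}
```

When no timeout is passed, fsockopen() falls back to PHP's default_socket_timeout setting, which may be part of why the page appears to hang for a long time before the warning shows up.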

Mr. L

Seems like the .htaccess is not the problem after all.