|
02-19-2004, 06:20 PM | #1 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Can't get PHPDig to index an htaccess protected site
Hello,
I installed PhpDig and I am getting the admin/index.php page. I set the ../admin/temp directory, the ../includes directory and the ../text_content directory to chmod 777 (I don't know if this is how it is supposed to be). I am not getting any database errors.

So I go ahead and enter a URL to be spidered into the text box. For example:
http://username:password@www.mydomain.com
Nothing happens. It just hangs and doesn't even go to the next page.

However, if I write:
http://www.mydomain.com
it goes to the next page, but doesn't find any links on that page. Totally weird.

When I remove the htaccess username/password protection from that website and try it again with:
http://www.mydomain.com
it does find the links and seems to spider it correctly. So this is the first of my problems.

The second one is completely different: I would like to split the admin directory out of the phpdig directory. I would like to put that admin directory into an existing admin directory on an https server and rename the phpdig admin directory to "search_tools" or something like that. So, I want to access the phpdig admin directory with:
https://secure.mydomain.com/admin/se...ools/index.php
and the regular site search form with:
http://www.mydomain.com/search.php
Is that possible? Or would I have to change pretty much every single link in the admin directory and so forth? Please advise.

Thank you. Great software, and the price is right. Better than the commercial license for Atomz for $15K per year! I hope that I can get it working for us.

Mr. L
Last edited by mlerch@mac.com; 02-19-2004 at 06:52 PM. |
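The directory permissions mentioned in this post can be double-checked from PHP itself. The following is a minimal sketch, assuming a modern PHP (7 or later) and that the script is run from the PhpDig install root; the helper name `unwritableDirs` is hypothetical and not part of PhpDig.

```php
<?php
// Return the entries of $dirs that are missing or not writable
// by the user the PHP process runs as.
function unwritableDirs(array $dirs): array
{
    $problems = [];
    foreach ($dirs as $dir) {
        if (!is_dir($dir) || !is_writable($dir)) {
            $problems[] = $dir;
        }
    }
    return $problems;
}

// Assumed layout: paths relative to the PhpDig install root.
$dirs = ['admin/temp', 'includes', 'text_content'];
foreach (unwritableDirs($dirs) as $dir) {
    echo "Not writable: $dir\n";
}
```

If the script prints nothing, the web server user can write to all three directories.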
02-20-2004, 09:54 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
>> I set the ../admin/temp directory, the ../includes directory and the ../text_content directory to chmod 777 (don't know if this is how it is supposed to be.
Hi. Yes, those are the correct directories to set to 777 permissions.
>> So I go ahead and enter a URL to be spidered into the text box. For example: http://username:password@www.mydomain.com Nothing happens. It just hangs.. doesn't even go to the next page.
Are you able to access http://username:password@www.mydomain.com from the browser window without using PhpDig? What OS/setup are you using?
>> However, if I write: http://www.mydomain.com It goes to the next page, but doesn't find any links on that page. Totally weird.
As the directory is username/password protected, PhpDig doesn't have access so it doesn't find any links.
>> When I go ahead and remove the htaccess username/password protection from that website and try it again with: http://www.mydomain.com It does find the links and seems to spider it correctly.
Without the username/password protection, PhpDig has access and can find links.
>> I would like to split that admin directory out of the phpdig directory...
Try installing everything in the secure search_tools directory and then move the search.php file where wanted and make the following edits:
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
02-20-2004, 11:07 AM | #3 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Hi Charter,
Thanks for your pointers. Here is what I did. I opened a new browser and typed:
http://username:password@www.mydomain.com
It worked like a charm. I tried it again in PhpDig and it hung.

My server configuration is as follows: Mac OS X 10.2.8, iTools Apache 2, PHP (latest, safe_mode disabled), MySQL (latest).

I have not tried your instructions regarding moving the admin portion to a secure https server and leaving search.php outside. I'll check it out this weekend.

Also, I need to integrate the search box in a very simplified version (just a text field and a button) into the site template system, and have an "Advanced Search" link that will go to its own search.php (or better, advanced_search.php) page. I also need all of the results to come up customized on the "search_results.php" page on my site (I don't want it to pop up in a _blank page). Are there any instructions on how to customize PhpDig that way?

Please let me know, and thank you so much for your help. It's a wonderful tool.

Sincerely,
Mr. L |
02-20-2004, 06:21 PM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
>> ...PhpDig and it hung itself...
Hi. How long before it hangs? What happens if you wait like say ten minutes or so? Does it still seem to hang?
>> ...any instructions on how to customize PhpDig...
Most of your customization questions have been answered in one way or another somewhere in the forums. |
02-20-2004, 06:53 PM | #5 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Hi Charter,
Thanks for your answer. I will look for the customization stuff in the forum. Not a problem.

Regarding the spidering of the htaccess protected site: I created a demo user and password for the spidering. This user can access the site perfectly through a browser. When I let it run in PhpDig it won't even go to the spider.php page. The browser says "... loading page" but that's it. It just hangs there. I let it run for 20-30 minutes, but the spider.php page never loaded. I don't know if this helps in any way. It's really strange.

Mr. L |
02-20-2004, 06:56 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. It is strange. Perhaps it's an OS/setup issue? Not sure. What happens if you go to the admin panel, click the site, click the update button, set the username and password there, and then try a reindex?
02-20-2004, 07:16 PM | #7 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Tried that already. It simply doesn't want to do it. I also tried a different htaccess protected site on my server. Same deal. I even tried different username:password combinations that I created. They all work in the web browser, but they don't work in the PhpDig spider.
Mr. L |
02-20-2004, 07:23 PM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Basically the username:password combo is split by parse_url so I'm wondering if there is something that is making the parse_url username:password combo not match what is in the .htaccess file.
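To see how `parse_url` splits the credentials out of such a URL, the sketch below prints the parsed pieces and builds the HTTP Basic `Authorization` header a client would typically send. It is an illustration only, not PhpDig's actual code; the function name `basicAuthHeader` is hypothetical, the URL is the placeholder from this thread, and a modern PHP (7 or later) is assumed.

```php
<?php
// Build the HTTP Basic "Authorization" header from credentials embedded
// in a URL, or return null if the URL is malformed or has no user part.
function basicAuthHeader(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['user'])) {
        return null;
    }
    // parse_url() does not decode percent-escapes, so a password written
    // as "p%40ss" must be decoded to "p@ss" before it is sent.
    $user = rawurldecode($parts['user']);
    $pass = rawurldecode($parts['pass'] ?? '');
    return 'Authorization: Basic ' . base64_encode("$user:$pass");
}

// Placeholder URL from the thread.
$url = 'http://username:password@www.mydomain.com/';
print_r(parse_url($url));
echo basicAuthHeader($url), "\n";
```

If the user name or password contains characters such as `@`, `:` or `/` and they are not percent-encoded in the URL, the split may not match what the server expects.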
02-20-2004, 09:09 PM | #9 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Hello Charter,
My server is not using an .htaccess file. It's done with the user/pass database authentication method (not MySQL though). I forget the name of it, but all the username:password combos are stored in a database. I am sorry if I sound ignorant. I just can't think of the name of that database file right now. Maybe that's the problem.

But then again, I read some of your other answers to other .htaccess related posts, and I think I will go ahead and turn the protection off, spider the site, then turn it back on. If this works, all is well.

Thank you so much for all your help.

Mr. L |
02-20-2004, 10:25 PM | #10 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
>> ...server is not using an .htaccess file. It's done with the user/pass database authentication method...
Hi. Unmodified PhpDig cannot validate against a username/password DB or cookie/browser authentication method, and AFAIK no published mods exist to do so. I assumed from your thread title that you were trying to index a .htaccess protected site.
02-21-2004, 07:33 AM | #11 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Ok... good that we have figured that one out. Still, there is a problem, as I have just found out. I really don't know what I am supposed to expect when spidering a site. What exactly should happen when I click the Dig This ! button? Is it supposed to hang at the page indicating that a new page (spider.php) is loading, or is it supposed to load spider.php and print the spidering process back into the page, entry by entry? For me, I click the button and it just hangs. I am going to let it run now for an hour or two and see if it is going to pop over to the spider.php page and start the indexing process. Is there any way to change the code so that I can watch the progress of the indexing? Please advise. Thank you.

Mr. L

Maybe it has nothing to do with the htaccess protection at all, but has something to do with the vHost itself? Could that be it? Have there been any other reports that the PhpDig form page simply "hangs" and does not proceed to the spider.php page after pressing the Dig This ! button? Please advise. |
02-21-2004, 08:46 AM | #12 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Charter... do you think that this line of code at the top of all pages of the site that is not indexing is causing PhpDig to croak?
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
I checked all the other sites and I think they don't have it.

Mr. L

It's not indexing the site. I really don't know what to do anymore. The other sites are indexing perfectly, but this one isn't. When I test index another site on my server it works: the spider.php page loads and I can see the progress. Only when I do the site that was formerly htaccess protected does it not work. It stalls. I removed the access controls, so I can access it now without a username and password. |
02-21-2004, 12:20 PM | #13 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Hi Charter,
So I looked into the problem in more detail. Here is what I found when spidering the URL that doesn't work (stalls). I have traced it to robot_functions.php:

1. function phpdigDetectDir
This function parses the URL into the variable $test, then goes through an if { ... } else { ... } statement. In my case it takes the else path, because apparently $test['query'] is set. In the very first line of the else path, robot_functions.php tries to define the following variable:
$status = phpdigTestUrl($link['url'].$link['path'].$link['file'],'date',$cookies);
This is where it seems to stall, so I checked into this function.

2. function phpdigTestUrl
It runs all the way through the "while" routine and ends up where:
$status = "NOFILE";
At the very end of that function $mode does not seem to be 'date', so it is supposed to:
return $status;
I guess that is where it hangs.

Here are some details about the URL/website that I am trying to spider:
http://www.mydomain.com/index.php
At the very beginning of index.php there is a piece of script that checks whether a variable string is appended to index.php and whether it is formatted correctly. If the script finds a formatting problem, or finds no variable string at the end of .../index.php, it grabs the correct string and redirects to a URL like this:
http://www.mydomain.com/index.php?na...,1,1,1,1,1,0,0
Essentially, if you type in the URL http://www.mydomain.com or http://www.mydomain.com/index.php, it will redirect you to:
http://www.mydomain.com/index.php?na...,1,1,1,1,1,0,0
Do you think that this is causing the problem? Please advise.

I also tried entering the URL into the PhpDig interface exactly as the redirect would produce it, but it still hangs with a NOFILE status.

Also, why is $path always /robots.txt? I guess I don't understand it well enough.

Thank you very much,
Mr. L
Last edited by mlerch@mac.com; 02-21-2004 at 12:38 PM. |
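The redirect behaviour described in this post can be inspected by looking at the raw response headers the server sends for the start URL. The sketch below is a standalone helper, not PhpDig's `phpdigTestUrl`; the name `parseRedirect` and the sample header text (including the `?id=1` query string) are hypothetical, and a modern PHP (7 or later) is assumed.

```php
<?php
// Given a raw HTTP response header block, return the Location target
// if the response is a 3xx redirect, or null otherwise.
function parseRedirect(string $rawHeaders): ?string
{
    $lines = preg_split('/\r\n|\n/', trim($rawHeaders));
    // The status line looks like "HTTP/1.1 302 Found".
    if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $lines[0], $m)) {
        return null;
    }
    $status = (int) $m[1];
    if ($status < 300 || $status >= 400) {
        return null; // not a redirect
    }
    foreach (array_slice($lines, 1) as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Hypothetical sample response; the real query string is not shown in the thread.
$raw = "HTTP/1.1 302 Found\r\nLocation: http://www.mydomain.com/index.php?id=1\r\n\r\n";
var_dump(parseRedirect($raw));
```

A redirect target whose host name differs from the one entered, or that cannot be resolved from the server running the spider, would be one thing to look for in the output.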
02-21-2004, 12:27 PM | #14 | |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Quote:
|
|
02-21-2004, 06:42 PM | #15 |
Green Mole
Join Date: Feb 2004
Location: North Las Vegas, Nevada
Posts: 18
|
Ok... something very interesting happened. I let it run and run and run and finally I got this:
Spidering in progress...
Warning: fsockopen(): php_network_getaddresses: getaddrinfo failed: No address associated with nodename (is your IPV6 configuration correct? If this error happens all the time, try reconfiguring PHP using --disable-ipv6 option to configure) in /Library/.../admin/robot_functions.php on line 337
Warning: fsockopen(): unable to connect to www.mydomain.com:80 in /Library/.../admin/robot_functions.php on line 337
SITE : http://www.mydomain.com/
Exclude paths : - @NONE@
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

Line 337 is the following:
    // this is part of function phpdigTestUrl($url,$mode='simple',$cookies=array()) {
    if (isset($req1) && $req1) {
    //close, and open a new connection
    //on the new location
    fclose($fp);
    $fp = fsockopen($host,$port); // this is line 337

Any idea what this is supposed to mean? As I mentioned before, I am dealing with a script that checks the URL and, if necessary, redirects with the correct string appended. See my prior post.

Mr. L

It seems like the .htaccess is not the problem after all.
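The `getaddrinfo failed` warning means the host name could not be resolved from the machine running the spider. A minimal sketch for checking this outside PhpDig follows; the function names `resolveHost` and `openWithTimeout` are hypothetical, the 10 second timeout is an arbitrary assumption, and a modern PHP (7 or later) is assumed.

```php
<?php
// Resolve a host name to an IP address string, or return null on failure.
function resolveHost(string $host): ?string
{
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return $host; // already an IP literal, nothing to resolve
    }
    $ip = gethostbyname($host);
    // gethostbyname() returns the unmodified host name when lookup fails.
    return $ip === $host ? null : $ip;
}

// Open a TCP connection with an explicit timeout so that an unreachable
// host cannot stall the script for a long time. Returns false on failure.
function openWithTimeout(string $host, int $port = 80, float $timeout = 10.0)
{
    if (resolveHost($host) === null) {
        return false; // DNS lookup failed, do not attempt to connect
    }
    // The @ suppresses the warning; the return value is checked by the caller.
    return @fsockopen($host, $port, $errno, $errstr, $timeout);
}
```

Calling `resolveHost('www.mydomain.com')` on the server that runs PhpDig shows whether that machine can resolve the name at all; a browser on a different machine may use different DNS settings, so a site that loads in the browser can still fail here.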
|
|