PhpDig.net

Old 10-05-2003, 04:25 PM   #1
jalerta
Green Mole
 
Join Date: Oct 2003
Posts: 6
Some sites won't index

Hi All,

I have installed PHPDig-1.6.2 on a Red Hat Linux 8.1 server running Apache 2.0, MySQL 3.23.56 and PHP 4.2.2.

Some sites will not index; the spider just gives me the following message:

SITE : http://www.somedomain.com/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !


I am sure that there are more than 10 links on the index.html page of this site, but still nothing.

On other domains on this server PHPDig works correctly.

Can anyone give me any idea as to what is happening?

Thanks in advance.


Jeff
Old 10-05-2003, 04:41 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Did you index the sites recently, or are the sites in separate directories, like http://www.domain.com/dirone/index.php and http://www.domain.com/dirtwo/index.php? You can change the reindex timeframe with define('LIMIT_DAYS',7); in the config file.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 10-05-2003, 07:36 PM   #3
jalerta
Green Mole
 
Join Date: Oct 2003
Posts: 6
Thanks for the reply.

I have been trying to get it to work with that specific domain and have read other posts here about problems with reindexing a recently indexed site.

So, I have repeatedly deleted the MySQL database and re-installed it using the install.php script.

I am only indexing from the top level directory using "www.domainname1.com" and "www.domainname2.com".

I have also tried "www.domainname.com/index.html" without any success.

I have tried indexing 3 domains on the same server. Only one indexed. The other 2, including the domain that I really want to index, did not.

Both domains gave the same message listed in the post above.


Jeff
Old 10-05-2003, 07:49 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. To start over and index from scratch, do the following:
  1. empty all the PhpDig database tables
  2. delete all files that may be in the temp dir
  3. delete all files in the text_content dir except keepalive.txt
  4. run spider.php from a browser or command prompt
Before running spider.php from the command prompt, set the following to 1 in the config file if only one level is wanted:
PHP Code:
define('SPIDER_MAX_LIMIT',1);
define('SPIDER_DEFAULT_LIMIT',1);
define('RESPIDER_LIMIT',1); 
Also in the config file, set the following to 1 if more frequent reindexing is wanted:
PHP Code:
define('LIMIT_DAYS',1); 
Emptying the database tables is part of the process to restart from scratch. The files in the text_content directory also need to be deleted, except for the keepalive.txt file.
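Steps 2 and 3 above can be sketched in PHP as follows. This is only a minimal sketch, not part of PhpDig: the function name clear_dir_except is hypothetical, it assumes PHP 5 or later (for scandir), and the directory names in the example calls assume the script is run from the PhpDig install directory.

PHP Code:
```php
<?php
// Hypothetical helper (not part of PhpDig): delete every regular file in
// $dir except the names listed in $keep. Subdirectories are left untouched.
// Returns the number of files removed.
function clear_dir_except($dir, array $keep = array())
{
    if (!is_dir($dir)) {
        return 0;
    }
    $removed = 0;
    foreach (scandir($dir) as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        // Skip ".", "..", subdirectories, and anything on the keep list.
        if (!is_file($path) || in_array($name, $keep, true)) {
            continue;
        }
        if (unlink($path)) {
            $removed++;
        }
    }
    return $removed;
}

// Example calls (assumed paths, relative to the PhpDig install directory):
// clear_dir_except('admin/temp');
// clear_dir_except('text_content', array('keepalive.txt'));
```

The keep list is passed explicitly so that keepalive.txt survives the cleanup. Emptying the database tables (step 1) still has to be done separately.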
Old 10-05-2003, 09:04 PM   #5
jalerta
Green Mole
 
Join Date: Oct 2003
Posts: 6
I followed your instructions but still nothing.

The message this time was:

2935: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://www.somedomain.com/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

Just to recap the installation instructions so I am sure that I got everything right ...

I unTARed the phpdig files into a temp directory and then copied all the files into the www.somedomain.com/search directory.

I changed the permissions on the admin/temp, includes and text_content directories to 777 to allow write access to everyone. (Security issue that I will worry about when I get PHPDig running.)

I copied the _connect.php file to connect.php and edited it to add the MySQL hostname, username, password and database name. I cleared the PHPDIG_DB_PREFIX field.

I then ran the install.php file from a web browser (although at first it complained about not finding the init_db.sql file, which I then copied to the admin directory).

Once the database was created and the tables were installed I tried to index www.somedomain.com with no success.

Was there anything else that I was supposed to do? Am I missing any permissions or something?

Any other suggestions?


Thanks for the help.


Jeff
Old 10-06-2003, 03:53 PM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. That sounds correct. What type of files are you trying to index: *.asp, *.shtml, etcetera? Do you notice if indexing works on some file types but not others?
Old 10-06-2003, 04:58 PM   #7
jalerta
Green Mole
 
Join Date: Oct 2003
Posts: 6
I am trying to index plain .html files.

I have done some more tests and I have tried to index 10 different virtual domain sites that reside on my server.

I have discovered that of the 10 sites I tried to index only 1 site worked. 9 sites would not index.

Looking further, I discovered that the only site that would index was a site that had moved to another provider.

The directory structure and files for the web site still resided on my server but the DNS now points to another server.

All the other virtual domains that I tried to index had DNS entries that pointed to my server IP address.

Does this tell you anything?

Jeff
Old 10-06-2003, 05:36 PM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Can you try lynx from command line instead? An example is in this thread.
Old 10-06-2003, 07:41 PM   #9
jalerta
Green Mole
 
Join Date: Oct 2003
Posts: 6
I tried using Lynx, with no success.

Lynx would just sit there saying "Making HTTP connection to www.somedomain.com".

I was wondering if the issue in this case could be that the web server is behind a NAT'ed firewall?

Also, the web sites are on the same machine as the DNS service.

So, on the internal network the server has an IP address, for example, of 10.1.1.100. However, in the DNS the domain has an IP address of 123.123.123.1.

In this case, Lynx is trying to open the web site that DNS says is at 123.123.123.1, while the server that the web site is really on is at 10.1.1.100. So no connection can be established.

Is this a possible explanation for the problem?

Has anyone run into this problem before?

Any and all help is greatly appreciated.

Thanks,

Jeff
Old 10-08-2003, 11:04 AM   #10
rayvd
Green Mole
 
Join Date: Oct 2003
Location: Mesa, AZ
Posts: 15
This is definitely a NAT problem. I am experiencing the same thing and am trying to figure out a rule to get around it. What I want is for the webserver to reply on the same interface the request came in on, instead of doing NAT on the packet.

If your setup isn't too complex, you may be able to set up a rule specifying that outbound packets to a given IP should not be NAT'd, or should be NAT'd only in some specific way. I am hoping to find a way to tell the system not to do NAT on packets with a certain flag marked. I'm using ipf on FreeBSD, but I would guess iptables has this functionality as well.
Old 10-08-2003, 11:31 AM   #11
rayvd
Green Mole
 
Join Date: Oct 2003
Location: Mesa, AZ
Posts: 15
Well, fixed my problem by adjusting the routing table on the machine with the webserver.

In your case, why not add an explicit entry to your /etc/hosts file pointing to the internal address instead of the external one?
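
With the placeholder addresses from post #9, such an entry would be a line of the form "10.1.1.100   www.somedomain.com" (IP address first, then the hostname). A rough way to confirm that the web server then sees the internal address is sketched below. It is an illustration only: resolves_to is a hypothetical helper, not part of PhpDig, and it assumes that PHP's gethostbyname() consults /etc/hosts, as it does on a typical Linux resolver setup.

PHP Code:
```php
<?php
// Hypothetical check (not part of PhpDig): does $host resolve to $expectedIp
// from this machine? $resolver defaults to PHP's gethostbyname(), which
// returns the hostname unchanged when the lookup fails.
function resolves_to($host, $expectedIp, $resolver = 'gethostbyname')
{
    $ip = call_user_func($resolver, $host);
    return $ip === $expectedIp;
}

// Example with the placeholder values from earlier in the thread.
// After adding a line such as
//     10.1.1.100   www.somedomain.com
// to /etc/hosts on the web server, this is expected to be true there:
// var_dump(resolves_to('www.somedomain.com', '10.1.1.100'));
```

The resolver is a parameter only so the comparison can be tried without touching DNS. In normal use the default is enough.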
Old 10-08-2003, 11:46 AM   #12
jalerta
Green Mole
 
Join Date: Oct 2003
Posts: 6
Rayvd,

Yep, that worked.

Thanks for the help.

I hope PHPDig will eventually be able to index a site directly from the location of its files in the file system instead of only by FQDN/IP address.

Again, thanks for the help.


Jeff
Old 10-12-2003, 09:29 PM   #13
vvvvv
Green Mole
 
Join Date: Oct 2003
Posts: 6
I have the same problem:

SITE : http://www.blah-blah-blah.com/
Exclude paths :
- @NONE@
No link in temporary table


>Well, fixed my problem by adjusting the routing table on the machine with the webserver.

I can't do that because I have a simple hosting account.
Any suggestions? And thanks in advance for any help.
Old 10-13-2003, 06:01 PM   #14
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Perhaps in config.php change PHPDIG_DEFAULT_INDEX to false?
Old 10-13-2003, 06:17 PM   #15
vvvvv
Green Mole
 
Join Date: Oct 2003
Posts: 6
Thanks, Charter, but still the same:

--------------------------------------------------------------------------------
SITE : http://www.somesite.com/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

--------------------------------------------------------------------------------

Any other ideas? Much appreciate the help.