06-09-2004, 05:58 AM | #1 |
Green Mole
Join Date: Jun 2004
Posts: 2
|
links found : 0 w/ example
My apologies for starting a new thread... I know this topic has been covered many times over. However, since I'm giving a real-world example, I thought it best to isolate this discussion.
Now for the problem. I'm trying to index a public web site my company is affiliated with. I'm primarily interested in indexing ONLY the http://www.ophsource.org/periodicals/ophtha portion of the site. The site does use robots.txt; however, the section I'm interested in indexing is NOT disallowed. The home page (http://www.ophsource.org) also includes links to /periodicals/ophtha. I've tried setting 'LIMIT_DAYS' to 0, setting the index depth to 10, and emptying the database (all suggestions from other threads). However, I consistently get "Links found 0".
My question is twofold:
1) Can anyone tell me why I can't index the site and/or help me find a workaround?
2) Can anyone tell me how to index ONLY the /periodicals/ophtha subdirectory of the site?
SITE : http://www.ophsource.org/
Exclude paths :
- article/
- medline/
- search/
- user/
- claim/
- ecommerce/
- retrieve/
- webfiles/
Starting to index web pages...
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete ! |
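One way to double-check the claim that /periodicals/ophtha is not disallowed is to run the exclude paths through a robots.txt parser. The sketch below is only an illustration in Python, not part of PhpDig: it assumes each "Exclude paths" entry in the log corresponds to a `Disallow` rule for all user agents, and the agent name "PhpDig" is likewise an assumption.

```python
from urllib.robotparser import RobotFileParser

# Exclude paths copied from the indexing log. Treating each one as a
# "Disallow" rule for every user agent is an assumption about robots.txt.
EXCLUDED = ["article/", "medline/", "search/", "user/",
            "claim/", "ecommerce/", "retrieve/", "webfiles/"]


def build_parser(excluded):
    """Build a parser from an in-memory robots.txt (no network access)."""
    lines = ["User-agent: *"] + [f"Disallow: /{path}" for path in excluded]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


def is_allowed(parser, url, agent="PhpDig"):
    """Return True if the given agent may fetch the URL (agent name assumed)."""
    return parser.can_fetch(agent, url)


rules = build_parser(EXCLUDED)
print(is_allowed(rules, "http://www.ophsource.org/periodicals/ophtha"))
print(is_allowed(rules, "http://www.ophsource.org/search/quick"))
```

If the first check comes back allowed, robots.txt is probably not what is producing "links found : 0", and the cause is more likely in how the spider fetches the pages.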
06-19-2004, 06:33 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. For one, uncomment //print $answer."<br>\n"; in robot_functions.php, then index and see what's onscreen. For two, PhpDig currently spiders all allowed links, but after the spider is done, you can exclude certain directories from further indexing in the admin panel.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
06-19-2004, 11:50 AM | #3 |
Green Mole
Join Date: Jun 2004
Posts: 2
|
Thanks for the response! I tried what you suggested and still cannot index the site. This is what I saw on the indexing page...
Server: Microsoft-IIS/5.0
Date: Sat, 19 Jun 2004 18:48:46 GMT
Content-Type: text/plain
Accept-Ranges: bytes
Last-Modified: Tue, 04 Nov 2003 14:33:30 GMT
ETag: "0d9c79de0a2c31:816"
Content-Length: 179

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Sat, 19 Jun 2004 18:48:48 GMT
Content-Type: text/plain
Accept-Ranges: bytes
Last-Modified: Tue, 04 Nov 2003 14:33:30 GMT
ETag: "0d9c79de0a2c31:819"
Content-Length: 179
--------------------------------------------------------------------------------
SITE : http://www.ophsource.org/
Exclude paths :
- article/
- medline/
- search/
- user/
- claim/
- ecommerce/
- retrieve/
- webfiles/
No link in temporary table
--------------------------------------------------------------------------------
links found : 0
...Was recently indexed
Optimizing tables... |
06-21-2004, 06:00 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Check that HEAD requests are allowed on the server for the site, as a HEAD request sent to the site gives the following:
> telnet www.ophsource.org 80
Trying 129.35.xx.xxx...
Connected to www.ophsource.org.
Escape character is '^]'.
HEAD / HTTP/1.0
Connection closed by foreign host.
>
Also check that allow_url_fopen is set to On, either in the php.ini file or as shown in the PHP info output.
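The telnet check can also be scripted. The following is a minimal Python sketch, not PhpDig code: `probe_head` is a hypothetical helper name, and note that Python's `http.client` sends HTTP/1.1 rather than the HTTP/1.0 request typed in the telnet session.

```python
import http.client


def probe_head(host, port=80, path="/", timeout=10.0):
    """Send a HEAD request and return the HTTP status code.

    Returns None if the connection fails or the server closes it without
    answering, which is the behaviour seen in the telnet session.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and a server that hangs up
        # before sending a status line.
        return None
    finally:
        conn.close()
```

Calling `probe_head("www.ophsource.org")` and getting `None` back would match the "Connection closed by foreign host" result, while a numeric status such as 200 would suggest HEAD requests are being answered.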