|
02-07-2005, 07:01 AM | #1 |
Green Mole
Join Date: Feb 2005
Posts: 16
|
Spider indexes cgi pages but not its links!?
Hi!
When I run the spider on a site, www.domain.com, that hosts several pages of the form www.domain.com/cgi-bin/whatever... or cgi-bin.domain.com/whatever...., I can't find those links in the database or in the search results. Yet when I check the most common keywords, cgi-bin.domain.com comes up in first place. What's the deal? How do I make the spider add the cgi-*.* links to the database for that particular domain? Also, is there any difference between indexing http://www.domain.com and http://domain.com? Will I get duplicate pages in the db? Thanks! |
02-07-2005, 09:59 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
The links domain.com, www.domain.com, and sub.domain.com are considered different. Try setting PHPDIG_IN_DOMAIN to true in the config file.
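To illustrate why those three are treated as different: a spider that compares sites by hostname sees three distinct strings. The sketch below is Python, not PhpDig code. The `same_site` helper and its `in_domain` flag are hypothetical and only mimic the idea of relaxing the match to the parent domain; they are not how PHPDIG_IN_DOMAIN is actually implemented.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def same_site(url_a, url_b, in_domain=False):
    """Decide whether two URLs belong to the same site.

    in_domain=False: hostnames must match exactly.
    in_domain=True:  hypothetical relaxed rule that compares only the
                     last two labels (naive; wrong for e.g. .co.uk).
    """
    a, b = host_of(url_a), host_of(url_b)
    if not a or not b:
        return False
    if a == b:
        return True
    if not in_domain:
        return False
    return a.split(".")[-2:] == b.split(".")[-2:]
```

With the strict rule, http://domain.com/ and http://www.domain.com/ do not match, so each would be spidered as its own site; with the relaxed rule, cgi-bin.domain.com and www.domain.com are accepted as one.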
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
02-08-2005, 02:18 PM | #3 |
Green Mole
Join Date: Feb 2005
Posts: 16
|
But then, why do the search results not include any cgi.domain.com pages, even though cgi.domain.com is indexed as a keyword? How do I remove those false keywords and rerun the spider so that it picks up cgi. as a URL and not as a keyword?
|
02-08-2005, 06:04 PM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Stored keywords and indexed links are two different things. If the text cgi.domain.com appears in a page, it is stored as a keyword, regardless of whether the link cgi.domain.com is actually indexed. If you are using PhpDig v.1.8.7, and don't want the text cgi.domain.com stored as a keyword, then edit BANNED in the config file.
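A rough sketch of that distinction, in Python rather than PhpDig's own code: page text and page links go through separate paths, so a hostname can end up in the keyword set even when its link is skipped. The `index_page` function, the `allowed_hosts` set, and the `banned` pattern are all hypothetical stand-ins; how PhpDig actually applies BANNED is not shown in this thread.

```python
import re
from urllib.parse import urlsplit


def index_page(text, links, allowed_hosts, banned=None):
    """Return (keywords, indexed_links) for one page.

    Keywords come from the visible text; links are kept only if their
    host is in allowed_hosts. The two results are independent.
    """
    banned_re = re.compile(banned) if banned else None

    keywords = set()
    for token in re.findall(r"[\w.-]+", text.lower()):
        token = token.strip(".-")  # drop trailing sentence punctuation
        if not token:
            continue
        if banned_re and banned_re.search(token):
            continue  # excluded by the (assumed) banned pattern
        keywords.add(token)

    indexed_links = [
        link for link in links
        if (urlsplit(link).hostname or "").lower() in allowed_hosts
    ]
    return keywords, indexed_links
```

Called with text that mentions cgi.domain.com and an `allowed_hosts` of only www.domain.com, this stores cgi.domain.com as a keyword while dropping its link, which matches the behaviour described in the question. Passing a pattern such as `r"^cgi\.domain\.com$"` as `banned` keeps it out of the keywords too.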