01-20-2004, 09:43 AM   #4
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Whether a page is a duplicate is decided in the robot_functions.php file, specifically by the phpdigTestDouble function.

Within that function, the following query checks for a duplicate:
PHP Code:
$query_double = "SELECT spider_id FROM ".PHPDIG_DB_PREFIX."spider WHERE site_id='$site_id' AND md5 = '$md5'";
Because you are crawling the same site/folder, site_id is the same for every page, so the $md5 value is what decides whether a page counts as a duplicate. It is built as follows:
PHP Code:
$md5 = md5($titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize;
Briefly, the values that go into $md5 are:
  1. $titre_resume // page title
  2. $page_desc['content'] // meta tag description
  3. $text[$max_chunk] // last chunk of page text
  4. $tempfilesize // temp file size
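To illustrate why near-identical pages can still be kept, here is a small sketch that mirrors the $md5 line above. The function name page_fingerprint and the sample values are hypothetical, made up for the example; this is not PhpDig's own code. The file size is held equal on purpose so that only the last text chunk differs.

```php
<?php
// Hypothetical helper mirroring the $md5 line in phpdigTestDouble:
// md5 of title + meta description + last text chunk, then "_" and the file size.
function page_fingerprint(string $title, string $description, string $lastChunk, int $fileSize): string
{
    return md5($title . $description . $lastChunk) . '_' . $fileSize;
}

// Two copies of the same page whose last text chunk differs only by a rotating banner.
// The file size is kept equal here so that only the text differs.
$a = page_fingerprint('Home', 'Welcome page', 'Footer text. Banner: Acme', 1024);
$b = page_fingerprint('Home', 'Welcome page', 'Footer text. Banner: Zenith', 1024);

var_dump($a === $b); // bool(false): the pages are not treated as duplicates
```

Any change in the title, the meta description, the last chunk of text, or the file size yields a different value, so the duplicate query above finds no match.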
Because the pages produce different $md5 values, they are not seen as duplicates.

Try wrapping the dynamic portion of each page in the PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT values from the config file, each on its own line (if you print them from PHP, add \n as needed), so that the changing content is skipped when the page is indexed:
Code:
<!-- phpdigExclude -->
dynamic content, for
example, code for a
rotating banner
would go in here
<!-- phpdigInclude -->
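As a rough sketch of why this helps: once the dynamic block is ignored, two copies of a page that differ only in the banner reduce to the same text and so to the same md5. The function strip_excluded and the regex approach are hypothetical illustrations, not how PhpDig itself parses the markers; the marker strings are the ones shown above, and your config values may differ.

```php
<?php
// Hypothetical sketch: drop everything between the exclude and include markers
// before hashing. The marker strings match the example above; check your
// config file's PHPDIG_EXCLUDE_COMMENT / PHPDIG_INCLUDE_COMMENT values.
const EXCLUDE_MARKER = '<!-- phpdigExclude -->';
const INCLUDE_MARKER = '<!-- phpdigInclude -->';

function strip_excluded(string $html): string
{
    // Non-greedy match, with the "s" modifier so "." also matches newlines.
    $pattern = '/' . preg_quote(EXCLUDE_MARKER, '/') . '.*?' . preg_quote(INCLUDE_MARKER, '/') . '/s';
    return preg_replace($pattern, '', $html);
}

$pageA = "<p>Article</p>\n<!-- phpdigExclude -->\nBanner: Acme\n<!-- phpdigInclude -->\n<p>Footer</p>";
$pageB = "<p>Article</p>\n<!-- phpdigExclude -->\nBanner: Zenith\n<!-- phpdigInclude -->\n<p>Footer</p>";

$hashA = md5(strip_excluded($pageA));
$hashB = md5(strip_excluded($pageB));
var_dump($hashA === $hashB); // bool(true): only the static content is compared
```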