Hi. Duplicate detection is handled in robot_functions.php, specifically in the phpdigTestDouble function.
In this function, the following query checks for a duplicate:
PHP Code:
$query_double = "SELECT spider_id FROM ".PHPDIG_DB_PREFIX."spider WHERE site_id='$site_id' AND md5 = '$md5'";
As you are crawling the same site/folder, the site_id part of the query always matches, so it is the $md5 value that decides whether a page counts as a duplicate. The $md5 variable is built as follows:
PHP Code:
$md5 = md5($titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize;
Briefly, the components that go into $md5 are:
- $titre_resume // page title
- $page_desc['content'] // meta tag description
- $text[$max_chunk] // last chunk of page text
- $tempfilesize // temp file size
If any one of these components differs between two pages, the pages get different $md5 values. That is what is happening here, so they are not seen as duplicates.
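To illustrate, here is a minimal sketch only, not PhpDig code: sketch_page_key is a hypothetical helper that simply repeats the $md5 expression quoted above, and the sample strings and file size are made up. Two pages that differ only in a small piece of changing text inside the last chunk end up with different keys:

```php
<?php
// Hypothetical helper mirroring the $md5 expression quoted above.
// It is NOT part of PhpDig; it only shows how the key reacts to changes.
function sketch_page_key(string $title, string $description, string $lastChunk, int $tempFileSize): string
{
    return md5($title . $description . $lastChunk) . '_' . $tempFileSize;
}

// Same title, description and size; only the banner text in the last chunk differs.
$a = sketch_page_key('Widgets', 'All about widgets', 'Footer text. Banner: spring sale', 2048);
$b = sketch_page_key('Widgets', 'All about widgets', 'Footer text. Banner: summer sale', 2048);
// Identical inputs to $a.
$c = sketch_page_key('Widgets', 'All about widgets', 'Footer text. Banner: spring sale', 2048);

var_dump($a === $b); // bool(false): different key, so not treated as a duplicate
var_dump($a === $c); // bool(true): same key, so the query above would find a match
```

A change in the temp file size alone would have the same effect, since it is appended to the hash.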
Try wrapping the dynamic portion of each page in the PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT values from the config file. Each comment must be on its own line (if the page is generated with PHP, output \n where necessary), like so:
Code:
<!-- phpdigExclude -->
dynamic content, for
example, code for a
rotating banner
would go in here
<!-- phpdigInclude -->
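If the page is built with PHP, something along these lines should put each comment on its own line. This is a sketch under assumptions: wrap_excluded is a hypothetical helper, the banner markup is a placeholder, and the comment strings are assumed to match the PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT values in your config file (adjust them if yours differ):

```php
<?php
// Hypothetical helper, not part of PhpDig: wraps a fragment of dynamic HTML
// in the exclude/include comments, each on a line of its own.
function wrap_excluded(string $dynamicHtml): string
{
    return "\n<!-- phpdigExclude -->\n"
         . $dynamicHtml
         . "\n<!-- phpdigInclude -->\n";
}

// Placeholder markup standing in for the rotating banner.
echo wrap_excluded('<img src="banner.php" alt="banner">');
```

The leading and trailing \n keep the comments from sharing a line with whatever markup comes before or after them.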