PhpDig.net

Old 01-15-2005, 02:28 PM   #1
BulForce
Orange Mole
 
Join Date: Aug 2004
Location: none
Posts: 33
XDuplicate of an existing document Not working!


I have found what looks like an error in how the digger compares URLs (probably only certain kinds of URLs). I cannot post more than a few examples from the log file generated by the digger, but I would be very grateful if someone could help me as soon as possible.

--- -- -
+205:http://www.site.com/static/blackebonyteens/index.php?q=adultzone,1,1,2,0,0,0,0,0,0(time : 00:44:31)

+206:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,1,1,0,0,0,0,0,0(time : 00:44:43)XDuplicate of an existing document

207:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,2,1,0,0,0,0,0,0(time : 00:45:10)

+208:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,1,2,0,0,0,0,0,0(time : 00:45:24)XDuplicate of an existing document
--- -- -

I know that these aren't normal links (with all those weird characters after the php?), but the URL comparison also did not work for this type of address:

--- -- -
nttp://www.site.com/m/gloryholestation-001/index.html?t1/revs=adultzone
--- -- -

Thanks for reading this post.

*nttp is actually http (I changed it only for this post).

Last edited by BulForce; 01-15-2005 at 02:29 PM. Reason: typemistake
Old 01-15-2005, 03:29 PM   #2
BulForce
Sorry for my stupid post. I have just figured out that pages are compared not only by name but by content too. The pages I indexed have the same text content, and in some cases no text content at all (only pictures).
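
To illustrate why different URLs can be flagged as duplicates, here is a minimal sketch of a content-based fingerprint, modeled on the md5 line from robot_functions.php quoted later in this thread. The function name, the example URLs, and the page data are hypothetical and are not part of PhpDig; the real digger may do more than this.

```php
<?php
// Hypothetical helper, not PhpDig code: hashes title, description and text,
// then appends the file size, in the same spirit as the md5 line quoted
// later in this thread.
function content_fingerprint(string $title, string $description, string $text, int $size): string
{
    return md5($title . $description . $text) . '_' . $size;
}

// Two different (made-up) URLs whose pages have no text, only pictures.
$pages = [
    'http://example.com/static/a/index.php?q=1' => ['', '', '', 2048],
    'http://example.com/static/b/index.php?q=2' => ['', '', '', 2048],
];

$seen = [];        // fingerprint => first URL that produced it
$duplicates = [];  // URLs whose fingerprint was already seen
foreach ($pages as $url => [$title, $description, $text, $size]) {
    $fingerprint = content_fingerprint($title, $description, $text, $size);
    if (isset($seen[$fingerprint])) {
        $duplicates[] = $url; // the URL differs, but the content matches
    } else {
        $seen[$fingerprint] = $url;
    }
}

print_r($duplicates);
```

The URL never enters the hash in this sketch, so the query string plays no role in the comparison. That would be consistent with the log above, where pages with different q= values are still reported as duplicates.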

Moderators, feel free to erase this post if you want.

Last edited by BulForce; 01-15-2005 at 03:30 PM. Reason: typemistake
Old 01-15-2005, 04:01 PM   #3
BulForce
However, it would be fine for me if this duplicate check could somehow be turned off.

If somebody knows how to avoid this duplicate check, please help me.

Thanks
Old 01-16-2005, 01:44 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
See this and think rand.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 01-16-2005, 04:59 PM   #5
BulForce
Thanks for your support.

I have edited one line in robot_functions.php.

Line 1323 was:
$md5 = md5($titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize;

And I changed it to look like this:

Line 1323 now:
$md5 = md5(rand().$titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize; //modded line - turns off duplicate check


I ran a small test crawl and everything went fine. I hope there will be no more problems.
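
For readers weighing this change, the sketch below shows the effect of salting the hash. It uses hypothetical stand-in values rather than real PhpDig data. The URL-salted variant is only an assumption about one possible alternative: it presumes the page URL is available at that point in robot_functions.php, which this thread does not confirm.

```php
<?php
// Hypothetical stand-ins for the values used on line 1323.
$titre_resume = 'Same title';
$content      = 'Same description';
$chunk        = 'Same text';
$tempfilesize = 2048;

// Original form: identical input always yields the identical fingerprint,
// so two pages with the same text collide.
$a = md5($titre_resume . $content . $chunk) . '_' . $tempfilesize;
$b = md5($titre_resume . $content . $chunk) . '_' . $tempfilesize;
var_dump($a === $b); // bool(true)

// Modified form from this thread: a random salt makes a match very unlikely,
// even when the same page is hashed twice.
$r1 = md5(rand() . $titre_resume . $content . $chunk) . '_' . $tempfilesize;
$r2 = md5(rand() . $titre_resume . $content . $chunk) . '_' . $tempfilesize;

// Assumed alternative (hypothetical function, not PhpDig code): salt with
// the page URL, so the fingerprint stays repeatable for one URL but differs
// between URLs.
function salted_fingerprint(string $url, string $title, string $content, string $chunk, int $size): string
{
    return md5($url . $title . $content . $chunk) . '_' . $size;
}

$c = salted_fingerprint('http://example.com/a', $titre_resume, $content, $chunk, $tempfilesize);
$d = salted_fingerprint('http://example.com/b', $titre_resume, $content, $chunk, $tempfilesize);
var_dump($c === $d); // bool(false)
```

Either salt effectively disables content-based duplicate detection. The random one also means a page no longer matches its own earlier fingerprint, which may or may not matter depending on how the digger uses the stored value.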

Last edited by BulForce; 01-16-2005 at 05:01 PM. Reason: :)