PhpDig.net

04-26-2004, 01:10 PM   #1
manute
Orange Mole
Join Date: Oct 2003
Location: hamburg, germany
Posts: 52
phpdig seems to guess some urls and spider it

hi!

my urls look like this: domain.com/dir1/dir2/something
now phpdig spiders them fine, all right. but it also seems to "guess" new urls. i saw it spidering domain.com/dir1/dir2/ although that isn't linked anywhere.
why is that and how can i stop this?
04-27-2004, 12:29 PM   #2
manute
doesn't anyone have an idea about that? it gives me a huge number of duplicate urls, which really sucks.
is there any way to tell phpdig to only spider urls it actually finds in a link, without "guessing" others?
04-27-2004, 06:35 PM   #3
vinyl-junkie
Purple Mole
Join Date: Jan 2004
Posts: 694
Have you verified that the URLs that you think phpDig is "guessing" really don't exist? If so, perhaps you could post the specific URL that you are trying to spider and an excerpt from your spider log of one or two of these bogus URLs.

There's no absolute guarantee that someone will have an answer for you, but posting a little more information might help.

Best wishes.
04-28-2004, 04:59 AM   #4
manute
hi!

that's not what i wrote. these urls do exist, but they aren't linked anywhere. and yes, i'm sure about that.
here's an example:

http://www.fussball24.de/fussball/115/frauen -> original url linked on the site, spidered well, all right

http://www.fussball24.de/fussball/115 -> url guessed by phpdig. it does exist, but it shows exactly the same page as the one above.

any ideas?
04-28-2004, 05:28 AM   #5
vinyl-junkie
Do you have any rewrite rules in your .htaccess file that would translate the one URL into the other? Any kind of redirect from one to the other?

While it's true that the pages are identical, the URLs are not. phpDig does not compare pages to each other to see if they have the same content. It only looks for different URLs, makes sure there is no robots exclusion to obey, and indexes them.
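To make the difference between "different URL" and "different content" concrete, here is a minimal Python sketch of content-based duplicate detection. It is not part of phpDig and not how phpDig works; the function name `group_by_content` and the page bodies are made up for illustration, and only the two fussball24.de URLs come from this thread.

```python
import hashlib
from collections import defaultdict


def group_by_content(pages):
    """Group URLs whose page bodies are byte-for-byte identical.

    pages: dict mapping URL -> page body (str).
    Returns a list of sorted URL lists, one per group of duplicates.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        # Identical bodies produce identical digests.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Placeholder bodies (assumption): the real pages were not posted here.
pages = {
    "http://www.fussball24.de/fussball/115/frauen": "<html>same page</html>",
    "http://www.fussball24.de/fussball/115": "<html>same page</html>",
    "http://www.fussball24.de/": "<html>home</html>",
}
duplicates = group_by_content(pages)
print(duplicates)
```

A URL-only check treats all three entries as distinct pages, while the hash check reports the first two as one group.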
04-28-2004, 06:02 AM   #6
manute
no, there's no mod_rewrite and there are no redirects, but the urls are rewritten using forcetype.
and i just wonder where phpdig gets the url from! in my example the second url isn't linked anywhere, so phpdig must have "guessed" it.
does the spider take urls like domain.com/dir1/dir2, cut off the last dir and spider domain.com/dir1?
it looks that way to me, and i don't want that. how can i stop it?
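The behaviour suspected here (cutting off the last path segment and spidering the result) would look roughly like the Python sketch below. Nothing in this thread confirms that phpDig does this; `parent_urls` is a hypothetical helper that only lists the shorter URLs such a crawler would produce, so they can be looked up in the spider log.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return every URL obtained by dropping trailing path segments.

    The site root itself is not included.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one segment at a time, from the longest parent to the shortest.
    for end in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:end])
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


candidates = parent_urls("http://www.fussball24.de/fussball/115/frauen")
for candidate in candidates:
    print(candidate)
```

If the unlinked URLs in the index always match entries from such a list, that supports the "cut off the last dir" theory; if not, the URLs are coming from somewhere else.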
04-28-2004, 06:03 PM   #7
vinyl-junkie
I'm not familiar with using forcetype, never heard of it until you mentioned it, so I did a little research to familiarize myself with that. It's possible there is something in the way you're doing that which is causing the problem, I don't know.

Someone else may have a different opinion, but I don't personally see how phpDig could be guessing this URL. What I would do is take a hard look at the way the code is written that references this page and see if there is something in it that would cause this URL to appear two different ways.

Also, and this is just a guess since I'm not familiar with the site, but I would try to analyze the spider log and see if I could trace just how you ended up with the same page twice in your index.
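One way to follow both suggestions is to resolve every link on a page the way a browser or spider would, and then see which page and which href produce the unexpected URL. The Python sketch below assumes the hrefs have already been extracted from the site's HTML; `find_referrers` is a hypothetical helper, and the hrefs in the example are invented, since the actual page source was not posted in this thread.

```python
from urllib.parse import urljoin


def find_referrers(links_by_page, target):
    """Find (page, href) pairs whose href resolves to the target URL.

    links_by_page: dict mapping page URL -> list of raw href values.
    A trailing slash is ignored when comparing URLs.
    """
    target = target.rstrip("/")
    hits = []
    for page_url, hrefs in links_by_page.items():
        for href in hrefs:
            # Relative hrefs are resolved against the page's own URL.
            resolved = urljoin(page_url, href)
            if resolved.rstrip("/") == target:
                hits.append((page_url, href))
    return hits


# Invented hrefs (assumption), only to show how a relative link resolves.
links_by_page = {
    "http://www.fussball24.de/fussball/115/frauen": ["./", "/fussball/116/frauen"],
}
hits = find_referrers(links_by_page, "http://www.fussball24.de/fussball/115")
print(hits)
```

With URLs like these, a relative href such as `./` resolves to the directory above the last segment, so a link can point at the shorter URL without that URL ever being written out in full in the page source.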

I wish I could be of more help. Perhaps someone else will come along with another idea that might solve your problem.
04-29-2004, 02:49 AM   #8
manute
unfortunately i'm not a real php pro, so i'd rather not start digging through phpdig's source code.
still, thanks for your efforts, pat. and if anyone else has any ideas, let me know!