|
04-26-2004, 01:10 PM | #1 |
Orange Mole
Join Date: Oct 2003
Location: hamburg, germany
Posts: 52
|
phpdig seems to guess some urls and spider them
hi!
my urls look like this: domain.com/dir1/dir2/something. phpdig spiders them fine, but it also seems to "guess" new urls: i saw it spidering domain.com/dir1/dir2/ although that isn't linked anywhere. why is that, and how can i stop it? |
04-27-2004, 12:29 PM | #2 |
Orange Mole
Join Date: Oct 2003
Location: hamburg, germany
Posts: 52
|
doesn't anyone have an idea about that? it gives me a huge number of duplicate urls, which really sucks.
is there any way i can tell phpdig to spider only the urls it actually finds as links, without "guessing" urls? |
04-27-2004, 06:35 PM | #3 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Have you verified that the URLs that you think phpDig is "guessing" really don't exist? If so, perhaps you could post the specific URL that you are trying to spider and an excerpt from your spider log of one or two of these bogus URLs.
There's no absolute guarantee that someone will have an answer for you, but posting a little more information might help. Best wishes. |
04-28-2004, 04:59 AM | #4 |
Orange Mole
Join Date: Oct 2003
Location: hamburg, germany
Posts: 52
|
hi!
that's not what i wrote. these urls do exist, but they aren't linked anywhere. and yes, i'm sure about that. here's an example:
http://www.fussball24.de/fussball/115/frauen -> original url linked on the site, spidered fine
http://www.fussball24.de/fussball/115 -> url guessed by phpdig; it does exist but is exactly the same as the one above.
any ideas? |
04-28-2004, 05:28 AM | #5 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Do you have any rewrite rules in your .htaccess file that would translate the one URL into the other? Any kind of redirect from one to the other?
While it's true that the pages are identical, the URLs are not. phpDig does not compare pages to each other to see if they have the same content. It only looks for different URLs, makes sure there is no robots exclusion to obey, and indexes them. |
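To illustrate the point above, here is a minimal Python sketch (not phpDig code) of the difference between an index keyed by URL and one keyed by page content. The helper names and the page bodies are placeholders made up for this example; only the two URLs come from the thread.

```python
import hashlib


def index_by_url(pages):
    """Keep one entry per distinct URL (the behaviour described in the post)."""
    return {url: body for url, body in pages}


def index_by_content(pages):
    """Keep only the first URL seen for each distinct page body."""
    seen = {}
    for url, body in pages:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        # setdefault leaves the first URL in place when the body repeats.
        seen.setdefault(digest, url)
    return sorted(seen.values())


# Placeholder bodies: the thread only says the two pages are identical.
pages = [
    ("http://www.fussball24.de/fussball/115/frauen", "<html>same page</html>"),
    ("http://www.fussball24.de/fussball/115", "<html>same page</html>"),
]

print(len(index_by_url(pages)))  # two distinct URLs, so two entries
print(index_by_content(pages))   # one entry, because the bodies match
```

An index keyed only by URL keeps both entries, which matches the duplicates reported in this thread.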
04-28-2004, 06:02 AM | #6 |
Orange Mole
Join Date: Oct 2003
Location: hamburg, germany
Posts: 52
|
no, there's no mod_rewrite and there are no redirects, just url rewriting done with ForceType.
and i just wonder where phpdig gets the url from! in my example the second url isn't linked anywhere, so it must have "guessed" it. does the spider take urls like domain.com/dir1/dir2, cut off the last dir and spider domain.com/dir1 as well? it looks that way to me, and i don't like it. how can i stop it? |
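The behaviour suspected in the post above can be sketched as follows. This is a hypothetical Python illustration of "cut off the last dir and spider the rest", not phpDig's actual code, and nothing in this thread confirms that phpDig works this way. The function name `parent_urls` is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return every ancestor path of `url`, nearest first (hypothetical helper)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    ancestors = []
    # Drop one trailing segment at a time, stopping before the site root.
    for depth in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:depth])
        ancestors.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return ancestors


print(parent_urls("http://www.fussball24.de/fussball/115/frauen"))
```

A spider that queued these derived URLs would request http://www.fussball24.de/fussball/115 even though no page links to it. With ForceType-style URL handling, the server would still answer such a request with a real page.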
04-28-2004, 06:03 PM | #7 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
I'm not familiar with ForceType and had never heard of it until you mentioned it, so I did a little research to familiarize myself with it. It's possible that something in the way you've set it up is causing the problem; I don't know.
Someone else may have a different opinion, but I don't personally see how phpDig could be guessing this URL. What I would do is take a hard look at the code that references this page and see if there is something in it that would cause this URL to appear in two different ways. Also, and this is just a guess since I'm not familiar with the site, I would analyze the spider log and try to trace just how you ended up with the same page twice in your index. I wish I could be of more help. Perhaps someone else will come along with another idea that might solve your problem. |
04-29-2004, 02:49 AM | #8 |
Orange Mole
Join Date: Oct 2003
Location: hamburg, germany
Posts: 52
|
unfortunately i'm not a real php pro, so i'd rather not start digging through phpdig's source code.
still, thanks for your efforts, pat, and if anyone else has any ideas, let me know! |