PhpDig.net

04-24-2004, 09:52 PM   #1
sbhikes
Green Mole
 
Join Date: Apr 2004
Posts: 2
Yet another indexing problem

I have searched and searched and can't find an answer.

I have lots of pages where I use the same page, such as 'index.php' and add query strings to it.

It seems to get stuck because it can't tell that one query string is different from another.

For example, I have over 200 items in a database, so my pages will be something like 'index.php?id=1', 'index.php?id=2', etc.

But phpdig has been able to get only
'index.php?display_table'
'index.php?display_list'
'index.php?display_thumbnails'
'index.php?id=86'

I can't get it to see that id=1, id=2, id=3... are different pages. It's as if it can only tell the difference when the query strings have different letters, not different numbers.

What can I do?

Oh, and there are no 404 problems, redirects, or any of the other issues from the other posts I've looked into. The links to all the ?id=n pages are listed on the first page.
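
To check what a spider should find on a first page like this, here is a minimal sketch. It is Python used for illustration only, not PhpDig code; the sample HTML, the example.com base URL, and the names `LinkCollector` and `distinct_links` are all made up. It treats two URLs as different whenever their query strings differ, which is the behaviour expected here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the order encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def distinct_links(html, base_url):
    """Return absolute URLs found in html, de-duplicated on the full URL
    (path plus query string), preserving first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    seen = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical first page with links that differ only in the id number.
sample = (
    '<a href="index.php?id=1">one</a>'
    '<a href="index.php?id=2">two</a>'
    '<a href="index.php?id=2">two again</a>'
    '<a href="index.php?display_list">list</a>'
)
for url in distinct_links(sample, "http://example.com/"):
    print(url)
```

If a listing like this shows every ?id=n link but the spider still indexes only one of them, the links themselves are probably not the problem.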
04-25-2004, 04:03 PM   #2
sbhikes
Green Mole
 
Join Date: Apr 2004
Posts: 2
I tried some more things, but no matter what, I cannot index any pages with similar query strings beyond the ones with ?display_all, ?display_table, ?display_list, and only one with a longer query string, whichever one it gets to first.

Odd thing is that if the page is a .shtml page and not a .php page I can index everything.

Why is that? Is there anything I can do about that?
04-25-2004, 04:55 PM   #3
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
One possible way around this, assuming you're on Linux (strictly, on Apache with mod_rewrite enabled), is to rewrite your URLs so that your dynamic content appears to be static.

I have a whole lot of dynamic content on my own website like this. For example, instead of displaying album content like this:
Code:
www.napathon.net/TrackList.php?AlbumID=1530
I use my .htaccess file to rewrite this URL like so:
Code:
www.napathon.net/AlbumID1530.php
The rewrite code in my .htaccess file looks like this:
Code:
RewriteEngine On
# Map AlbumID<digits>.php to TrackList.php?AlbumID=<digits>
RewriteRule ^AlbumID([0-9]+)\.php$ TrackList.php?AlbumID=$1 [L]
I hasten to add that I'm not the world's foremost expert on writing regular expressions, which is what this seeming gibberish is, so I might not be able to help you write something for your application. However, perhaps someone else can help with that if you're interested in pursuing it as a solution to your problem.
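
To see what a rule like this does to a request, here is a small sketch of the same mapping. Python is used only for illustration, since Apache performs the rewrite itself. The function name `rewrite` is made up, and the pattern escapes the dot and anchors the end so that only names ending in `.php` match.

```python
import re

# Pattern with the same intent as the RewriteRule: "AlbumID", digits, ".php".
PATTERN = re.compile(r"^AlbumID([0-9]+)\.php$")


def rewrite(path):
    """Return the internal target for a static-looking path, or None if no match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # $1 in the RewriteRule corresponds to the first captured group here.
    return "TrackList.php?AlbumID=" + m.group(1)


print(rewrite("AlbumID1530.php"))  # TrackList.php?AlbumID=1530
print(rewrite("AlbumIDabc.php"))   # None (no digits, so the rule does not apply)
```

The visitor and the spider only ever see the static-looking name; the query string exists only on the server side.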
05-17-2004, 11:06 AM   #4
drywall
Green Mole
 
Join Date: May 2004
Posts: 25
I'm coming across the same problem as sbhikes -- it grabs page.php?id=1 but doesn't grab 2, 3, etc...

Vinyl-junkie's workaround sounds like it should work, but I'd prefer not to have to go in and find every GET reference like that and rewrite it into the phpdig-friendly version (only to have it get rewritten back again with Apache behind the scenes).
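
Finding and changing each link by hand may not be the only option. As a rough sketch (Python for illustration only, not part of PhpDig; the function name `staticize_links` and the `page-id-N.php` naming scheme are assumptions), a single regular-expression substitution can convert every `page.php?id=N` href in a block of HTML into a static-looking form, which an Apache rewrite rule could then map back.

```python
import re

# Matches href="page.php?id=123" (double-quoted, id as the only parameter).
# Assumption: links on the site follow exactly this shape.
LINK = re.compile(r'href="page\.php\?id=([0-9]+)"')


def staticize_links(html):
    """Rewrite dynamic hrefs into a hypothetical static-looking form."""
    return LINK.sub(r'href="page-id-\1.php"', html)


before = '<a href="page.php?id=1">1</a> <a href="page.php?id=2">2</a>'
print(staticize_links(before))
```

Links that do not match the assumed shape are left untouched, so the substitution can be applied to whole templates.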

Seems like this is a genuine bug in phpdig's spidering process (that happens to have an Apache workaround). I don't suppose some kind soul familiar enough with the phpdig spidering code could try to fix this for real?
05-17-2004, 11:20 AM   #5
drywall
Green Mole
 
Join Date: May 2004
Posts: 25
I'd like to expand on this problem a little bit, in case anyone feels like tackling it. I'm indexing a reasonably complicated site and I've noticed that in some cases it's managing to index dynamic pages with different numbers in their GET string, but not others.

I'm not sure about this, but it appears to only be able to grab one per page. For example, on http://www.freepress.net/news/releases.php, it will only spider the first release on the list (ID 17). However, it appears to be spidering several news article pages (which have URLs of the form news/article.php?id=XXXX), because it's finding them via separate pages, rather than on a single page as with the press releases.

Or maybe it's dying simply because it stops looking at the releases once it hits the Word doc? Not sure... but it's fishy, and frustrating.
All times are GMT -8. The time now is 12:35 PM.

