PhpDig.net

Old 01-14-2005, 12:23 PM   #1
attriel
Green Mole
 
Join Date: Jan 2005
Posts: 3
indexing dynamic pages

So, I tossed PhpDig onto my dev server, figuring I'd see how it goes before worrying about how to hack it onto the deployment servers and their Frankensteinian convolutions.

But I immediately ran into a problem. I told it to start indexing at:

http://site.name.here/

It starts and finds the links on that page (there are 28, I believe), all of the form:
http://site.name.here/view.php?id=13672

Unfortunately, it appears to be tossing the number out and just going to
http://site.name.here/view.php?id=

Since there are roughly 15,000 IDs across the different sections, indexing only 3 pages (the index and 2 variations on the URL) is suboptimal :/

I thought it might be the PHPSESSID handling, but I turned that off in the config and it still strips the numbers. So which variable do I want to tune to make it retain them? They're mildly important.

All links are relative, but I don't imagine that should matter.

Thanks

--attriel

(I can't give the link, since it's still a development server and not publicly available anywhere.)
Old 01-14-2005, 02:21 PM   #2
attriel
Green Mole
 
Join Date: Jan 2005
Posts: 3
OK, I just spent a while tracing through the code (gotta love print statements). As near as I can tell, this is due to an error in the Transfer-Encoding: chunked handling.

Quote:
2d
"><div id="leftimg"><a href="view_rec.php?id=
4
6315
c
"><img src="
This is (from http://www.w3.org/Protocols/rfc2616/...html#sec19.4.6) supposed to be handled as:

0x2d (45) bytes of data in the next chunk, followed by <crlf>
"><div id="leftimg"><a href="view_rec.php?id=<crlf> is 0x2d bytes before the <crlf>, check! add it
0x4 (4) bytes in the next chunk
6315<crlf> is 4 bytes before the <crlf>, check! add it
0xc (12) bytes in the next chunk
"><img src="<crlf> is 12 bytes before the <crlf>, check! add it

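The walk-through above can be sketched as a size-driven decoder. This is a minimal illustration in Python, not PhpDig's PHP code; the function name decode_chunked and the wire sample are made up for the example, and trailer headers are simply ignored.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body by trusting each chunk-size line."""
    decoded = bytearray()
    pos = 0
    while True:
        # The size line ends at the next CRLF; extensions after ';' are ignored.
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer headers are ignored in this sketch
        decoded += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(decoded)


# The three chunks quoted in the post, plus the terminating zero-size chunk.
wire = (
    b'2d\r\n"><div id="leftimg"><a href="view_rec.php?id=\r\n'
    b'4\r\n6315\r\n'
    b'c\r\n"><img src="\r\n'
    b'0\r\n\r\n'
)
page = decode_chunked(wire)
print(page.decode())
```

Because the decoder reads exactly as many bytes as the size line announces, the 6315 payload is never inspected to decide whether it "looks like" a size.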
But what the code seems to be doing (in phpdigGetUrl) is:
2d: chunk separator, trim the <crlf> from the previous line
"><div id="leftimg"><a href="view_rec.php?id= add it
4: chunk separator, trim
6315: taken for a chunk separator, trim (this is where the ID gets lost)
c: chunk separator, trim
"><img src=" add it

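To make the failure mode concrete, here is a small Python sketch of a line filter that behaves the way the trace above describes. It is a hypothetical reconstruction from that description, not the actual phpdigGetUrl source; naive_dechunk and HEX_LINE are invented names.

```python
import re

# Any line made up only of hex digits is assumed to be a chunk-size line.
HEX_LINE = re.compile(rb"[0-9a-fA-F]+")


def naive_dechunk(body: bytes) -> bytes:
    """Hypothetical filter: drop every line that merely looks like a hex size."""
    kept = []
    for line in body.split(b"\r\n"):
        if HEX_LINE.fullmatch(line):
            continue  # treated as a chunk separator, so it is discarded
        kept.append(line)
    return b"".join(kept)


wire = (
    b'2d\r\n"><div id="leftimg"><a href="view_rec.php?id=\r\n'
    b'4\r\n6315\r\n'
    b'c\r\n"><img src="\r\n'
    b'0\r\n\r\n'
)
broken = naive_dechunk(wire)
print(broken.decode())
```

The chunk whose payload is 6315 is itself a valid hex string, so a filter like this throws it away along with the real size lines, which matches the truncated view.php?id= links the indexer ends up following.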

Gonna work on fixing up that code some over the weekend. I'll post a patch for someone to double-check, probably Monday (unless I finally decide to sleep).

--attriel
Old 01-14-2005, 08:06 PM   #3
Charter
Head Mole
 
Charter's Avatar
 
Join Date: May 2003
Posts: 2,539
The addition of a little counting might be faster than reading and processing the chunks. Try the attached code, for use with v.1.8.6, in place of the phpdigGetUrl function, and let me know how it works.
Attached Files
File Type: txt function_phpdigGetUrl.txt (4.6 KB, 16 views)
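The attachment's contents are not reproduced in the thread, so the following is only one guess at what "a little counting" could look like: keep reading line by line, but count down the bytes announced by each size line, and only treat a line as a size when that count has reached zero. It is a Python sketch with invented names (dechunk_by_counting, remaining), not the attached PHP function.

```python
def dechunk_by_counting(lines):
    """Rebuild a chunked body from fgets-style lines (line endings included)."""
    out = bytearray()
    remaining = 0       # payload bytes still owed by the current chunk
    expect_size = True  # True when the next non-blank line is a chunk size
    for line in lines:
        if expect_size:
            stripped = line.strip()
            if not stripped:
                continue  # the CRLF that closes the previous chunk
            size = int(stripped.split(b";", 1)[0], 16)
            if size == 0:
                break  # last chunk
            remaining = size
            expect_size = False
        else:
            take = line[:remaining]  # never consume more than was announced
            out += take
            remaining -= len(take)
            if remaining == 0:
                expect_size = True
    return bytes(out)


wire = (
    b'2d\r\n"><div id="leftimg"><a href="view_rec.php?id=\r\n'
    b'4\r\n6315\r\n'
    b'c\r\n"><img src="\r\n'
    b'0\r\n\r\n'
)
result = dechunk_by_counting(wire.splitlines(keepends=True))
print(result.decode())
```

With the counter in place, the 6315 line is read while 4 bytes are still owed, so it is appended as data instead of being parsed as a size.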