01-14-2005, 12:23 PM | #1 |
Green Mole
Join Date: Jan 2005
Posts: 3
|
indexing dynamic pages
So, I tossed phpdig onto my dev server, figuring I'd see how it goes before worrying about how to hack it onto the deployment servers and their Frankensteinian convolutions.
But I immediately ran into a problem. I tell it to start indexing at:
http://site.name.here/
It starts, and it finds the links on that page (there are 28, I believe), all of the form:
http://site.name.here/view.php?id=13672
Unfortunately, it appears that the # is getting tossed out, and it just goes to:
http://site.name.here/view.php?id=
Since there are roughly 15000 various IDs involved in different sections, indexing 3 pages is suboptimal :/ (index, and 2 variations on the URL)

I thought it might be the PHPSESSID, but I flipped that off in the config and it continues stripping, so ... what variable do I want to tune to make it retain those #'s? B/c they're mildly important.

All links are relative, but I don't imagine that should matter.

Thanks
--attriel
(I can't give the link, since it's still a development server and not publicly available anywhere)
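On the "relative links shouldn't matter" point, a quick sanity check outside phpdig can confirm it. This is only a sketch in Python (not phpdig's PHP), using the placeholder host and link form from the post; it shows that standard relative-URL resolution keeps the query string, so the missing id is unlikely to come from the links being relative.

```python
from urllib.parse import urljoin, urlsplit

# Placeholder values taken from the post; the real server is not public.
base = "http://site.name.here/"
link = "view.php?id=13672"  # relative link as it appears in the page

# Standard relative resolution keeps the query string intact.
resolved = urljoin(base, link)
query = urlsplit(resolved).query

print(resolved)
print(query)
```

If a crawler ends up at view.php?id= with nothing after the equals sign, the number was most likely lost before link resolution, for example while the page body was being read.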
01-14-2005, 02:21 PM | #2 |
Green Mole
Join Date: Jan 2005
Posts: 3
|
OK, I just spent a while tracing through the code (gotta love print statements). As near as I can tell, this is due to an error in the Transfer-Encoding: chunked handling.
Quote:
0x2d (45) bytes of stuff in next chunk, followed by <crlf>
"><div id="leftimg"><a href="view_rec.php?id=<crlf> is 0x2d, check, add it
0x4 (4) bytes in next chunk
6315<crlf> is 4 bytes, check! add it
0xC (12) bytes in next chunk
"><img src=" is 12 bytes, check! add it.

But what the code seems to be doing (in phpdigGetUrl) is:

2d ; chunk separator, trim previous of <crlf>
"><div id="leftimg"><a href="view_rec.php?id= add it
4 chunk separator, trim
6315 chunk separator, trim
c chunk separator, trim
"><img src=" add it

Gonna work on fixing up that code some over the weekend, I'll post up a patch for someone to double check, probably Monday (unless I decide to sleep finally)

--attriel
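To make the trace above concrete, here is a minimal sketch in Python (not phpdig's PHP, and not the actual phpdigGetUrl code). The function names and the WIRE sample are hypothetical; WIRE just replays the three chunks from the trace. decode_chunked reads each hex size line and then takes exactly that many bytes, which is how a chunked body is meant to be read. naive_decode is only a guess at the failure mode described: it throws away any line that looks like a hex number, so the 4-byte data chunk 6315 is mistaken for a chunk size and lost.

```python
import re

# Hypothetical wire sample replaying the three chunks from the trace.
WIRE = (
    b'2d\r\n"><div id="leftimg"><a href="view_rec.php?id=\r\n'
    b'4\r\n6315\r\n'
    b'c\r\n"><img src="\r\n'
    b'0\r\n\r\n'
)

def decode_chunked(body: bytes) -> bytes:
    """Decode a chunked body by counting bytes, never by guessing from content."""
    out = bytearray()
    pos = 0
    while True:
        # A size line starts at a known chunk boundary and ends at the next CRLF.
        eol = body.index(b"\r\n", pos)
        # Ignore any chunk extension after ';' and parse the size as hex.
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; trailers, if any, are ignored in this sketch
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)

def naive_decode(body: bytes) -> bytes:
    """Assumed buggy approach: drop every line that merely looks like hex."""
    lines = body.split(b"\r\n")
    return b"".join(l for l in lines if not re.fullmatch(rb"[0-9a-fA-F]+", l))

good = decode_chunked(WIRE)
bad = naive_decode(WIRE)
print(good)
print(bad)
```

The counting version never inspects chunk data to decide what it is, so an id that happens to be all hex digits survives. Any content-based check for chunk separators will eventually misfire on numeric data like this.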
|
01-14-2005, 08:06 PM | #3 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
The addition of a little counting might be faster than reading and processing the chunks. Try the attached code, for use with v.1.8.6, in place of the phpdigGetUrl function, and let me know how it works.