PhpDig.net

Old 03-12-2004, 09:43 PM   #1
Konstantine
Green Mole
 
Join Date: Mar 2004
Location: Russia
Posts: 21
Bug in grabbing urls from the page

Hello again! I found that if a link looks like:

<a href=next/index.html>next/index.html</a>

i.e. without quotes, PhpDig doesn't follow it!
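
For reference, a minimal sketch of link extraction that accepts quoted and unquoted href values. This is not PhpDig's actual parser; the function name extractHrefs and the regular expression are assumptions made for illustration only.

```php
<?php
// Hypothetical helper, not part of PhpDig: collect href values from
// anchor tags, whether the value is double-quoted, single-quoted,
// or not quoted at all.
function extractHrefs(string $html): array
{
    // Three alternatives: "value", 'value', or a bare value that runs
    // until whitespace or the closing ">".
    $pattern = '/<a\s[^>]*?href\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // Only the alternative that matched has its group filled in;
        // trailing unmatched groups are absent from $m.
        $links[] = $m[3] ?? ($m[2] ?? $m[1]);
    }
    return $links;
}

// Usage: the unquoted link from the post above.
print_r(extractHrefs('<a href=next/index.html>next/index.html</a>'));
```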
Old 03-13-2004, 09:54 AM   #2
Konstantine
Sorry, no bug found
Old 03-13-2004, 10:15 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi Konstantine, and welcome to PhpDig.net!

Thanks for the contributions too. It's good to have other people review the code and offer input.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 03-13-2004, 10:18 AM   #4
Konstantine
I had that problem at work, but I can't verify it now or tell what really happened.
Old 03-15-2004, 08:12 AM   #5
Konstantine
Hi again, so there IS a bug. I can't explain why it happens, but you can see it. Try to index http://madboard.ru and look at the page http://madboard.ru/index.html?act=do&code=43. You will not find links such as http://madboard.ru/index.html?act=do&code=45 in the results. Try it. I haven't found out why it is happening. Any suggestions?
Old 03-17-2004, 09:30 AM   #6
Konstantine
I found it (the BUG)! And it's not in PhpDig, it's in a PHP function.

I use PHP version 4.3.3 on Linux, and the bug is in the parse_url function.

You can see it on the site madboard.ru. If you try to index it, you'll find only about 42 pages (if the bug is in your version of PHP).

So in robot_functions.php, in the function phpdigRewriteUrl($eval), replace this code:

PHP Code:
$url = @parse_url(str_replace('\'"','',$eval));
if (!isset($url['path'])) {
     $url['path'] = '';
}

with the following code:

PHP Code:
$url = @parse_url(str_replace('\'"','',$eval));
// Turn the HTML entity &amp; back into a plain & in the query string
if (isset($url['query'])) {
     $url['query'] = str_replace("&amp;","&",$url['query']);
}
if (!isset($url['path'])) {
     $url['path'] = '';
}

After that, try to index madboard.ru again.

You'll find about 400 pages!
The bug is this:

if you parse the URL http://madboard.ru/index.html?act=do&code=43 as it is written in the page, the 'query' element comes back as act=do&amp;code=43
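
A minimal sketch of the behaviour described here, assuming the links in the page source write the ampersand as the HTML entity &amp; (the forum rendering hides the difference). parse_url() only splits the string into components and does not decode HTML entities, so the entity survives into the query component unless the href is decoded first. The helper name queryFromHref is hypothetical and not part of PhpDig.

```php
<?php
// Hypothetical helper, not part of PhpDig: return the query component
// of an href taken from HTML source, optionally decoding entities first.
function queryFromHref(string $href, bool $decode): ?string
{
    if ($decode) {
        // Convert &amp; (and other entities) back to literal characters.
        $href = html_entity_decode($href, ENT_QUOTES);
    }
    $url = parse_url($href);
    // parse_url() omits the 'query' key when the URL has no query part.
    return $url['query'] ?? null;
}

// Assumed form of the link as it appears in the page source.
$href = 'http://madboard.ru/index.html?act=do&amp;code=43';

echo queryFromHref($href, false), "\n"; // entity is left in the query
echo queryFromHref($href, true), "\n";  // entity decoded before parsing
```

Decoding the whole href before parsing has the same effect on this URL as the str_replace() on $url['query'] in the patch, and also covers entities other than &amp;.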

Any questions?

If you try it and see the same result, please reply to this message.
Old 03-25-2004, 06:34 AM   #7
Charter
Hi. Speaking of &amp; versus &, there is a small bug in version 1.8.0 when PHPDIG_SESSID_REMOVE is set to true. To fix it, do the following.

In robot_functions.php find:
PHP Code:
$eval = str_replace("&&","&",$eval);
$eval = eregi_replace("[?][&]","?",$eval);
$eval = eregi_replace("&$","",$eval);
and replace with:
PHP Code:
$eval = str_replace("&&","&",$eval);
$eval = str_replace("?&","?",$eval);
$eval = eregi_replace("&$","",$eval);
Also, in robot_functions.php find:
PHP Code:
$file = str_replace("&&","&",$file);
$file = eregi_replace("[?][&]","?",$file);
$file = eregi_replace("&$","",$file);
and replace with:
PHP Code:
$file = str_replace("&&","&",$file);
$file = str_replace("?&","?",$file);
$file = eregi_replace("&$","",$file);
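
The three cleanup steps can be sketched as a single function for current PHP versions, where eregi_replace() no longer exists (it was removed in PHP 7.0). The function name tidyQuerySeparators is hypothetical, and the sketch assumes the steps are meant to tidy separators left behind after a parameter such as a session id is stripped from the URL.

```php
<?php
// Hypothetical helper, not part of PhpDig: tidy up "&" separators that
// are left dangling after a query parameter has been removed.
function tidyQuerySeparators(string $url): string
{
    $url = str_replace('&&', '&', $url);   // collapse a doubled separator
    $url = str_replace('?&', '?', $url);   // drop a separator right after "?"
    return preg_replace('/&$/', '', $url); // drop a trailing separator
}

// Usage: a URL with a leading, a doubled, and a trailing separator.
echo tidyQuerySeparators('index.php?&a=1&&b=2&'), "\n";
```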