PhpDig.net

11-17-2003, 08:02 AM   #1
apdejong
Green Mole
 
Join Date: Nov 2003
Posts: 4
Problems with URL parsing

Hi,

I am trying to use PhpDig, and it looks very good to me. However, when I try to index my site, the following happens. URLs like:

http://www.webthings.nl/archive/2003...ewoon_een_hype(2)#body

will be rewritten to:

http://www.webthings.nl/archive/2003...ewoon_een_hype

The rewritten URL doesn't work. How can I fix this?

Thanks to the programmer! I searched the web, and PhpDig was one of the best I could find.

Greets,
Arjan
11-17-2003, 10:57 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I'm not sure I understand the problem. When I index http://www.webthings.nl/archive/200...gewoon_een_hype(2)#body using one level I get the following results:

--------------------------------------------------------------------------------
SITE : http://www.webthings.nl/
Exclude paths :
- @NONE@
1:http://www.webthings.nl/archive/200...gewoon_een_hype(2)
(time : 00:00:04)
+
level 1...
Duplicate of an existing document
2:http://www.webthings.nl/archive/webthings_stylesheet.css
(time : 00:00:06)

No link in temporary table
--------------------------------------------------------------------------------

links found : 2
http://www.webthings.nl/archive/200...gewoon_een_hype(2)
http://www.webthings.nl/archive/webthings_stylesheet.css
Optimizing tables...
Indexing complete !

Then when I search on realhosting, I see the following results:

1. [100.00 %] webthings/webdesign/webdesign nieuws
limit to http://www.webthings.nl/, this path : archive/

...2003 - Eduvision BV en Van Duuren Media - Hosting by Realhosting webthings/webdesign/webdesign nieuws webthings/webdesign/webdesign nieuws...

When I click the link, it links me to http://www.webthings.nl/archive/200...gewoon_een_hype(2) and I see your page.

When you do the above things, what do you see?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
11-18-2003, 04:05 AM   #3
apdejong
Green Mole
 
Join Date: Nov 2003
Posts: 4
Hi Ruud,

I get the following:

Warning: is_executable() [function.is-executable]: open_basedir restriction in effect. File(/usr/local/bin/pstotext) is not within the allowed path(s): (/vhost/webthings.nl/home) in /vhost/webthings.nl/home/www/html/zoek/admin/robot_functions.php on line 635
Duplicate of an existing document
6:http://www.webthings.nl/archive/2003...als_paypalmail
(time : 00:00:03)

(see last line)

PhpDig has found the following:

links found : 9
http://www.webthings.nl/
http://www.webthings.nl/archive/2003...ewoon_een_hype
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/webthings/ar...e_2003-m11.php
http://www.webthings.nl/archive/2003...als_paypalmail
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/pivot/kortni...p?wtk=selected


As you can see, it does not index the trailing (5) etc. Strangely, it works in your configuration.
I use the standard config (with Apache 1.3.27 and PHP 4.3.1).

Any ideas? What am I doing wrong?

Greets,
Arjan
11-18-2003, 10:59 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Try installing PhpDig in the open_basedir that is set. You can find this directory by looking at your PHP info ( <?php phpinfo(); ?> ) or by asking your host. Also, try changing the path to pstotext. If you have shell access and are able to use the locate command, you can find the correct path to pstotext ( locate pstotext ); otherwise, try asking your host. Failing that, grab a copy of pstotext, place it in the open_basedir directory, and use that path. If is_executable continues to give you problems, you can set USE_IS_EXECUTABLE_COMMAND to zero in the config file.
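
To see why the warning mentions pstotext, here is a rough sketch of the comparison that open_basedir implies. The helper name path_is_inside_open_basedir() is hypothetical and is not part of PhpDig or PHP; it only does a simplified string comparison and does not resolve symlinks or relative paths. The restriction value is copied from the warning message above, and the second path is a made-up example of a location inside the allowed directory.

```php
<?php
// Hypothetical helper (not part of PhpDig or PHP): reports whether $path lies
// under one of the directories listed in an open_basedir value.
// Simplified: plain string comparison, no symlink or relative path handling.
function path_is_inside_open_basedir($path, $openBasedir)
{
    if ($openBasedir === '' || $openBasedir === false) {
        return true; // no restriction configured
    }
    foreach (explode(PATH_SEPARATOR, $openBasedir) as $dir) {
        $dir = rtrim($dir, '/');
        if ($dir !== '' && ($path === $dir || strpos($path, $dir . '/') === 0)) {
            return true;
        }
    }
    return false;
}

// Value taken from the warning; on a live server use ini_get('open_basedir').
$restriction = '/vhost/webthings.nl/home';

// Path from the warning: outside the allowed directory.
var_dump(path_is_inside_open_basedir('/usr/local/bin/pstotext', $restriction));

// Assumed example of a copy placed inside the allowed directory.
var_dump(path_is_inside_open_basedir('/vhost/webthings.nl/home/bin/pstotext', $restriction));
```

The first call reports false and the second true, which matches the suggestion to either point the config at a copy of pstotext inside the allowed directory or to switch off the is_executable check.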
11-19-2003, 04:29 AM   #5
apdejong
Green Mole
 
Join Date: Nov 2003
Posts: 4
Hi,

Thanks for the answer. The problem, however, is not that the executables don't work; I don't want to index PDFs etc. anyway. The problem is that the system indexes URLs like

http://www.webthings.nl/archive/2003...als_paypalmail(1)

as

6:http://www.webthings.nl/archive/2003...als_paypalmail
(time : 00:00:02)

The (1) is dropped. I saw in your output that it is indexed correctly on your server, but not here, and I have no idea why. Do I need to change something in my config file?

Greets,
Arjan
11-19-2003, 09:05 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Apache 1.3.27 and PHP 4.3.1 under what OS?

What do you see when you run the following:
PHP Code:
<?php
// remember to remove any "word" wrapping
$url = "http://www.webthings.nl/archive/2003/11/14/nieuwe_worm_doet_zich_voor_als_paypalmail(1)";
print_r(parse_url($url));
?>
When viewing the HTML source, I get the following:
Code:
Array
(
    [scheme] => http
    [host] => www.webthings.nl
    [path] => /archive/2003/11/14/nieuwe_worm_doet_zich_voor_als_paypalmail(1)
)
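
As a further check, here is a minimal sketch that appends a #body fragment to the same test URL, mirroring the shape of the URL in the first post. The appended fragment is an assumption for illustration; the thread does not show this exact URL. On current PHP versions, parse_url() keeps the parentheses in the path and returns the fragment separately, so if the path also comes back intact on the affected server, the truncation probably happens somewhere other than parse_url() itself.

```php
<?php
// Assumed test URL: the URL from the snippet above with "#body" appended.
$url = "http://www.webthings.nl/archive/2003/11/14/nieuwe_worm_doet_zich_voor_als_paypalmail(1)#body";
$parts = parse_url($url);

// Check whether the path still ends in "(1)" and whether the fragment was split off.
$pathKeepsParens = (substr($parts['path'], -3) === '(1)');
$fragment = isset($parts['fragment']) ? $parts['fragment'] : '';

print_r($parts);
echo $pathKeepsParens ? "path keeps (1)\n" : "path lost (1)\n";
echo "fragment: " . $fragment . "\n";
```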
11-20-2003, 03:35 AM   #7
apdejong
Green Mole
 
Join Date: Nov 2003
Posts: 4
Hi. I am afraid I see the same... It seems that is not the problem. Any other ideas?

Greets,
Arjan

All times are GMT -8.