11-17-2003, 08:02 AM | #1 |
Green Mole
Join Date: Nov 2003
Posts: 4
|
Problems with URL parsing
Hi,
I'm trying to use phpDig, and it looks very good to me. However, when I try to index my site, the following happens. URLs like http://www.webthings.nl/archive/2003...ewoon_een_hype(2)#body are rewritten to http://www.webthings.nl/archive/2003...ewoon_een_hype, and that URL doesn't work. How can I fix this?
Thanks to the programmer! I searched the web, and phpDig was one of the best I could find.
Greets, Arjan |
11-17-2003, 10:57 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. I'm not sure I understand the problem. When I index http://www.webthings.nl/archive/200...gewoon_een_hype(2)#body using one level I get the following results:
--------------------------------------------------------------------------------
SITE : http://www.webthings.nl/
Exclude paths :
- @NONE@
1:http://www.webthings.nl/archive/200...gewoon_een_hype(2)
(time : 00:00:04)
+ level 1...
Duplicate of an existing document
2:http://www.webthings.nl/archive/webthings_stylesheet.css
(time : 00:00:06)
No link in temporary table
--------------------------------------------------------------------------------
links found : 2
http://www.webthings.nl/archive/200...gewoon_een_hype(2)
http://www.webthings.nl/archive/webthings_stylesheet.css
Optimizing tables...
Indexing complete !

Then, when I search on realhosting, I see the following results:

1. [100.00 %] webthings/webdesign/webdesign nieuws
limit to http://www.webthings.nl/, this path : archive/
...2003 - Eduvision BV en Van Duuren Media - Hosting by Realhosting webthings/webdesign/webdesign nieuws webthings/webdesign/webdesign nieuws...

When I click the link, it takes me to http://www.webthings.nl/archive/200...gewoon_een_hype(2) and I see your page. When you do the above things, what do you see?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
11-18-2003, 04:05 AM | #3 |
Green Mole
Join Date: Nov 2003
Posts: 4
|
Hi Ruud,
I get the following:

Warning: is_executable() [function.is-executable]: open_basedir restriction in effect. File(/usr/local/bin/pstotext) is not within the allowed path(s): (/vhost/webthings.nl/home) in /vhost/webthings.nl/home/www/html/zoek/admin/robot_functions.php on line 635
Duplicate of an existing document
6:http://www.webthings.nl/archive/2003...als_paypalmail
(time : 00:00:03)

(See the last line.) PhpDig has found the following:

links found : 9
http://www.webthings.nl/
http://www.webthings.nl/archive/2003...ewoon_een_hype
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/webthings/ar...e_2003-m11.php
http://www.webthings.nl/archive/2003...als_paypalmail
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/pivot/submit...hings&group=k_
http://www.webthings.nl/pivot/kortni...p?wtk=selected

As you can see, it does not index the last (5), etc. It is strange that it works in your configuration; I use the standard config (with Apache 1.3.27 and PHP 4.3.1). Any ideas? What am I doing wrong?
Greets, Arjan |
11-18-2003, 10:59 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Try installing PhpDig in the open_basedir that is set. You can find this directory by looking at your PHP info ( <? phpinfo(); ?> ) or by asking your host. Also, try changing the path to pstotext. If you have access to shell and are able to use the locate command, you can locate the correct path to pstotext ( locate pstotext ) or try asking your host. Otherwise grab a copy of pstotext and place it in the open_basedir directory and use that path. If is_executable continues to give you problems, you can set USE_IS_EXECUTABLE_COMMAND to zero in the config file.
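To see why the warning fires, it can help to compare the pstotext path against the open_basedir setting by hand. The following is only a rough sketch: `path_within_basedir` is a hypothetical helper (it is not part of PhpDig), it does a simplified string comparison without resolving symlinks, and the second path is a made-up location used purely for illustration. On a live server the real setting can be read with `ini_get('open_basedir')`.

```php
<?php
// Hypothetical helper, not part of PhpDig: returns true when $path sits
// inside one of the directories in an open_basedir-style list.
// Simplified: plain string comparison, no symlink resolution.
function path_within_basedir($path, $basedirs)
{
    if ($basedirs === '') {
        return true; // an empty setting means no restriction
    }
    foreach (explode(PATH_SEPARATOR, $basedirs) as $dir) {
        if ($dir === '') {
            continue; // ignore empty entries in the list
        }
        $dir = rtrim($dir, '/');
        // Match the directory itself or anything below it.
        if ($path === $dir || strpos($path, $dir . '/') === 0) {
            return true;
        }
    }
    return false;
}

// Allowed path and pstotext location as quoted in the warning in this thread.
$allowed = '/vhost/webthings.nl/home';
var_dump(path_within_basedir('/usr/local/bin/pstotext', $allowed));               // bool(false)

// Assumed example location inside the allowed directory (illustration only).
var_dump(path_within_basedir('/vhost/webthings.nl/home/bin/pstotext', $allowed)); // bool(true)
```

The first call mirrors the situation in the warning: /usr/local/bin is outside the allowed directory, so is_executable() is refused before it can even look at the file.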
11-19-2003, 04:29 AM | #5 |
Green Mole
Join Date: Nov 2003
Posts: 4
|
Hi,
Thanks for the answer. However, the problem is not that the executables don't work; I don't want to index PDFs etc. anyway. The problem is that the system indexes URLs like http://www.webthings.nl/archive/2003...als_paypalmail(1) as

6:http://www.webthings.nl/archive/2003...als_paypalmail
(time : 00:00:02)

The (1) fails. I saw in your output that it is indexed on your server, but it is not indexed here, and I have no idea why. Do I need to change something in my config file?
Greets, Arjan |
11-19-2003, 09:05 AM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Apache 1.3.27 and PHP 4.3.1 under what OS?
What do you see when you run the following: PHP Code:
Code:
Array
(
    [scheme] => http
    [host] => www.webthings.nl
    [path] => /archive/2003/11/14/nieuwe_worm_doet_zich_voor_als_paypalmail(1)
)
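The PHP snippet itself did not survive in this copy of the post. The array shown is consistent with what `print_r(parse_url(...))` prints, so the test was probably something along these lines. This is a guess at the check, not the original code; the URL is rebuilt from the scheme, host and path in the array, and the `#body` variant is added here only to show where the fragment ends up.

```php
<?php
// URL rebuilt from the scheme, host and path in the array output in this post.
$url = 'http://www.webthings.nl/archive/2003/11/14/nieuwe_worm_doet_zich_voor_als_paypalmail(1)';

// parse_url() splits the URL into its components; the "(1)" stays in the path.
$parts = parse_url($url);
print_r($parts);

// With a fragment appended, "#body" is split off into its own key
// and the path still ends in "(1)".
$withFragment = parse_url($url . '#body');
echo $withFragment['path'], "\n";
echo $withFragment['fragment'], "\n";
```

If the path printed here still ends in (1), then parse_url() is not what drops it, and the trimming must happen somewhere else in the spidering.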
11-20-2003, 03:35 AM | #7 |
Green Mole
Join Date: Nov 2003
Posts: 4
|
Hi. I am afraid I see the same... It seems that is not the problem. Any other ideas?
Greets, Arjan |