11-19-2004, 01:46 PM | #1 |
Green Mole
Join Date: Nov 2004
Posts: 4
|
Spiders site first time, but update doesn't spider
Hello,
I have recently installed PHPDig and it will index my site the first time correctly and works great. But if I remove a page, then "update sites" in the admin or do a command "forceall" update, nothing is actually updated (takes 0 seconds) and the deleted page is still in the database. My LIMIT_DAYS is 0 in the in config.php. Here is the output of the command line spidering: ------------- force reindex of site7811: old priority 0, new priority 18 Spidering in progress... ----------------------------- SITE : http://www.mysite.org/ Exclude paths : - @NONE@ No link in temporary table links found : 0 Optimizing tables... Indexing complete ! Any ideas on this? If I can't get it to respider the site, then I'll have to find another search solution. I hope someone has an answer though since I already spent time on PHPDig and the search works great after the first spidering of the site. Thanks, John |
11-19-2004, 05:37 PM | #2 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Don't know how you tried updating, but try this.
Go into the Admin panel. On the right side of the screen, click to highlight the site you want to update, then click on Update Form. That will take you to a second screen. Click on the green checkmark for whichever branch you want to update. Clicking on the one next to Root will update everything. |
11-22-2004, 04:32 PM | #3 |
Green Mole
Join Date: Nov 2004
Posts: 4
|
That did it! Kudos to you and the PHPDig people for making a viable, free, PHP-only search solution!
Thanks so much. It updates correctly through the browser now, after I commented out the "set_time_limit" calls that kept it from updating under PHP safe mode. I re-read the docs and still can't get the command-line version to work (though I now see what I missed concerning the web admin). I suppose I could cron wget to update the site's index via a web call to spider.php?site_id=1&mode=small. Unless you know the correct command-line way to reindex a site? Here is what I am calling:

/usr/bin/php -f /path.to.it/phpdig/admin/spider.php forceall http://www.oursite.org

But it never indexes anything:

SITE : http://www.oursite.org/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...

Thanks |
11-22-2004, 06:41 PM | #4 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
You might want to have a look at the phpdig documentation for command line indexing. If you've followed that and indexing still doesn't work, I don't know what to tell you. I've tried indexing my site via command line, and it's touchy at best.
|
11-23-2004, 09:17 AM | #5 |
Green Mole
Join Date: Nov 2004
Posts: 4
|
Thanks again, Vinyl-Junkie. I gave up on the command-line version; I've had problems with command-line PHP before, and it doesn't appear to be the exact same PHP that runs under Apache on our server.
For those interested, here is the command line I came up with that works. It uses wget to call the PHP script via Apache:

/usr/bin/wget --timeout 3600 --http-user your_htaccess_username --http-passwd your_htaccess_password "http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1" 1>/dev/null 2>/dev/null

(all one line, with no breaks)

Here's an explanation of the command, piece by piece:

/usr/bin/wget
Replace this with whatever the path to your wget is; find it with "which wget" at the SSH/telnet prompt.

--timeout 3600
The timeout in seconds. I removed the sleep(5) in spider.php; if you keep it, increase this timeout a lot. My update.php call takes 20 seconds, so an hour should be plenty for me. I also removed all the "set_time_limit" calls from PHPDig, since PHP safe mode gave errors that kept "Update Form" from reindexing the site.

--http-user and --http-passwd
I turned off the PHPDig admin user/pass by setting this in config.php: define('PHPDIG_ADM_AUTH','0'); Then I password-protected the admin directory using an Apache .htaccess file, and passed its username and password in these parameters.

"http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1"
This link should work in your browser (after logging in) and force a site reindex. I got it by selecting my site in the admin, clicking "Update Form", and copying the link on the "Root" green checkbox. I had to include the quotes so wget would treat the full link as one argument.

1>/dev/null 2>/dev/null
This is optional. Without it, each time the command runs wget saves another copy of the loaded page, which you may want as a record of the indexing. With it, wget creates no files.

You can then add this command to your crontab so the site reindexes automatically. Hopefully this helps people and is my small contribution back to PHPDig. |
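[Editor's note] As a sketch of how the wget call above could be wrapped for cron: the script below is illustrative only. The base URL and the .htaccess credentials are the placeholders from the post, and the script name (reindex.sh) and the build_update_url helper are invented for this example, not part of PHPDig.

```shell
#!/bin/sh
# reindex.sh -- hypothetical cron wrapper around the wget reindex call.
# Adjust the URL and credentials for your own install.

# Compose the PHPDig update URL for a given site id.
# (Helper name is illustrative, not a PHPDig function.)
build_update_url() {
    printf 'http://www.mysite.org/phpdig/admin/update.php?path=&site_id=%s&exp=1' "$1"
}

# Only hit the server when invoked as "reindex.sh --run <site_id>".
# Without --run it just prints the URL, which is handy for checking
# the quoting before putting the script in cron.
if [ "${1:-}" = "--run" ]; then
    # Note: newer wget spells the option --http-password.
    wget --timeout 3600 \
         --http-user your_htaccess_username \
         --http-passwd your_htaccess_password \
         -O /dev/null -q "$(build_update_url "${2:-1}")" \
        || echo "phpdig reindex failed" >&2
else
    build_update_url "${2:-1}"
    echo
fi
```

A nightly crontab entry might then look like: 30 3 * * * /path/to/reindex.sh --run 1 -- calling a script instead of pasting the raw wget line into the crontab also sidesteps cron's special handling of % characters.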
11-24-2004, 11:42 AM | #6 |
Green Mole
Join Date: Dec 2003
Posts: 11
|
Thanks for this insight, Ensim!
I never even thought about using the wget command. Works great! It should be noted that some cron setups mishandle "http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1" because of the double quotes; using single quotes, as in 'http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1', seems to work fine in all cases. Thanks again, and awesome work! |
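[Editor's note] To illustrate why the quoting matters (using the hypothetical URL from this thread): on a shell command line, an unquoted & ends the command and runs it in the background, so everything after the first & in the query string would be lost. Either quote style keeps the URL intact as a single argument:

```shell
#!/bin/sh
# Both quote styles deliver the full URL (placeholders from the thread)
# to the program as one argument; unquoted, the shell would stop at the
# first "&" and background the command instead.
printf '%s\n' 'http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1'
printf '%s\n' "http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1"
# Both lines print the identical URL.
```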
11-30-2004, 02:17 PM | #7 |
Green Mole
Join Date: Nov 2004
Posts: 4
|
Glad I could help, Siliconkibou!
Looks like Indeh found a more elegant solution; kudos on that. |