|
09-26-2003, 08:58 AM | #1 |
Green Mole
Join Date: Sep 2003
Posts: 10
|
Crawling Options
I just finished configuring PhpDig on W2K/Apache 1.3x.
It works great, awesome program. But I have a question about the crawling functionality: from the admin index page you can enter domains to crawl, but is there a way to have PhpDig crawl randomly? Or go beyond the specified domain? Thank you, jimigisme |
09-26-2003, 05:38 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. PhpDig crawls the links in a page, depending on the number of levels chosen. I'm not sure what you mean by "go beyond the specified domain" though. Do you mean crawl subdirectories?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
09-26-2003, 06:54 PM | #3 |
Orange Mole
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
|
Yes, I was thinking the same thing, whether it could do random crawls. I'm building up my db and I would like it to crawl all over the place. The only sites I don't want are porn sites.
btw, can you tell me more about levels? I don't completely understand them... David J Harmon Cappuccino David |
09-26-2003, 07:06 PM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Currently PhpDig crawls links from a page.
Levels mean the number of links to follow from a page, looking for more links. For example, level one means to only follow the links on one page, but not links from links on that same page. Confused???
Code:
Level One Example:
a.com
- a1.com
-- a11.com
-- a12.com
- a2.com
-- a21.com
-- a22.com
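A rough sketch of how levels limit a crawl, written in Python rather than PhpDig's own PHP. The `LINKS` table and the `crawl` function are made up for illustration, using the example hosts above. This is not PhpDig's spider code, just the general idea of a depth-limited crawl.

```python
from collections import deque

# Toy link graph built from the example above (hypothetical data).
# A real spider would fetch each page and extract its links instead.
LINKS = {
    "a.com": ["a1.com", "a2.com"],
    "a1.com": ["a11.com", "a12.com"],
    "a2.com": ["a21.com", "a22.com"],
}


def crawl(start, max_level):
    """Return the pages reached from start, following links at most max_level hops."""
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if level == max_level:
            continue  # deepest level reached: do not follow this page's links
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.append(link)
                queue.append((link, level + 1))
    return seen


if __name__ == "__main__":
    print(crawl("a.com", 1))  # ['a.com', 'a1.com', 'a2.com']
    print(crawl("a.com", 2))
```

With level 1 only the links found on a.com are followed. With level 2 the links found on a1.com and a2.com are followed as well.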
|
09-26-2003, 07:34 PM | #5 |
Orange Mole
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
|
Got it... thanks
I'm still new to PHP, but I see it is a powerful language. I just use HTML but I'm moving to PHP. Is there any way to give the spider a list of URLs so he (yes, a he) can go out when I'm not at my computer? I'm trying to build up my db. David J Harmon Cappuccino David |
09-26-2003, 07:43 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. You could set up a cron job or make a text file with URLs and use shell access to crawl.
|
09-26-2003, 08:58 PM | #7 |
Orange Mole
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
|
Didn't think about that. Could you give me an example?
|
09-27-2003, 05:59 PM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
On *nix, say you want to run a cron job that spiders on the 1st and 15th of every month.
First make a list of URLs, one per line, in a file called cronlist.txt. Then create a file called cronjob.txt that contains the following on one line, editing the paths to php and to spider.php:
Code:
0 0 1,15 * * /path/to/php -f /path/to/admin/spider.php cronlist.txt >> spider.log
Code:
crontab cronjob.txt -u
A cron tutorial can be found at http://www.linuxhelp.net/guides/cron/.
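If you would rather generate cronlist.txt than type it by hand, here is a small Python sketch. The `write_url_list` helper and the example.com/example.org URLs are made up for illustration. The only thing taken from the post above is the file format: one URL per line.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_url_list(urls, path="cronlist.txt"):
    """Write unique http(s) URLs to path, one per line, and return the URLs kept."""
    kept = []
    for url in urls:
        url = url.strip()
        parts = urlparse(url)
        # Keep only well-formed http/https URLs and skip duplicates.
        if parts.scheme in ("http", "https") and parts.netloc and url not in kept:
            kept.append(url)
    Path(path).write_text("".join(u + "\n" for u in kept))
    return kept


if __name__ == "__main__":
    # Placeholder URLs; replace them with the sites you want spidered.
    write_url_list(["http://www.example.com/", "http://www.example.org/"])
```

Anything that is not an http or https URL is dropped, so a stray blank line or a typo does not end up in the list handed to spider.php.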
|
09-27-2003, 10:51 PM | #9 |
Orange Mole
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
|
Thank you, that should help me out some. I still have a long way to go on my db, but it's coming along. It should be open to the public by the end of this week; then I hope to build it up.
If there is anything I could help out with, let me know. |
09-28-2003, 10:19 PM | #10 |
Former Member
Join Date: Sep 2003
Posts: 34
|
Hi,
I've done the whole cronjob.txt and cronlist.txt thing. I guess this is a stupid question, but how do you run a shell command on a Linux server? Please give me an example or tell me how to do that. Need help here, urgent. Cheers |
09-28-2003, 10:45 PM | #11 |
Orange Mole
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
|
Do you have CPanel? If so, go to Cron Job; it has most of it done for you. Just put - /path/to/php -f /path/to/admin/spider.php cronlist.txt >> spider.log - into the open space and it will set up the job for you. But remember to change the paths to the right ones...
David |
09-29-2003, 04:26 PM | #12 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
If you don't have CPanel or the like, then you'd need shell access via Telnet/SSH to use crontab, assuming crontab is available on your machine. If your host doesn't allow Telnet/SSH access, there is a CGI-Telnet script that you could try in order to run non-interactive shell commands from your browser.
|
09-30-2003, 11:09 AM | #13 |
Green Mole
Join Date: Sep 2003
Posts: 10
|
I found the answer to my question on another thread, thank you to all that replied.
Here is what I found, but I haven't tested it yet. "There is a function in robots_functions.php located in your admin directory. This function compares the current URL with the new URL... it returns either true or false. Set the false to true and it will follow any domain link or URL it finds... though be careful, your database will exceed your hosting space soon!!!! I currently have over 10000 sites indexed and the database is a bit more than 1 GB! The function's name is phpdigCompareDomains($url1,$url2). Search for this and, as I said, replace false with true. Also there is a flag in your config.php that also has to be true (StayInDomain or something like this... you'll have to find it)" |
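To picture what a domain comparison like that does, here is a Python sketch of the general idea. It is not PhpDig's phpdigCompareDomains code and not its config flag: `same_host`, `should_follow` and the `stay_in_domain` parameter are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse


def same_host(url1, url2):
    """Return True when both URLs have the same hostname (case-insensitive)."""
    host1 = urlparse(url1).hostname
    host2 = urlparse(url2).hostname
    return host1 is not None and host1 == host2


def should_follow(current_url, new_url, stay_in_domain=True):
    """Decide whether a spider on current_url should follow a link to new_url."""
    if not stay_in_domain:
        return True  # follow any link, whatever host it points to
    return same_host(current_url, new_url)


if __name__ == "__main__":
    print(should_follow("http://a.com/index.html", "http://a.com/page2.html"))
    print(should_follow("http://a.com/index.html", "http://b.com/"))
    print(should_follow("http://a.com/index.html", "http://b.com/", stay_in_domain=False))
```

Switching the check off means every link found gets followed, which is why the quoted poster warns about the database growing fast.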