04-29-2004, 06:59 AM | #1 |
Green Mole
Join Date: Apr 2004
Posts: 12
URL Bot
Thus far I've seen the spider functions only deal with spidering a particular site and returning only results within the spidered URL. An option that lets the admin ignore the base URL and return only links to external URLs would allow spidering a link farm site or links page and harvesting those links back into PhpDig. For example:
I built a CGI engine and have tons of links indexed on it. If I use PhpDig to try to spider the links from the original engine, it returns MY URL links instead of ignoring the base URL and spidering the external links at a depth of 1. What I'm after is a URL harvester spider rather than just a site spider. My CGI engine does this with the greatest of ease: I can spider a particular directory of DMOZ and bring back only the links and their relative URLs.
If someone out there (in the Mole Squad) is proficient at both PHP and CGI, I'd be willing to make my engine available, and perhaps we can cross the spider functions into PhpDig and save some raw coding time for all. It also has admin features for visitor-added URLs, which can be edited directly rather than just spidered. At this time I see no way of editing spidered or user-submitted URLs in PhpDig without doing so at the SQL level, so that might also be a useful PhpDig function to consider.
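To make the request concrete, here is a rough sketch of the "ignore the base URL, keep only external links at depth 1" behaviour. It is written in Python for illustration only and is neither PhpDig code nor my CGI engine; the names `harvest_external_links` and `_LinkCollector` are made up for this example, and it assumes the page HTML has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects the raw href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def harvest_external_links(html, base_url):
    """Return absolute http(s) links whose host differs from base_url's host.

    Depth 1 only: links are read from this one page and never followed.
    """
    base_host = (urlparse(base_url).hostname or "").lower()
    collector = _LinkCollector()
    collector.feed(html)

    external = []
    seen = set()
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        host = (parsed.hostname or "").lower()
        if not host or host == base_host:
            continue  # skip links that point back into the spidered site
        if absolute not in seen:  # drop duplicates, keep first-seen order
            seen.add(absolute)
            external.append(absolute)
    return external
```

Called with the HTML of a links page and that page's own URL, it returns only the outbound links, which could then be queued for indexing. Comparing exact host names is a simplification: a real option would probably need to decide whether subdomains of the base site count as external.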