|
11-29-2004, 06:58 PM | #1 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Spider From A File Thru Web Interface
You know how phpdig can spider from a file of URLs when it is run from the shell? I'd like to see the same thing when spidering from the web interface. Thanks in advance for giving this idea some consideration.
|
12-01-2004, 06:44 PM | #2 |
Green Mole
Join Date: Dec 2004
Posts: 10
|
Here is the way I did it.
It is very simple, mind you, but it works. I have a submit-site page with a form for others (or myself) to submit pages to be reviewed for indexing. The form inserts these links into a MySQL table.

A review script uses an SQL statement with a LEFT JOIN to fetch only the links that have not been added yet. It lists those links with two check boxes each, one for add and one for deny. (You can display each link in an input box so the reviewer can edit it, for example to add a trailing slash or http://www.) Deny deletes the row; add inserts it into the site table with upddate=0.

Another script, which could be launched from a link on the review script, finds all the sites where upddate=0 and loops through them, running spider.php for each one via exec(). I put a limit of 10 on it just so I can have a little more control over it. (Once you run spider.php for a site, it sets upddate to the current timestamp when it locks and unlocks the tables.)

What I did may not be the best solution, but it got things working quickly. I will make a better solution sometime in the next couple of weeks. |
12-01-2004, 06:54 PM | #3 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Would you be willing to post your code? I sure would appreciate it, and I'm sure others would as well. Thanks.
|
12-15-2004, 04:15 AM | #4 |
Green Mole
Join Date: Feb 2004
Posts: 17
|
I use a Perl script (tree.pl) to feed the spider. It is very easy to use, and you get the URLs you want (htm(l), doc, pdf, etc.).
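tree.pl itself is not shown in this thread, so the following is only a guess at the general idea, not a port of that script. It is a short Python sketch that walks a document root, keeps files with the wanted extensions, and writes one URL per line to a file that could then be handed to the spider from the shell. The function names, the extension set, and the mapping from file paths to URLs under `base_url` are all assumptions.

```python
import os
from urllib.parse import quote

# Extensions to keep; an assumed default based on the types named above.
WANTED = {".htm", ".html", ".doc", ".pdf"}


def collect_urls(doc_root, base_url, wanted=WANTED):
    """Walk doc_root and return a URL for every file with a wanted extension."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in wanted:
                rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
                # Use forward slashes and percent-encode unsafe characters.
                urls.append(base_url.rstrip("/") + "/" + quote(rel.replace(os.sep, "/")))
    return urls


def write_url_file(urls, path):
    """Write one URL per line, the layout a URL list file normally uses."""
    with open(path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
```

For example, `write_url_file(collect_urls("/var/www/html", "http://www.example.com"), "urls.txt")` would produce a list file; both paths there are placeholders.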
|