02-22-2005, 06:19 AM | #1 |
Orange Mole
Join Date: Jan 2004
Location: In outer space
Posts: 37
|
Auto language guesser
First, don't expect too much from this post: I've only done half of the job (the easier half).
Now that PhpDig can spider multiple encodings, it can also spider multi-language sites, and you will probably want to differentiate the language of each page. There are several tools for guessing languages, and fortunately a few of them are free! I came across Languid, a statistical language identifier by Maciej Ceglowski (http://languid.cantbedone.org/). It's a great tool that can guess 72 languages with high accuracy. It was originally written in Perl (the source can be found at http://search.cpan.org/~mceglows/). I wrote a small function that guesses the language of a text, based on Languid's XML API. Now I need help integrating it into PhpDig. OK, it will slow the spidering process down a little, but it would be worth it. We would need to add this function to robot_functions.php and create a new field in the spider table to store the language ID. Or maybe someone could write a PHP port of Maciej's original Perl script. Is anyone ready to give me a hand with this?
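For anyone wondering what a statistical guesser might look like in plain PHP, here is a minimal sketch based on ranked character-trigram profiles. This is not Languid's code or its XML API, and I am not claiming it is the algorithm Languid uses. The function names (trigram_profile, profile_distance, guess_language) and the tiny training samples are made up for illustration. It assumes the mbstring extension is available. Real use would need much larger sample texts per language.

```php
<?php
// Hypothetical helper: ranked list of the most frequent character trigrams.
function trigram_profile(string $text, int $size = 300): array
{
    // Lowercase, then collapse every run of non-letters into one space.
    $text = preg_replace('/[^\p{L}]+/u', ' ', mb_strtolower($text, 'UTF-8')) ?? '';
    $text = ' ' . trim($text) . ' ';
    $counts = [];
    $len = mb_strlen($text, 'UTF-8');
    for ($i = 0; $i + 3 <= $len; $i++) {
        $gram = mb_substr($text, $i, 3, 'UTF-8');
        $counts[$gram] = ($counts[$gram] ?? 0) + 1;
    }
    arsort($counts); // most frequent trigram first
    return array_keys(array_slice($counts, 0, $size, true));
}

// "Out of place" distance: sum of rank differences, with a fixed penalty
// for trigrams that do not appear in the language profile at all.
function profile_distance(array $doc, array $lang, int $penalty = 300): int
{
    $rank = array_flip($lang); // trigram => rank in the language profile
    $dist = 0;
    foreach ($doc as $i => $gram) {
        $dist += isset($rank[$gram]) ? abs($rank[$gram] - $i) : $penalty;
    }
    return $dist;
}

// $samples maps a 2-letter code to a training text in that language.
function guess_language(string $text, array $samples): string
{
    $doc = trigram_profile($text);
    $best = '';
    $bestDist = PHP_INT_MAX;
    foreach ($samples as $code => $sample) {
        $d = profile_distance($doc, trigram_profile($sample));
        if ($d < $bestDist) {
            $bestDist = $d;
            $best = $code;
        }
    }
    return $best;
}

// Toy training data, far too small for real use.
$samples = [
    'en' => 'the quick brown fox jumps over the lazy dog and then the dog runs after the fox in the garden',
    'fr' => 'le renard brun saute par dessus le chien paresseux et puis le chien court après le renard dans le jardin',
];
echo guess_language('the dog runs over the fox', $samples), "\n";
```

In a spider, the returned code could be stored in a 2-character language field for each page. Short pages will be guessed less reliably than long ones, whatever the method.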
__________________
Uchû Senshi Edomondo http://www.leijiverse.com http://shonen-kokoro.fr.st http://tsukanomanoharu.fr.st |
02-28-2005, 04:09 AM | #2 |
Orange Mole
Join Date: Jan 2004
Location: In outer space
Posts: 37
|
Second step: set languages to your indexed pages.
Download both files attached here and upload them to your PhpDig admin directory. Then add a language column to the spider table in MySQL (add your table prefix if necessary):
Code:
ALTER TABLE `spider` ADD `language` CHAR(2) NOT NULL;
PHP Code:
This will take a while, and unfortunately the results are not always accurate, so you may want to set the languages manually instead. First open set_language.php in a text editor and list in the $lang_to_set array only the languages you will index. Example:
PHP Code:
You can set a language for a whole site or just for subdirectories. Each link is listed with its language value, so you can check that everything is OK. Please keep in mind that I am far from being an expert scripter. Many people on this forum could have written much simpler and neater code. Don’t hesitate to post bug reports, improvements...
Next step: build the pull-down menu to select the languages and change search_functions.php to support this feature.
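To illustrate the "whole site or just subdirectories" idea, here is a rough sketch of how such an update could be built. It is not the code from the attached set_language.php. The function name build_language_update is hypothetical, and the site_id and path column names are my assumption about the spider table, so check them against your own schema. It only builds the query and its parameters; running it (with PDO or anything else) is left out.

```php
<?php
// Hypothetical helper: builds a parameterised UPDATE that tags every page
// of one site (optionally limited to one subdirectory) with a language code.
// ASSUMPTION: the spider table has site_id and path columns.
function build_language_update(string $lang, int $siteId, string $dir = '', string $tablePrefix = ''): array
{
    // The language column is CHAR(2), so only accept 2-letter codes.
    if (!preg_match('/^[a-z]{2}$/', $lang)) {
        throw new InvalidArgumentException('language must be a 2-letter code');
    }
    $sql = "UPDATE `{$tablePrefix}spider` SET `language` = ? WHERE `site_id` = ?";
    $params = [$lang, $siteId];
    if ($dir !== '') {
        // Escape LIKE wildcards in the directory, then match it as a prefix.
        $sql .= ' AND `path` LIKE ?';
        $params[] = addcslashes($dir, '%_\\') . '%';
    }
    return [$sql, $params];
}

// Whole site 3 in French:
list($sql, $params) = build_language_update('fr', 3);
echo $sql, "\n";

// Only the docs/en/ subdirectory of site 3 in English:
list($sql, $params) = build_language_update('en', 3, 'docs/en/');
echo $sql, "\n";
```

Keeping the query building separate from the execution makes it easy to print the statement and check it before touching the database.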
__________________
Uchû Senshi Edomondo http://www.leijiverse.com http://shonen-kokoro.fr.st http://tsukanomanoharu.fr.st |