learn_russian | (no subject)

I'm trying to create a vocabulary list from an e-text by means of histogram analysis, so I can concentrate on the most frequently used words.

But my program is unable to link the various forms of Russian verbs, nouns and adjectives.
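To make the problem concrete, here is a minimal sketch of a plain word histogram in Python. The function name `word_histogram` and the sample sentence are assumptions for illustration only, not the original program. It counts surface forms exactly as written, so each inflected form of a word gets its own entry.

```python
import re
from collections import Counter

def word_histogram(text):
    """Count each surface form exactly as written (lowercased)."""
    # Match runs of Cyrillic letters; "ё" lies outside the а-я range.
    words = re.findall(r"[а-яё]+", text.lower())
    return Counter(words)

hist = word_histogram("Книга лежит на столе. Я читаю книгу. У меня нет книги.")
# "книга", "книгу" and "книги" are counted as three unrelated words.
for word, count in hist.most_common():
    print(word, count)
```

With real text, a frequent word can end up spread across a dozen low-count entries this way, which is why the forms need to be linked before ranking.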

Does anyone know a public domain algorithm (or a list of sets of related morphemes) that can do this with reasonable accuracy?

Pim

From:

pzrk.livejournal.com

Try something like this:
Keep the first three letters as they are, then strip vowels and the letters "вгйлмт" from the end of the word.
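A rough Python sketch of this rule follows. It rests on one reading of the suggestion: the first three letters are never touched, and characters are stripped from the end of the word for as long as they are vowels or one of "вгйлмт". The names `crude_stem` and `stem_histogram` are made up for this example.

```python
import re
from collections import Counter

VOWELS = "аеёиоуыэюя"
STRIP_CHARS = VOWELS + "вгйлмт"  # letters the rule removes from the tail

def crude_stem(word, keep=3):
    """Keep the first `keep` letters, then strip trailing STRIP_CHARS."""
    word = word.lower()
    return word[:keep] + word[keep:].rstrip(STRIP_CHARS)

def stem_histogram(text):
    """Count words grouped by their crude stem."""
    words = re.findall(r"[а-яё]+", text.lower())
    return Counter(crude_stem(w) for w in words)

print(crude_stem("книга"), crude_stem("книгой"), crude_stem("читала"))
print(stem_histogram("Книга, книги, книгу"))
```

Under this reading, книга, книги, книгу and книгой all reduce to "кни". The rule is very coarse, though: the infinitive читать is left unchanged because "ь" is not in the stripped set, and unrelated words such as стол and стоит both collapse to "сто".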

From:

oryx_and_crake

This certainly will not work, especially with irregular verbs like идти - шел - шла, which are also among the most frequently used.
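A short Python check illustrates the objection, assuming the stripping rule suggested above is read as "keep three letters, strip trailing vowels and вгйлмт". The `IRREGULAR` table is a hypothetical, hand-made workaround added for this example; it is not something proposed in the thread and it covers only the forms listed.

```python
STRIP_CHARS = "аеёиоуыэюя" + "вгйлмт"

def crude_stem(word, keep=3):
    """Keep the first `keep` letters, then strip trailing STRIP_CHARS."""
    word = word.lower()
    return word[:keep] + word[keep:].rstrip(STRIP_CHARS)

# Hypothetical exception table: suppletive form -> dictionary form.
IRREGULAR = {"шел": "идти", "шёл": "идти", "шла": "идти",
             "шло": "идти", "шли": "идти"}

def normalize(word):
    """Map known irregular forms to their dictionary form, then stem."""
    word = word.lower()
    return crude_stem(IRREGULAR.get(word, word))

forms = ["идти", "шел", "шла"]
print([crude_stem(w) for w in forms])  # three different stems
print([normalize(w) for w in forms])   # one shared stem
```

The suffix rule alone yields three unrelated stems for these forms, because no amount of tail stripping can connect шел to идти. A lookup table fixes only the forms someone has typed in by hand, which is the case for using a real morphological dictionary.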

From:

ex-ex-zhuzh.livejournal.com

an online service that does what you want:
http://starling.rinet.ru/cgi-bin/morphque.cgi?flags=kndnnnn
you could probably write a script that queries it to fetch the canonical form of any word.

there's more information at http://starling.rinet.ru/program.php?lan=en and http://starling.rinet.ru/downl.php?lan=en#soft
they have useful databases and Windows executables there, but i'm not sure about the source code. it is probably available if you ask them.

From:

pigmeich.livejournal.com

ABBYY Lingvo handles this well.

From:

tir-sn.livejournal.com

AFAIK, many homebrew search engines for Russian used the freeware 'ispell' program to get all forms of a word. ispell with Russian dictionaries is called 'rus-ispell'.
Good luck!