WhatLanguage: Ruby Library To Detect The Language Of A Text

WhatLanguage is a library by Peter Cooper (disclaimer: yes, that's me) that makes it quick and easy to determine what language a supplied text is written in. It's pretty accurate on anything from a short sentence up to several paragraphs in all of the languages supplied with the library (Dutch, English, Farsi, Russian, French, German, Portuguese, Spanish, Pinyin) and adding languages of your own choosing isn't difficult.

The library works by checking each word of the text against bloom filters, one per language, each built from a dictionary for that language. We've covered bloom filters on Ruby Inside before, but essentially they're probabilistic data structures built by hashing a large set of content. They're ideal when you want to check set membership and the risk of false positives is acceptable in return for significant memory savings (a 250KB bloom filter is a lot nicer to deal with than a 14MB+ dictionary).
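To make the bloom filter idea concrete, here is a minimal sketch in plain Ruby. It is not WhatLanguage's own implementation: the class name TinyBloomFilter, the choice of MD5 as the hash, and the sizes used are all assumptions made for illustration.

```ruby
require 'digest'

# Hypothetical minimal bloom filter, for illustration only.
# This is NOT the implementation WhatLanguage ships with.
class TinyBloomFilter
  def initialize(size_in_bits, hash_count)
    @size = size_in_bits
    @hash_count = hash_count
    @bits = Array.new(size_in_bits, false)
  end

  # Set every bit position derived from the word.
  def add(word)
    positions(word).each { |i| @bits[i] = true }
  end

  # True if every derived bit is set. This can be a false positive,
  # but a word that was added is never reported as missing.
  def include?(word)
    positions(word).all? { |i| @bits[i] }
  end

  private

  # Derive hash_count bit positions by salting one digest with a seed
  # (an assumed scheme; real filters often use faster hash functions).
  def positions(word)
    (0...@hash_count).map do |seed|
      Digest::MD5.hexdigest("#{seed}:#{word}").to_i(16) % @size
    end
  end
end

english = TinyBloomFilter.new(10_000, 3)
%w[this is a test of language detection].each { |w| english.add(w) }

english.include?("test")   # => true (added words are always found)
english.include?("homme")  # almost certainly false, but never guaranteed
```

The memory saving comes from storing only the bit array rather than the words themselves, which is why a filter can stand in for a much larger dictionary.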

WhatLanguage is available from GitHub (and can be installed as a gem from there with gem install peterc-whatlanguage) or from RubyForge with a simpler gem install whatlanguage. Once installed, usage is simple:

    require 'whatlanguage'

    "Je suis un homme".language   # => :french

    # OR...
    wl = WhatLanguage.new(:all)
    wl.language("Je suis un homme")   # => :french
    wl.process_text("this is a test of whatlanguage's great language detection features")
    # => {:german=>1, :dutch=>3, :portuguese=>3, :english=>7, :russian=>1, :farsi=>1, :spanish=>3, :french=>2}
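The process_text output above suggests a simple scoring scheme: count how many words of the text each language's filter recognises, then pick the language with the most hits. The sketch below shows that idea with plain Ruby Sets standing in for the bloom filters. The method names score_text and guess_language and the tiny word lists are hypothetical, and the library's real internals may differ.

```ruby
require 'set'

# Hypothetical stand-ins for the per-language bloom filters.
# The word lists are made up and far smaller than a real dictionary.
def sample_filters
  {
    english: Set.new(%w[this is a test of great language detection features]),
    french:  Set.new(%w[je suis un homme test])
  }
end

# Count, per language, how many words of the text its filter recognises.
def score_text(text, filters = sample_filters)
  scores = Hash.new(0)
  text.downcase.split.each do |word|
    filters.each { |lang, words| scores[lang] += 1 if words.include?(word) }
  end
  scores
end

# Pick the language with the most hits, or nil if nothing matched.
def guess_language(text, filters = sample_filters)
  scores = score_text(text, filters)
  return nil if scores.empty?
  scores.max_by { |_lang, hits| hits }.first
end

score_text("this is a test")        # => {:english=>4, :french=>1}
guess_language("Je suis un homme")  # => :french
```

Note how "test" scores for both languages here. Overlaps like that, along with bloom filter false positives, would explain why the real output above shows small counts for languages the text clearly isn't written in.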

I wrote the library initially a year ago but have only just made it available for public use, so if there are unforeseen bugs to fix or things that really need to be added, fork it on GitHub and get playing.




Published 3 months ago
By Peter Cooper
From Resource RubyInside in lists:
Ruby resources for Rails developers
