GitHub - greyblake/whatlang-rs: Natural language detection library for Rust. Try demo online: https://whatlang.org/

Stand With Ukraine

Features

  • Supports 69 languages
  • 100% written in Rust
  • Lightweight, fast and simple
  • Recognizes not only a language, but also a script (Latin, Cyrillic, etc)
  • Provides reliability information

Get started

Example:

use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}

For more details (e.g. how to blacklist some languages) please check the documentation.

Who uses Whatlang?

Whatlang is used within the following large projects as a direct or indirect dependency for language recognition. You'll be in great company using Whatlang:

  • Sonic - fast, lightweight and schema-less search backend in Rust.
  • Meilisearch - an open-source, easy-to-use, blazingly fast, and hyper-relevant search engine built in Rust.

Feature toggles

Feature     Description
enum-map    Lang and Script implement the Enum trait from enum-map
arbitrary   Supports Arbitrary
serde       Implements Serialize and Deserialize for Lang and Script
dev         Enables the whatlang::dev module, which provides some internal API. It exists for profiling purposes, and normal users are discouraged from relying on this API.

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, which are a particular case of n-grams. To understand the idea, please check the original whitepaper, Cavnar and Trenkle '94: "N-Gram-Based Text Categorization".
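To make the trigram idea concrete, here is a minimal sketch of the rank-order approach described by Cavnar and Trenkle, using only the standard library. It is not Whatlang's actual implementation: the function names, the space padding, and the tie-breaking rule are assumptions made for illustration, and real language profiles would be built from much larger training texts.

```rust
use std::collections::HashMap;

/// Counts overlapping character trigrams in `text`.
/// Assumption: the text is lowercased and padded with one space on each
/// side so that word boundaries show up in the trigrams.
fn trigram_counts(text: &str) -> HashMap<String, usize> {
    let padded: Vec<char> = format!(" {} ", text.to_lowercase()).chars().collect();
    let mut counts = HashMap::new();
    for window in padded.windows(3) {
        let trigram: String = window.iter().collect();
        *counts.entry(trigram).or_insert(0) += 1;
    }
    counts
}

/// Orders trigrams from most to least frequent and keeps at most `max_len`.
/// Ties are broken alphabetically so the result is deterministic.
fn ranked_profile(text: &str, max_len: usize) -> Vec<String> {
    let mut entries: Vec<(String, usize)> = trigram_counts(text).into_iter().collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    entries.into_iter().take(max_len).map(|(t, _)| t).collect()
}

/// "Out-of-place" distance: for every trigram in the text profile, add the
/// difference between its rank in the text and its rank in the language
/// profile. Trigrams missing from the language profile get a fixed maximum
/// penalty (here: the length of the language profile).
fn out_of_place_distance(text_profile: &[String], lang_profile: &[String]) -> usize {
    let max_penalty = lang_profile.len();
    let lang_ranks: HashMap<&str, usize> = lang_profile
        .iter()
        .enumerate()
        .map(|(rank, t)| (t.as_str(), rank))
        .collect();
    text_profile
        .iter()
        .enumerate()
        .map(|(rank, t)| match lang_ranks.get(t.as_str()) {
            Some(&lang_rank) if lang_rank > rank => lang_rank - rank,
            Some(&lang_rank) => rank - lang_rank,
            None => max_penalty,
        })
        .sum()
}

/// Returns the name of the language whose profile is closest to the text.
fn closest_language<'a>(
    text: &str,
    languages: &[(&'a str, Vec<String>)],
    max_len: usize,
) -> Option<&'a str> {
    let text_profile = ranked_profile(text, max_len);
    languages
        .iter()
        .min_by_key(|(_, profile)| out_of_place_distance(&text_profile, profile))
        .map(|(name, _)| *name)
}
```

A caller would build one `ranked_profile` per language from training text, then pass the list to `closest_language`. The smallest distance wins, and the gap to the runner-up is the kind of signal a reliability check can build on.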

How is is_reliable calculated?

It is based on the following factors:

  • How many unique trigrams are in the given text
  • How big the difference is between the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, it can be represented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola: the fewer unique trigrams the text contains, the larger the rate must be for the result to count as reliable.
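The two factors above can be combined into a small decision rule. The sketch below shows one way a hyperbolic threshold could look. The constants, the function names, and the exact definition of rate as a relative score gap are assumptions for illustration and have not been checked against Whatlang's source.

```rust
/// Hypothetical constants for the threshold hyperbola `SCALE / x + OFFSET`.
/// They are placeholders for illustration only.
const HYPERBOLA_SCALE: f64 = 2.0;
const HYPERBOLA_OFFSET: f64 = 0.1;

/// Relative gap between the best and the second-best language scores.
/// Assumption: "rate" is measured relative to the second-best score.
fn score_rate(best_score: f64, second_score: f64) -> f64 {
    if second_score <= 0.0 {
        // No competing language at all: treat the gap as unbounded.
        return f64::INFINITY;
    }
    (best_score - second_score) / second_score
}

/// Minimum rate required for a text with `unique_trigrams` trigrams.
/// The curve is a hyperbola: short texts need a much larger gap.
fn required_rate(unique_trigrams: usize) -> f64 {
    HYPERBOLA_SCALE / unique_trigrams as f64 + HYPERBOLA_OFFSET
}

/// A point (unique_trigrams, rate) is "Reliable" when it lies above the curve.
fn is_reliable(unique_trigrams: usize, rate: f64) -> bool {
    unique_trigrams > 0 && rate > required_rate(unique_trigrams)
}
```

With these placeholder constants, a rate of 0.2 is not enough for a text with 10 unique trigrams, but it is enough for a text with 100, which matches the intuition that longer inputs need a smaller margin between the top two candidates.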

For more details, please check a blog article Introduction to Rust Whatlang Library and Natural Language Identification Algorithms.

Make tasks

  • make bench - run performance benchmarks
  • make doc - generate and open doc
  • make test - run tests
  • make watch - watch changes and run tests

Comparison with alternatives

                          Whatlang    CLD2        CLD3
Implementation language   Rust        C++         C++
Languages                 69          83          107
Algorithm                 trigrams    quadgrams   neural network
Supported encoding        UTF-8       UTF-8       ?
HTML support              no          yes         ?

Ports and clones

Donations

You can support the project by donating NEAR tokens.

Our NEAR wallet address is whatlang.near

Derivation

Whatlang is a derivative work from Franc (JavaScript, MIT) by Titus Wormer.

License

MIT © Sergey Potapov

Contributors
