module Stringex::Unidecoder
Constants
- CODEPOINTS
Contains Unicode codepoints, loading as needed from YAML files
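The page does not show how CODEPOINTS loads its files, so the following is only a sketch of one common way to get "loading as needed" behaviour in Ruby: a Hash whose default block reads a group's YAML file on first access and caches it. The names SKETCH_DATA_DIR and LAZY_CODEPOINTS, and the temporary x00.yml file, are hypothetical and are not part of Stringex.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical stand-in for a data directory: one YAML file per group of
# 256 codepoints, each holding an array of ASCII replacements.
SKETCH_DATA_DIR = Dir.mktmpdir
sample_group = Array.new(256) { |i| i == 0xe9 ? "e" : "?" }
File.write(File.join(SKETCH_DATA_DIR, "x00.yml"), sample_group.to_yaml)

# The default block runs only when a key is missing, so each file is read
# at most once and then served from the cache.
LAZY_CODEPOINTS = Hash.new do |cache, group|
  cache[group] = YAML.load_file(File.join(SKETCH_DATA_DIR, "#{group}.yml"))
end

loaded_before = LAZY_CODEPOINTS.key?("x00")   # false: nothing has been read yet
replacement   = LAZY_CODEPOINTS["x00"][0xe9]  # "e": first access reads x00.yml
loaded_after  = LAZY_CODEPOINTS.key?("x00")   # true: the group is now cached
```

The appeal of this pattern is that a process which only ever sees Latin text never pays for parsing the CJK tables.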
Public Class Methods
decode(string)
Returns string with its UTF-8 characters transliterated to ASCII ones
You're probably better off just using the added String#to_ascii
# File lib/stringex/unidecoder.rb, line 16
def decode(string)
  string.chars.map{|char| decoded(char)}.join
end
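To make the shape of decode concrete, here is a minimal standalone sketch of the same per-character map-and-join approach. SAMPLE_TABLE and sketch_decode are hypothetical names, and the two table entries are illustrative values chosen for this example rather than data taken from the gem's YAML files.

```ruby
# Hypothetical miniature table standing in for the real lookup chain
# (localized transliterations first, then the YAML-backed CODEPOINTS).
SAMPLE_TABLE = {
  "\u00e9" => "e",   # e with acute accent
  "\u00df" => "ss"   # German sharp s
}

# Same structure as decode: transliterate each character, then join.
# Characters missing from the table pass through unchanged here.
def sketch_decode(string)
  string.chars.map { |char| SAMPLE_TABLE.fetch(char, char) }.join
end

sketch_decode("caf\u00e9")    # "cafe"
sketch_decode("Stra\u00dfe")  # "Strasse"
```

Because each character is mapped independently, a single input character can expand to several ASCII characters, so the result may be longer than the input.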
encode(codepoint)
Returns character for the given Unicode codepoint
# File lib/stringex/unidecoder.rb, line 21
def encode(codepoint)
  ["0x#{codepoint}".to_i(16)].pack("U")
end
get_codepoint(character)
Returns Unicode codepoint for the given character
# File lib/stringex/unidecoder.rb, line 26
def get_codepoint(character)
  "%04x" % character.unpack("U")[0]
end
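encode and get_codepoint are inverses of each other: one turns a hexadecimal codepoint string into a character, the other turns a character back into a zero-padded hexadecimal string. The sketch below copies the two method bodies from the listings into standalone methods (the sketch_ prefix is an addition here, so the example runs without the gem) and shows the round trip.

```ruby
# Standalone copy of the get_codepoint body: unpack the first UTF-8
# codepoint and format it as at least four lowercase hex digits.
def sketch_get_codepoint(character)
  "%04x" % character.unpack("U")[0]
end

# Standalone copy of the encode body: parse the hex string and pack the
# resulting integer as a UTF-8 character.
def sketch_encode(codepoint)
  ["0x#{codepoint}".to_i(16)].pack("U")
end

hex  = sketch_get_codepoint("\u00e9")  # "00e9"
char = sketch_encode(hex)              # the character U+00E9 again
```

Note that the codepoint is passed around as a String of hex digits, not as an Integer.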
in_yaml_file(character)
Returns string indicating which file (and line) contains the transliteration value for the character
# File lib/stringex/unidecoder.rb, line 32
def in_yaml_file(character)
  unpacked = character.unpack("U")[0]
  "#{code_group(unpacked)}.yml (line #{grouped_point(unpacked) + 2})"
end
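The file name and line number come from splitting the codepoint into its high and low bytes. The sketch below reproduces that arithmetic with standalone copies of the method bodies (the sketch_ names are additions for this example). The reason for the `+ 2` offset is not stated on this page; a plausible reading, offered only as an assumption, is that it accounts for a YAML document header line plus the shift from a 0-based index to a 1-based line number.

```ruby
# High byte of the codepoint selects the file, e.g. 0x4e2d -> "x4e".
def sketch_code_group(unpacked_character)
  "x%02x" % (unpacked_character >> 8)
end

# Low byte of the codepoint is the index within that file, e.g. 0x2d -> 45.
def sketch_grouped_point(unpacked_character)
  unpacked_character & 255
end

def sketch_in_yaml_file(character)
  unpacked = character.unpack("U")[0]
  "#{sketch_code_group(unpacked)}.yml (line #{sketch_grouped_point(unpacked) + 2})"
end

sketch_in_yaml_file("\u4e2d")  # "x4e.yml (line 47)"
sketch_in_yaml_file("\u00e9")  # "x00.yml (line 235)"
```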
Private Class Methods
code_group(unpacked_character)
Returns the Unicode codepoint grouping for the given character
# File lib/stringex/unidecoder.rb, line 58
def code_group(unpacked_character)
  "x%02x" % (unpacked_character >> 8)
end
decoded(character)
# File lib/stringex/unidecoder.rb, line 39
def decoded(character)
  localized(character) || from_yaml(character)
end
from_yaml(character)
# File lib/stringex/unidecoder.rb, line 47
def from_yaml(character)
  return character unless character.ord > 128
  unpacked = character.unpack("U")[0]
  CODEPOINTS[code_group(unpacked)][grouped_point(unpacked)]
rescue
  # Hopefully this won't come up much
  # TODO: Make this note something to the user that is reportable to me perhaps
  "?"
end
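The listing for from_yaml has three outcomes: low codepoints are returned unchanged, known codepoints are looked up in their group, and any error during lookup yields "?". The sketch below exercises all three against a hypothetical in-memory table (SKETCH_CODEPOINTS, with a single illustrative entry) in place of the real YAML-backed constant. It also shows a detail that follows from the `> 128` comparison as written: U+0080 itself is returned unchanged, even though it lies outside the 7-bit ASCII range.

```ruby
# Hypothetical one-group table; only U+00E9 has a non-empty replacement.
SKETCH_CODEPOINTS = {
  "x00" => Array.new(256) { |i| i == 0xe9 ? "e" : "" }
}

# Same control flow as the from_yaml listing, with the helper arithmetic inlined.
def sketch_from_yaml(character)
  return character unless character.ord > 128
  unpacked = character.unpack("U")[0]
  SKETCH_CODEPOINTS["x%02x" % (unpacked >> 8)][unpacked & 255]
rescue
  # A missing group makes the hash return nil, and nil[...] raises
  # NoMethodError, which the bare rescue turns into "?".
  "?"
end

sketch_from_yaml("a")        # "a": at or below 128, returned as-is
sketch_from_yaml("\u00e9")   # "e": found in group x00
sketch_from_yaml("\u4e2d")   # "?": group x4e is absent from the sketch table
sketch_from_yaml("\u0080")   # "\u0080": 128 is not greater than 128
```

Because the bare rescue catches every StandardError, a "?" in the output cannot by itself distinguish an unmapped character from a failure to load the data.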
grouped_point(unpacked_character)
Returns the index of the given character in the YAML file for its codepoint group
# File lib/stringex/unidecoder.rb, line 63
def grouped_point(unpacked_character)
  unpacked_character & 255
end
localized(character)
# File lib/stringex/unidecoder.rb, line 43
def localized(character)
  Localization.translate(:transliterations, character)
end