module Babosa::UTF8::Proxy
A UTF-8 proxy for Babosa can be any object that responds to the methods in this module. Babosa provides the following proxies: ActiveSupportProxy, DumbProxy, JavaProxy, and UnicodeProxy.
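As an illustration of "any object that responds to the methods in this module", here is a minimal sketch of a custom proxy. The name `CoreStringProxy` is hypothetical and not part of Babosa. The sketch assumes Ruby 2.4 or newer, where `String#downcase` and `String#upcase` apply Unicode case mapping and `String#unicode_normalize` is available. It leaves out `tidy_bytes`; a real proxy would presumably include this module to inherit it.

```ruby
# Hypothetical proxy (not shipped with Babosa) built only on core String methods.
# Assumes Ruby 2.4+ for Unicode-aware case mapping.
module CoreStringProxy
  extend self

  # Unicode-aware downcase via String#downcase.
  def downcase(string)
    string.downcase
  end

  # Unicode-aware upcase via String#upcase.
  def upcase(string)
    string.upcase
  end

  # NFC normalization via String#unicode_normalize.
  def normalize_utf8(string)
    string.unicode_normalize(:nfc)
  end
end

# "e" followed by a combining acute accent composes to a single code point.
composed = CoreStringProxy.normalize_utf8("e\u0301")
puts composed.length # prints 1
```

Delegating to core `String` methods keeps the sketch dependency-free. The proxies listed above exist for environments where other Unicode backends are preferred.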
Constants
- CP1252
Public Instance Methods
downcase(string)
This is a stub for a method that should return a Unicode-aware downcased version of the given string.
# File lib/babosa/utf8/proxy.rb, line 49
def downcase(string)
  raise NotImplementedError
end
normalize_utf8(string)
This is a stub for a method that should return the Unicode NFC normalization of the given string.
# File lib/babosa/utf8/proxy.rb, line 61
def normalize_utf8(string)
  raise NotImplementedError
end
tidy_bytes(string)
Attempts to replace invalid UTF-8 bytes with valid ones. This method naively assumes that any invalid UTF-8 bytes are either Windows CP-1252 or ISO-8859-1. In practice this is not a bad assumption, but it may not always work.
# File lib/babosa/utf8/proxy.rb, line 70
def tidy_bytes(string)
  string.scrub do |bad|
    tidy_byte(*bad.bytes).flatten.compact.pack('C*').unpack('U*').pack('U*')
  end
end
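To make the CP-1252 / ISO-8859-1 assumption concrete, the following sketch applies the same idea with Ruby's built-in transcoding instead of Babosa's `CP1252` table. It is a stand-in, not the library's implementation: `tidy_bytes_sketch` and `decode_legacy_byte` are hypothetical names, and the fallback to ISO-8859-1 for bytes that Ruby's Windows-1252 converter leaves undefined is an assumption of this sketch.

```ruby
# Hypothetical helper: decode one invalid byte as Windows-1252, falling back to
# ISO-8859-1 when Ruby's Windows-1252 converter has no mapping for it.
def decode_legacy_byte(byte)
  raw = byte.chr # single-byte binary string for bytes >= 0x80
  raw.dup.force_encoding('Windows-1252').encode('UTF-8')
rescue Encoding::UndefinedConversionError
  raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Hypothetical stand-in for tidy_bytes: String#scrub yields each invalid byte
# run, and every byte in that run is re-decoded as a legacy single-byte character.
def tidy_bytes_sketch(string)
  string.scrub do |bad|
    bad.bytes.map { |byte| decode_legacy_byte(byte) }.join
  end
end

# 0xE9 is "é" in ISO-8859-1; 0x93 and 0x94 are curly quotes in CP-1252.
dirty = "caf\xE9 \x93quoted\x94".dup.force_encoding('UTF-8')
clean = tidy_bytes_sketch(dirty)
puts clean.valid_encoding? # prints true
```

The caveat in the description still applies: bytes that actually came from some other legacy encoding are converted to valid but wrong characters.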
upcase(string)
This is a stub for a method that should return a Unicode-aware upcased version of the given string.
# File lib/babosa/utf8/proxy.rb, line 55
def upcase(string)
  raise NotImplementedError
end
Private Instance Methods
tidy_byte(byte)
# File lib/babosa/utf8/proxy.rb, line 120
def tidy_byte(byte)
  byte < 160 ? CP1252[byte] : byte < 192 ? [194, byte] : [195, byte - 64]
end
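The two arithmetic branches of `tidy_byte` follow from how UTF-8 encodes code points U+00A0 through U+00FF: bytes 160 to 191 keep their value as the continuation byte after lead byte 194 (0xC2), and bytes 192 to 255 become lead byte 195 (0xC3) followed by the byte minus 64. The sketch below reproduces only those two branches. The name `latin1_byte_to_utf8_bytes` is hypothetical, and the `CP1252` table lookup for bytes below 160 is deliberately left out.

```ruby
# Hypothetical helper mirroring the two ISO-8859-1 branches of tidy_byte.
# Bytes below 160 go through the CP1252 table in the real method and are
# rejected here.
def latin1_byte_to_utf8_bytes(byte)
  raise ArgumentError, "expected a byte in 160..255" unless (160..255).cover?(byte)

  byte < 192 ? [194, byte] : [195, byte - 64]
end

# 0xE9 (233) is "é" in ISO-8859-1; its UTF-8 encoding is 0xC3 0xA9.
pair = latin1_byte_to_utf8_bytes(0xE9)
p pair # prints [195, 169]

# Packing the pair as raw bytes and tagging it UTF-8 yields the character.
char = pair.pack('C*').force_encoding('UTF-8')
puts char == "\u00E9" # prints true
```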