org.apache.commons.codec.language.bm
Class PhoneticEngine

java.lang.Object
  extended by org.apache.commons.codec.language.bm.PhoneticEngine

public class PhoneticEngine
extends java.lang.Object

Converts words into potential phonetic representations.

This is a two-stage process. First, the word is converted into a phonetic representation that takes into account the likely source language. Next, this phonetic representation is converted into a pan-European 'average' representation, allowing comparison between different versions of essentially the same word from different languages.

This class is intentionally immutable, which makes it thread-safe. To alter the settings of a PhoneticEngine, create a new one with the updated settings.
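The immutability described above can be illustrated with a minimal sketch. EngineSettings and withMaxPhonemes are hypothetical names used only for illustration; they are not part of commons-codec, and PhoneticEngine itself exposes only constructors and getters. The point is the pattern: every field is final, and "changing" a setting means building a new instance.

```java
// Hypothetical stand-in for an immutable settings holder; NOT a commons-codec class.
final class EngineSettings {
    private final boolean concat;
    private final int maxPhonemes;

    EngineSettings(boolean concat, int maxPhonemes) {
        this.concat = concat;
        this.maxPhonemes = maxPhonemes;
    }

    boolean isConcat() {
        return concat;
    }

    int getMaxPhonemes() {
        return maxPhonemes;
    }

    // Returns a new instance with one setting changed; the receiver is left untouched.
    EngineSettings withMaxPhonemes(int newMaxPhonemes) {
        return new EngineSettings(concat, newMaxPhonemes);
    }

    public static void main(String[] args) {
        EngineSettings original = new EngineSettings(true, 20);
        EngineSettings updated = original.withMaxPhonemes(40);
        System.out.println(original.getMaxPhonemes()); // still 20
        System.out.println(updated.getMaxPhonemes());  // 40
    }
}
```

Because no field can change after construction, instances of such a class can be shared between threads without synchronization.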

Ported from phoneticengine.php

Since:
1.6
Version:
$Id: PhoneticEngine.java 1378746 2012-08-29 21:29:49Z tn $

Nested Class Summary
(package private) static class PhoneticEngine.PhonemeBuilder
          Utility for manipulating a set of phonemes as they are being built up.
private static class PhoneticEngine.RulesApplication
          A function closure capturing the application of a list of rules to an input sequence at a particular offset.
 
Field Summary
private  boolean concat
           
private static int DEFAULT_MAX_PHONEMES
           
private  Lang lang
           
private  int maxPhonemes
           
private static java.util.Map<NameType,java.util.Set<java.lang.String>> NAME_PREFIXES
           
private  NameType nameType
           
private  RuleType ruleType
           
 
Constructor Summary
PhoneticEngine(NameType nameType, RuleType ruleType, boolean concat)
          Generates a new, fully-configured phonetic engine.
PhoneticEngine(NameType nameType, RuleType ruleType, boolean concat, int maxPhonemes)
          Generates a new, fully-configured phonetic engine.
 
Method Summary
private  PhoneticEngine.PhonemeBuilder applyFinalRules(PhoneticEngine.PhonemeBuilder phonemeBuilder, java.util.List<Rule> finalRules)
          Applies the final rules to convert from a language-specific phonetic representation to a language-independent representation.
private static java.lang.CharSequence cacheSubSequence(java.lang.CharSequence cached)
          This is a performance hack to avoid overhead associated with very frequent CharSequence.subSequence calls.
 java.lang.String encode(java.lang.String input)
          Encodes a string to its phonetic representation.
 java.lang.String encode(java.lang.String input, Languages.LanguageSet languageSet)
          Encodes an input string into an output phonetic representation, given a set of possible origin languages.
 Lang getLang()
          Gets the Lang language guessing rules being used.
 int getMaxPhonemes()
          Gets the maximum number of phonemes the engine will calculate for a given input.
 NameType getNameType()
          Gets the NameType being used.
 RuleType getRuleType()
          Gets the RuleType being used.
 boolean isConcat()
          Returns whether multiple phonetic encodings are concatenated or only the first one is kept.
private static java.lang.String join(java.lang.Iterable<java.lang.String> strings, java.lang.String sep)
          Joins some strings with an internal separator.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NAME_PREFIXES

private static final java.util.Map<NameType,java.util.Set<java.lang.String>> NAME_PREFIXES

DEFAULT_MAX_PHONEMES

private static final int DEFAULT_MAX_PHONEMES
See Also:
Constant Field Values

lang

private final Lang lang

nameType

private final NameType nameType

ruleType

private final RuleType ruleType

concat

private final boolean concat

maxPhonemes

private final int maxPhonemes
Constructor Detail

PhoneticEngine

public PhoneticEngine(NameType nameType,
                      RuleType ruleType,
                      boolean concat)
Generates a new, fully-configured phonetic engine.

Parameters:
nameType - the type of names it will use
ruleType - the type of rules it will apply
concat - if it will concatenate multiple encodings

PhoneticEngine

public PhoneticEngine(NameType nameType,
                      RuleType ruleType,
                      boolean concat,
                      int maxPhonemes)
Generates a new, fully-configured phonetic engine.

Parameters:
nameType - the type of names it will use
ruleType - the type of rules it will apply
concat - if it will concatenate multiple encodings
maxPhonemes - the maximum number of phonemes that will be handled
Since:
1.7
Method Detail

cacheSubSequence

private static java.lang.CharSequence cacheSubSequence(java.lang.CharSequence cached)
This is a performance hack to avoid overhead associated with very frequent CharSequence.subSequence calls.

Parameters:
cached - the character sequence to cache
Returns:
a CharSequence that internally caches subSequence values
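
A minimal sketch of how such a caching wrapper could look, assuming the wrapped sequence is never modified and that callers pass valid indices (0 <= start <= end <= length). The class name SubSequenceCache is hypothetical, and this is not necessarily how the library implements it. Note the trade-off: the lookup table grows quadratically with the input length.

```java
// Hypothetical illustration of a subSequence cache; not the library's own code.
final class SubSequenceCache {

    static CharSequence cacheSubSequence(final CharSequence cached) {
        final int n = cached.length();
        // cache[start][end - 1] holds the subsequence [start, end) once it has been computed.
        final CharSequence[][] cache = new CharSequence[n][n];
        return new CharSequence() {
            @Override
            public char charAt(int index) {
                return cached.charAt(index);
            }

            @Override
            public int length() {
                return n;
            }

            @Override
            public CharSequence subSequence(int start, int end) {
                if (start == end) {
                    return ""; // empty ranges need no cache slot
                }
                CharSequence result = cache[start][end - 1];
                if (result == null) {
                    result = cached.subSequence(start, end);
                    cache[start][end - 1] = result;
                }
                return result;
            }

            @Override
            public String toString() {
                return cached.toString();
            }
        };
    }

    public static void main(String[] args) {
        CharSequence cs = cacheSubSequence("phonetic");
        CharSequence first = cs.subSequence(0, 4);
        CharSequence second = cs.subSequence(0, 4);
        System.out.println(first);           // phon
        System.out.println(first == second); // true: the second call is served from the cache
    }
}
```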

join

private static java.lang.String join(java.lang.Iterable<java.lang.String> strings,
                                     java.lang.String sep)
Joins some strings with an internal separator.

Parameters:
strings - Strings to join
sep - String to separate them with
Returns:
a single String consisting of each element of strings interleaved by sep
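
The contract above (elements interleaved by sep, with no leading or trailing separator) can be sketched as follows. JoinSketch is a hypothetical class name, and this is one straightforward way to meet the contract, not necessarily the library's implementation.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration of the join contract; not the library's own code.
final class JoinSketch {

    static String join(Iterable<String> strings, String sep) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = strings.iterator();
        if (it.hasNext()) {
            sb.append(it.next()); // first element carries no separator
        }
        while (it.hasNext()) {
            sb.append(sep).append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"), "|")); // a|b|c
    }
}
```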

applyFinalRules

private PhoneticEngine.PhonemeBuilder applyFinalRules(PhoneticEngine.PhonemeBuilder phonemeBuilder,
                                                      java.util.List<Rule> finalRules)
Applies the final rules to convert from a language-specific phonetic representation to a language-independent representation.

Parameters:
phonemeBuilder - the current phonemes
finalRules - the final rules to apply
Returns:
the resulting phonemes

encode

public java.lang.String encode(java.lang.String input)
Encodes a string to its phonetic representation.

Parameters:
input - the String to encode
Returns:
the encoding of the input

encode

public java.lang.String encode(java.lang.String input,
                               Languages.LanguageSet languageSet)
Encodes an input string into an output phonetic representation, given a set of possible origin languages.

Parameters:
input - String to phoneticise; a String with dashes or spaces separating each word
languageSet - the set of possible origin languages for the input
Returns:
a phonetic representation of the input; a String containing '-'-separated phonetic representations of the input
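
The input and output shapes described above can be sketched without the library. In this illustration the per-word encoder is a placeholder supplied by the caller, not a real phonetic encoding, and WordwiseEncodingSketch and encodeWords are hypothetical names. The actual engine may treat multi-word input differently depending on its settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical illustration of word-wise encoding; not the library's own code.
final class WordwiseEncodingSketch {

    // Splits the input on spaces and dashes, encodes each word, and joins the results with '-'.
    static String encodeWords(String input, UnaryOperator<String> wordEncoder) {
        List<String> encoded = new ArrayList<>();
        for (String word : input.trim().split("[\\s-]+")) {
            if (!word.isEmpty()) { // a leading dash yields one empty token; skip it
                encoded.add(wordEncoder.apply(word));
            }
        }
        return String.join("-", encoded);
    }

    public static void main(String[] args) {
        // Placeholder encoder: lower-cases the word and wraps it in angle brackets.
        UnaryOperator<String> placeholder = w -> "<" + w.toLowerCase(Locale.ROOT) + ">";
        System.out.println(encodeWords("Jean-Paul Sartre", placeholder)); // <jean>-<paul>-<sartre>
    }
}
```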

getLang

public Lang getLang()
Gets the Lang language guessing rules being used.

Returns:
the Lang in use

getNameType

public NameType getNameType()
Gets the NameType being used.

Returns:
the NameType in use

getRuleType

public RuleType getRuleType()
Gets the RuleType being used.

Returns:
the RuleType in use

isConcat

public boolean isConcat()
Returns whether multiple phonetic encodings are concatenated or only the first one is kept.

Returns:
true if multiple phonetic encodings are returned, false if only the first one is returned

getMaxPhonemes

public int getMaxPhonemes()
Gets the maximum number of phonemes the engine will calculate for a given input.

Returns:
the maximum number of phonemes
Since:
1.7


commons-codec version 1.7-SNAPSHOT - Copyright © 2002-2013 - Apache Software Foundation