org.biojava.bio.symbol
Class AlphabetManager

java.lang.Object
  extended by org.biojava.bio.symbol.AlphabetManager

public final class AlphabetManager
extends Object

Utility methods for working with Alphabets. Also acts as a registry for well-known alphabets.

The alphabet interfaces themselves don't give you a lot of help in actually getting an alphabet instance. This is where the AlphabetManager comes in handy. It helps out in serialization, generating derived alphabets and building CrossProductAlphabet instances. It also contains limited support for parsing complex alphabet names back into the alphabets.

Author:
Matthew Pocock, Thomas Down, Mark Schreiber

Constructor Summary
AlphabetManager()
           
 
Method Summary
static Alphabet alphabetForName(String name)
          Retrieve the alphabet for a specific name.
static Iterator alphabets()
          Get an iterator over all alphabets known.
static Symbol createSymbol(Annotation annotation, List symList, Alphabet alpha)
           Generates a new Symbol instance that represents the tuple of Symbols in symList.
static Symbol createSymbol(Annotation annotation, Set symSet, Alphabet alpha)
           Generates a new Symbol instance that represents the set of Symbols in symSet.
static Symbol createSymbol(char token, Annotation annotation, List symList, Alphabet alpha)
          Deprecated. Use the new version, without the token argument.
static Symbol createSymbol(char token, Annotation annotation, Set symSet, Alphabet alpha)
          Deprecated. Use the three-arg version of this method instead.
static AtomicSymbol createSymbol(char token, String name, Annotation annotation)
          Deprecated. Use the two-arg version of this method instead.
static AtomicSymbol createSymbol(String name)
           Generate a new AtomicSymbol instance with a name and an Empty Annotation.
static AtomicSymbol createSymbol(String name, Annotation annotation)
           Generate a new AtomicSymbol instance with a name and Annotation.
static List factorize(Alphabet alpha, Set symSet)
           Return a list of BasisSymbol instances that uniquely sum up all AtomicSymbol instances in symSet.
static Alphabet generateCrossProductAlphaFromName(String name)
          Generates a new CrossProductAlphabet from the given name.
static Symbol getAllAmbiguitySymbol(FiniteAlphabet alpha)
          Return the ambiguity symbol which matches all symbols in a given alphabet.
static Set getAllSymbols(FiniteAlphabet alpha)
          Return a set containing all possible symbols which can be considered members of a given alphabet, including ambiguous symbols.
static AlphabetIndex getAlphabetIndex(FiniteAlphabet alpha)
          Get an indexer for a specified alphabet.
static AlphabetIndex getAlphabetIndex(Symbol[] syms)
          Get an indexer for an array of symbols.
static Alphabet getCrossProductAlphabet(List aList)
           Retrieve a CrossProductAlphabet instance over the alphabets in aList.
static Alphabet getCrossProductAlphabet(List aList, Alphabet parent)
           Retrieve a CrossProductAlphabet instance over the alphabets in aList.
static Alphabet getCrossProductAlphabet(List aList, String name)
          Attempts to create a cross product alphabet and register it under a name.
static Symbol getGapSymbol()
           Get the special 'gap' Symbol.
static Symbol getGapSymbol(List alphas)
           Get the gap symbol appropriate to this list of alphabets.
static AlphabetManager instance()
          Deprecated. All AlphabetManager methods have become static.
static void loadAlphabets(InputSource is)
          Load additional Alphabets, defined in XML format, into the AlphabetManager's registry.
static void registerAlphabet(String[] names, Alphabet alphabet)
          Register an Alphabet by more than one name.
static void registerAlphabet(String name, Alphabet alphabet)
          Register an alphabet by name.
static boolean registered(String name)
          Returns whether an Alphabet has been registered by that name.
static Set registrations()
          A set of names under which Alphabets have been registered.
static Symbol symbolForLifeScienceID(LifeScienceIdentifier lsid)
          Retrieves the Symbol for the LSID.
static Symbol symbolForName(String name)
          Deprecated. Use symbolForLifeScienceID() instead.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AlphabetManager

public AlphabetManager()
Method Detail

instance

public static AlphabetManager instance()
Deprecated. All AlphabetManager methods have become static.

Retrieve the singleton instance.

Returns:
the AlphabetManager instance

getAllAmbiguitySymbol

public static Symbol getAllAmbiguitySymbol(FiniteAlphabet alpha)
Return the ambiguity symbol which matches all symbols in a given alphabet.

Parameters:
alpha - The alphabet
Returns:
the ambiguity symbol
Since:
1.2

getAllSymbols

public static Set getAllSymbols(FiniteAlphabet alpha)
Return a set containing all possible symbols which can be considered members of a given alphabet, including ambiguous symbols. Warning: this method can return large sets!

Parameters:
alpha - The alphabet
Returns:
The set of symbols that are members of alpha
Since:
1.2

alphabetForName

public static Alphabet alphabetForName(String name)
                                throws NoSuchElementException
Retrieve the alphabet for a specific name.

Parameters:
name - the name of the alphabet
Returns:
the alphabet object
Throws:
NoSuchElementException - if there is no alphabet by that name

symbolForName

public static Symbol symbolForName(String name)
                            throws NoSuchElementException
Deprecated. Use symbolForLifeScienceID() instead.

Retrieve the symbol represented by a String object.

Parameters:
name - the string whose symbol you want to get
Returns:
The Symbol
Throws:
NoSuchElementException - if the string name is invalid.

symbolForLifeScienceID

public static Symbol symbolForLifeScienceID(LifeScienceIdentifier lsid)
Retrieves the Symbol for the LSID.

Parameters:
lsid - the URN for the Symbol
Returns:
a reference to the Symbol

registerAlphabet

public static void registerAlphabet(String name,
                                    Alphabet alphabet)
Register an alphabet by name.

Parameters:
name - the name by which it can be retrieved
alphabet - the Alphabet to store

registerAlphabet

public static void registerAlphabet(String[] names,
                                    Alphabet alphabet)
Register an Alphabet by more than one name. This allows aliasing of an alphabet with two or more names. It is equivalent to calling registerAlphabet(String name, Alphabet alphabet) several times.

Parameters:
names - the names by which it can be retrieved
alphabet - the Alphabet to store
Since:
1.4
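
The aliasing contract can be illustrated with a small standalone model. This is a sketch only, not BioJava's implementation: AliasRegistrySketch and its methods are hypothetical names, and plain Objects stand in for Alphabet instances. It mirrors two documented behaviours: registering under several names is equivalent to repeated single-name registration, and looking up an unknown name raises NoSuchElementException, as alphabetForName does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the name registry; not BioJava code.
final class AliasRegistrySketch {
    private final Map<String, Object> byName = new HashMap<>();

    // Register one value under a single name.
    void register(String name, Object alphabet) {
        byName.put(name, alphabet);
    }

    // Register one value under several names by delegating to the single-name form.
    void register(String[] names, Object alphabet) {
        for (String name : names) {
            register(name, alphabet);
        }
    }

    // Report whether anything is stored under this name.
    boolean registered(String name) {
        return byName.containsKey(name);
    }

    // Look up by name, failing loudly for unknown names.
    Object forName(String name) {
        Object found = byName.get(name);
        if (found == null) {
            throw new NoSuchElementException("No alphabet registered as " + name);
        }
        return found;
    }
}
```

In this model every alias resolves to the same instance, so identity comparisons between lookups under different names succeed.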

registrations

public static Set registrations()
A set of names under which Alphabets have been registered.

Returns:
a Set of Strings

registered

public static boolean registered(String name)
Returns whether an Alphabet has been registered by that name.

Parameters:
name - the name of the alphabet
Returns:
true if it has or false otherwise

alphabets

public static Iterator alphabets()
Get an iterator over all alphabets known.

Returns:
an Iterator over Alphabet objects

getGapSymbol

public static Symbol getGapSymbol()

Get the special 'gap' Symbol.

The gap symbol is a Symbol that has an empty alphabet of matches. As such, every alphabet contains gap: no symbol matches gap, so there is no case where an alphabet fails to contain a symbol that matches gap.

Gap can be thought of as an empty sub-space within the space of all possible symbols. If you are working in a cross-product alphabet, you should choose whether to use gap to represent 'no symbol', or a basis symbol of the appropriate size built entirely of gaps to represent 'no symbol in each of the slots'.

Returns:
the system-wide symbol that represents a gap
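
The reasoning that every alphabet contains gap is a case of vacuous truth, which a short standalone model may make concrete. The sketch below is not BioJava code: GapContainmentSketch is a hypothetical name, symbols are reduced to the set of atomic symbol names they match, and containment is assumed to mean "every matched symbol is in the alphabet".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: a symbol is represented only by the set of atomic symbols it matches.
final class GapContainmentSketch {
    // The gap symbol matches nothing, so its match set is empty.
    static final Set<String> GAP_MATCHES = Collections.emptySet();

    // An alphabet contains a symbol when every atomic symbol it matches is in the alphabet.
    static boolean contains(Set<String> alphabet, Set<String> matches) {
        return alphabet.containsAll(matches);
    }

    public static void main(String[] args) {
        Set<String> dna = new HashSet<>(Arrays.asList("a", "c", "g", "t"));
        System.out.println(contains(dna, GAP_MATCHES));
        System.out.println(contains(dna, Collections.singleton("u")));
    }
}
```

Because the gap's match set is empty, the containment check holds for any alphabet, including an empty one, while a symbol matching something outside the alphabet is rejected.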

getGapSymbol

public static Symbol getGapSymbol(List alphas)

Get the gap symbol appropriate to this list of alphabets.

The gap symbol will have the same shape as the alphabet list. It will be as long as the list, and if any alphabet in the list has a dimension greater than 1, the appropriate gap is inserted at that position.

Parameters:
alphas - List of alphabets
Returns:
the appropriate gap symbol for the alphabet list
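
One way to picture a gap that "has the same shape" as an alphabet list is a recursive walk over the list. The sketch below is an illustration under assumptions, not BioJava's implementation: GapShapeSketch and Alpha are hypothetical names, an alphabet is modelled as atomic when it has no components, and the gap is rendered as a bracketed string rather than a Symbol.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of gap shapes; not BioJava code.
final class GapShapeSketch {
    // Stand-in for an alphabet: atomic when it has no components,
    // otherwise a cross product over its components.
    static final class Alpha {
        final List<Alpha> components;

        Alpha(Alpha... components) {
            this.components = Arrays.asList(components);
        }
    }

    // Build one slot per alphabet: "-" for an atomic alphabet,
    // a nested bracketed gap for an alphabet with dimension greater than 1.
    static String gapFor(List<Alpha> alphas) {
        List<String> slots = new ArrayList<>();
        for (Alpha a : alphas) {
            slots.add(a.components.isEmpty() ? "-" : gapFor(a.components));
        }
        return "(" + String.join(" ", slots) + ")";
    }
}
```

For a list of an atomic alphabet, a two-dimensional alphabet and another atomic alphabet, this model yields a three-slot gap whose middle slot is itself a two-slot gap.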

createSymbol

public static AtomicSymbol createSymbol(String name,
                                        Annotation annotation)

Generate a new AtomicSymbol instance with a name and Annotation.

Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.

Parameters:
name - the String returned by getName()
annotation - the Annotation returned by getAnnotation()
Returns:
a new AtomicSymbol instance

createSymbol

public static AtomicSymbol createSymbol(String name)

Generate a new AtomicSymbol instance with a name and an Empty Annotation.

Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.

Parameters:
name - the String returned by getName()
Returns:
a new AtomicSymbol instance

createSymbol

public static AtomicSymbol createSymbol(char token,
                                        String name,
                                        Annotation annotation)
Deprecated. Use the two-arg version of this method instead.

Generate a new AtomicSymbol instance with a token, name and Annotation.

Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.

Parameters:
token - the char token returned by getToken() (ignored as of BioJava 1.2)
name - the String returned by getName()
annotation - the Annotation returned by getAnnotation()
Returns:
a new AtomicSymbol instance

createSymbol

public static Symbol createSymbol(char token,
                                  Annotation annotation,
                                  List symList,
                                  Alphabet alpha)
                           throws IllegalSymbolException
Deprecated. Use the new version, without the token argument.

Generates a new Symbol instance that represents the tuple of Symbols in symList.

This method is most useful for writing Alphabet implementations. It should not be invoked by casual users. Use alphabet.getSymbol(List) instead.

Parameters:
annotation - The annotation bundle for the symbol
token - the Symbol's token [ignored since 1.2]
symList - a list of Symbol objects
alpha - the Alphabet that this Symbol will reside in
Returns:
a Symbol that encapsulates that List
Throws:
IllegalSymbolException - If the Symbol cannot be made

createSymbol

public static Symbol createSymbol(Annotation annotation,
                                  List symList,
                                  Alphabet alpha)
                           throws IllegalSymbolException

Generates a new Symbol instance that represents the tuple of Symbols in symList. This will attempt to return the same symbol for the same list.

This method is most useful for writing Alphabet implementations. It should not be invoked by casual users. Use alphabet.getSymbol(List) instead.

Parameters:
annotation - The annotation bundle for the Symbol
symList - a list of Symbol objects
alpha - the Alphabet that this Symbol will reside in
Returns:
a Symbol that encapsulates that List
Throws:
IllegalSymbolException - If the Symbol cannot be made

createSymbol

public static Symbol createSymbol(char token,
                                  Annotation annotation,
                                  Set symSet,
                                  Alphabet alpha)
                           throws IllegalSymbolException
Deprecated. Use the three-arg version of this method instead.

Generates a new Symbol instance that represents the set of Symbols in symSet.

This method is most useful for writing Alphabet implementations. It should not be invoked by users. Use alphabet.getSymbol(Set) instead.

Parameters:
token - the Symbol's token [ignored since 1.2]
annotation - the Symbol's Annotation
symSet - a Set of Symbol objects
alpha - the Alphabet that this Symbol will reside in
Returns:
a Symbol that encapsulates that Set
Throws:
IllegalSymbolException - If the Symbol cannot be made

createSymbol

public static Symbol createSymbol(Annotation annotation,
                                  Set symSet,
                                  Alphabet alpha)
                           throws IllegalSymbolException

Generates a new Symbol instance that represents the set of Symbols in symSet.

This method is most useful for writing Alphabet implementations. It should not be invoked by users. Use alphabet.getSymbol(Set) instead.

Parameters:
annotation - the Symbol's Annotation
symSet - a Set of Symbol objects
alpha - the Alphabet that this Symbol will reside in
Returns:
a Symbol that encapsulates that Set
Throws:
IllegalSymbolException - If the Symbol cannot be made

generateCrossProductAlphaFromName

public static Alphabet generateCrossProductAlphaFromName(String name)
Generates a new CrossProductAlphabet from the given name.

Parameters:
name - the name to parse
Returns:
the associated Alphabet

getCrossProductAlphabet

public static Alphabet getCrossProductAlphabet(List aList)

Retrieve a CrossProductAlphabet instance over the alphabets in aList.

If all of the alphabets in aList implement FiniteAlphabet then the method will return a FiniteAlphabet. Otherwise, it returns a non-finite alphabet.

If you call this method twice with a list containing the same alphabets, it will return the same alphabet. This promotes the re-use of alphabets and helps to maintain the 'flyweight' principle for finite alphabet symbols.

The resulting alphabet cpa will be retrievable via AlphabetManager.alphabetForName(cpa.getName())

Parameters:
aList - a list of Alphabet objects
Returns:
a CrossProductAlphabet that is over the alphabets in aList
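
The re-use guarantee and the finiteness rule described for this method can be modelled with a cache keyed on the component list. The following is a minimal sketch under stated assumptions, not BioJava's implementation: CrossProductCacheSketch, Alpha and Product are hypothetical stand-ins, and components are compared by identity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flyweight cache; not BioJava code.
final class CrossProductCacheSketch {
    // Minimal stand-in for an alphabet: a name plus a finiteness flag.
    static final class Alpha {
        final String name;
        final boolean finite;

        Alpha(String name, boolean finite) {
            this.name = name;
            this.finite = finite;
        }
    }

    // Stand-in for a cross-product alphabet over a fixed list of components.
    static final class Product {
        final List<Alpha> components;
        final boolean finite;

        Product(List<Alpha> components) {
            this.components = components;
            // The product is finite only if every component is finite.
            boolean allFinite = true;
            for (Alpha a : components) {
                allFinite &= a.finite;
            }
            this.finite = allFinite;
        }
    }

    private final Map<List<Alpha>, Product> cache = new HashMap<>();

    // Return the cached product for this component list, creating it on first use.
    Product get(List<Alpha> components) {
        // Defensive copy so later changes to the caller's list cannot corrupt the key.
        List<Alpha> key = Collections.unmodifiableList(new ArrayList<>(components));
        return cache.computeIfAbsent(key, Product::new);
    }
}
```

Two calls with equal lists return the same Product instance in this model, while lists that differ in order are treated as different products.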

getCrossProductAlphabet

public static Alphabet getCrossProductAlphabet(List aList,
                                               String name)
                                        throws IllegalAlphabetException
Attempts to create a cross product alphabet and register it under a name.

Parameters:
aList - A list of alphabets
name - The name which the new alphabet will be registered under.
Returns:
The CrossProductAlphabet
Throws:
IllegalAlphabetException - If the Alphabet cannot be made or a different alphabet is already registered under this name.

getCrossProductAlphabet

public static Alphabet getCrossProductAlphabet(List aList,
                                               Alphabet parent)

Retrieve a CrossProductAlphabet instance over the alphabets in aList.

This method is most useful for implementors of cross-product alphabets, allowing them to safely build the matches alphabets for ambiguity symbols.

If all of the alphabets in aList implement FiniteAlphabet then the method will return a FiniteAlphabet. Otherwise, it returns a non-finite alphabet.

If you call this method twice with a list containing the same alphabets, it will return the same alphabet. This promotes the re-use of alphabets and helps to maintain the 'flyweight' principle for finite alphabet symbols.

The resulting alphabet cpa will be retrievable via AlphabetManager.alphabetForName(cpa.getName())

Parameters:
aList - a list of Alphabet objects
parent - a parent alphabet
Returns:
a CrossProductAlphabet that is over the alphabets in aList

factorize

public static List factorize(Alphabet alpha,
                             Set symSet)
                      throws IllegalSymbolException

Return a list of BasisSymbol instances that uniquely sum up all AtomicSymbol instances in symSet. If the symbol can't be represented by a single list of BasisSymbol instances, return null.

This method is most useful for implementers of Alphabet and Symbol. It probably should not be invoked by users.

Parameters:
symSet - the Set of AtomicSymbol instances
alpha - the Alphabet instance that the Symbols are from
Returns:
a List of BasisSymbols
Throws:
IllegalSymbolException - In practice this should not be thrown. If it is, it probably indicates a subtle bug somewhere in AlphabetManager.
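
The idea of summing up a set of atomic symbols with a single list of basis symbols can be read as testing whether the set is an exact Cartesian product. The sketch below models that reading only; it is not BioJava's algorithm. FactorizeSketch is a hypothetical name, each atomic symbol is modelled as a tuple of strings, and each basis symbol as the set of values seen in one slot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of factorization; not BioJava code.
final class FactorizeSketch {
    // Each tuple stands for one atomic symbol of a cross-product alphabet.
    // Returns one set of values per slot if the tuples form an exact
    // Cartesian product of those sets, or null otherwise.
    static List<Set<String>> factorize(Set<List<String>> tuples) {
        if (tuples.isEmpty()) {
            return null; // nothing to factorize in this sketch
        }
        int width = tuples.iterator().next().size();
        List<Set<String>> slots = new ArrayList<Set<String>>();
        for (int i = 0; i < width; i++) {
            slots.add(new TreeSet<String>());
        }
        // Project every tuple onto each slot.
        for (List<String> tuple : tuples) {
            if (tuple.size() != width) {
                throw new IllegalArgumentException("Tuples must have equal length");
            }
            for (int i = 0; i < width; i++) {
                slots.get(i).add(tuple.get(i));
            }
        }
        // The tuples are always a subset of the product of their projections,
        // so they equal that product exactly when the sizes agree.
        long productSize = 1;
        for (Set<String> slot : slots) {
            productSize *= slot.size();
        }
        return productSize == tuples.size() ? slots : null;
    }
}
```

For example, the four tuples (a,c), (a,g), (t,c), (t,g) factorize into the slots {a,t} and {c,g}, whereas the two tuples (a,c), (t,g) do not factorize and give null, because their projections would also admit (a,g) and (t,c).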

loadAlphabets

public static void loadAlphabets(InputSource is)
                          throws SAXException,
                                 IOException,
                                 BioException
Load additional Alphabets, defined in XML format, into the AlphabetManager's registry. These can then be retrieved by calling alphabetForName.

Parameters:
is - an InputSource encapsulating the document to be parsed
Throws:
IOException - if there is an error accessing the stream
SAXException - if there is an error while parsing the document
BioException - if a problem occurs when creating the new Alphabets.
Since:
1.3

getAlphabetIndex

public static AlphabetIndex getAlphabetIndex(FiniteAlphabet alpha)
Get an indexer for a specified alphabet.

Parameters:
alpha - The alphabet to index
Returns:
an AlphabetIndex instance
Since:
1.1

getAlphabetIndex

public static AlphabetIndex getAlphabetIndex(Symbol[] syms)
                                      throws IllegalSymbolException,
                                             BioException
Get an indexer for an array of symbols.

Parameters:
syms - the Symbols to index in that order
Returns:
an AlphabetIndex instance
Throws:
IllegalSymbolException
BioException
Since:
1.1
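
Indexing an array of symbols "in that order" amounts to a two-way mapping between symbols and their array positions. The sketch below illustrates that idea with plain strings. It is not BioJava's AlphabetIndex: SymbolIndexSketch and its method names are chosen for illustration, and rejecting duplicates is an assumption of this sketch rather than documented behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an indexer over an ordered array of symbols; not BioJava code.
final class SymbolIndexSketch {
    private final String[] symbols;
    private final Map<String, Integer> positions = new HashMap<>();

    SymbolIndexSketch(String[] symbols) {
        // Copy the array so later changes by the caller do not affect the index.
        this.symbols = symbols.clone();
        for (int i = 0; i < symbols.length; i++) {
            // Assumption of this sketch: a symbol may appear only once.
            if (positions.put(symbols[i], i) != null) {
                throw new IllegalArgumentException("Duplicate symbol: " + symbols[i]);
            }
        }
    }

    // Position of the symbol in the original array order.
    int indexForSymbol(String symbol) {
        Integer index = positions.get(symbol);
        if (index == null) {
            throw new IllegalArgumentException("Unknown symbol: " + symbol);
        }
        return index;
    }

    // Symbol stored at the given position.
    String symbolForIndex(int index) {
        return symbols[index];
    }
}
```

Such an index is typically what lets per-symbol data live in a plain array, with the index translating between symbols and array offsets.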