Ai4r::Clusterers::KMeans

The k-means algorithm clusters n objects into k partitions based on their attributes, with k < n.

More about the k-means algorithm: en.wikipedia.org/wiki/K-means_algorithm

Attributes

centroids[R]
clusters[R]
data_set[R]
iterations[R]
number_of_clusters[R]

Public Class Methods

new()
# File lib/ai4r/clusterers/k_means.rb, line 39
def initialize
  @distance_function = nil
  @max_iterations = nil
  @old_centroids = nil
  @centroid_function = lambda do |data_sets| 
    data_sets.collect{ |data_set| data_set.get_mean_or_mode}
  end
end

Public Instance Methods

build(data_set, number_of_clusters)

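Builds a new clusterer from the data examples found in data_set and returns the clusterer itself. Items are grouped into number_of_clusters clusters; if data_set contains fewer distinct items than that, number_of_clusters is reduced to match.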

# File lib/ai4r/clusterers/k_means.rb, line 52
def build(data_set, number_of_clusters)
  @data_set = data_set
  @number_of_clusters = number_of_clusters
  @iterations = 0
  
  calc_initial_centroids
  while(not stop_criteria_met)
    calculate_membership_clusters
    recompute_centroids
  end
  
  return self
end
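
The loop in build can be sketched without the library, as a rough illustration only. The method name sketch_kmeans, the array-of-arrays input, and the empty-cluster guard are assumptions made for this example and are not part of ai4r; the sketch also assumes purely numeric attributes.

```ruby
# Plain-Ruby sketch of the build loop. All names here are hypothetical.
def sketch_kmeans(points, k, max_iterations: nil)
  centroids = points.uniq.sample(k) # distinct starting centroids
  old_centroids = nil
  iterations = 0
  clusters = []

  # Stop when the centroids no longer change or the iteration cap is hit.
  until old_centroids == centroids ||
        (max_iterations && iterations >= max_iterations)
    # Assignment step: put each point in the cluster of its nearest centroid.
    clusters = Array.new(centroids.length) { [] }
    points.each do |point|
      nearest = centroids.each_index.min_by do |i|
        point.zip(centroids[i]).sum { |x, y| (x - y)**2 }
      end
      clusters[nearest] << point
    end

    # Update step: each centroid becomes the mean of its cluster.
    old_centroids = centroids
    centroids = clusters.each_with_index.map do |cluster, i|
      # Assumption: an empty cluster keeps its previous centroid.
      next old_centroids[i] if cluster.empty?
      cluster.transpose.map { |column| column.sum.to_f / column.length }
    end
    iterations += 1
  end

  { centroids: centroids, clusters: clusters, iterations: iterations }
end
```

Called with four points that form two tight pairs, for example sketch_kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2), the sketch settles on the midpoint of each pair, in either order.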

distance(a, b)

This function calculates the distance between two instances. By default, it returns the squared Euclidean distance. You can provide a more convenient distance implementation by:

1- Overriding this method

2- Providing a closure to the :distance_function parameter
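
As a hedged illustration of the two metrics involved, the sketch below writes the default (squared Euclidean) distance and one possible replacement as plain Ruby. The names squared_euclidean and MANHATTAN are hypothetical and not part of ai4r, numeric arrays of equal length are assumed, and how the closure is handed to the clusterer is not shown on this page.

```ruby
# Squared Euclidean distance between two numeric arrays of equal length.
# Hypothetical helper, written only to illustrate the default metric.
def squared_euclidean(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# An alternative metric expressed as a closure: Manhattan distance.
# A lambda like this has the (a, b) shape that distance expects to call.
MANHATTAN = lambda do |a, b|
  a.zip(b).sum { |x, y| (x - y).abs }
end

squared_euclidean([0, 0], [3, 4]) # 25
MANHATTAN.call([0, 0], [3, 4])    # 7
```

Because only the ordering of distances matters when choosing the nearest centroid, skipping the square root does not change which centroid is closest.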

# File lib/ai4r/clusterers/k_means.rb, line 81
def distance(a, b)
  return @distance_function.call(a, b) if @distance_function
  return euclidean_distance(a, b)
end

eval(data_item)

Classifies the given data item, returning the 0-based index of the cluster it belongs to.

# File lib/ai4r/clusterers/k_means.rb, line 68
def eval(data_item)
  get_min_index(@centroids.collect {|centroid| 
      distance(data_item, centroid)})
end
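
The classification step amounts to picking the centroid with the smallest distance. A minimal stand-alone sketch follows; nearest_centroid_index is a hypothetical name, squared Euclidean distance over numeric arrays is assumed, and get_min_index is presumed (it is not shown on this page) to return the position of the smallest value.

```ruby
# Returns the 0-based index of the centroid closest to item.
# Hypothetical helper; not part of ai4r.
def nearest_centroid_index(item, centroids)
  distances = centroids.map do |centroid|
    item.zip(centroid).sum { |x, y| (x - y)**2 }
  end
  # On a tie, the first (lowest) index wins.
  distances.index(distances.min)
end
```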

Protected Instance Methods

calc_initial_centroids()
# File lib/ai4r/clusterers/k_means.rb, line 88
def calc_initial_centroids
  @centroids = []
  tried_indexes = []
  while @centroids.length < @number_of_clusters && 
      tried_indexes.length < @data_set.data_items.length
    random_index = rand(@data_set.data_items.length)
    if !tried_indexes.include?(random_index)
      tried_indexes << random_index
      if !@centroids.include? @data_set.data_items[random_index] 
        @centroids << @data_set.data_items[random_index] 
      end
    end
  end
  @number_of_clusters = @centroids.length
end
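
The selection above picks random, mutually distinct data items as starting centroids and lowers the cluster count when there are not enough distinct items. A shorter sketch with a similar effect is given below; pick_initial_centroids is a hypothetical name, and unlike the original it draws uniformly from the distinct values rather than from the item positions.

```ruby
# Picks up to k distinct items at random to serve as starting centroids.
# Hypothetical helper; not part of ai4r.
def pick_initial_centroids(items, k)
  # uniq drops duplicate items; sample(k) returns at most k of the rest.
  items.uniq.sample(k)
end
```

When fewer than k distinct items exist, the returned array is shorter than k, which corresponds to the adjustment of @number_of_clusters at the end of calc_initial_centroids.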

calculate_membership_clusters()
# File lib/ai4r/clusterers/k_means.rb, line 109
def calculate_membership_clusters
  @clusters = Array.new(@number_of_clusters) do 
    Ai4r::Data::DataSet.new :data_labels => @data_set.data_labels
  end
  @data_set.data_items.each do |data_item|
    @clusters[eval(data_item)] << data_item
  end
end

recompute_centroids()
# File lib/ai4r/clusterers/k_means.rb, line 118
def recompute_centroids
  @old_centroids = @centroids
  @iterations += 1
  @centroids = @centroid_function.call(@clusters) 
end
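
The default centroid function calls get_mean_or_mode on each cluster. Judging by that name, it presumably averages numeric attributes and takes the most frequent value for the others; the sketch below illustrates that assumption on plain arrays and is not the library's implementation. The name mean_or_mode is hypothetical.

```ruby
# Computes one centroid from a list of items (arrays of attribute values).
# Assumed behavior: mean for all-numeric columns, mode for everything else.
def mean_or_mode(items)
  items.transpose.map do |column|
    if column.all? { |value| value.is_a?(Numeric) }
      column.sum.to_f / column.length
    else
      # Most frequent value; on a tie, the value seen first wins.
      column.group_by(&:itself).max_by { |_, group| group.length }.first
    end
  end
end
```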

stop_criteria_met()
# File lib/ai4r/clusterers/k_means.rb, line 104
def stop_criteria_met
  @old_centroids == @centroids || 
    (@max_iterations && (@max_iterations <= @iterations))
end
