greedy_clustering

class GreedyAgglomerativeClustering(clusters: Sequence[sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.Cluster], merge_candidate_determination_strategy: Optional[sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.MergeCandidateDeterminationStrategy] = None)[source]

Bases: object

An implementation of greedy agglomerative clustering which avoids unnecessary recomputations of merge costs through the management of a priority queue of potential merges.

Greedy agglomerative clustering works as follows. Starting with an initial set of clusters (where each cluster typically contains a single data point), the method successively merges the two clusters where the merge cost is lowest (greedy), until no further merges are admissible. The merge operation is a mutating operation, i.e. the initial clusters are modified.

To apply the method, the Cluster class must be subclassed, so as to define what the cost of a merge in your application shall be and how two clusters can be merged. For example, if data points are points in a Cartesian coordinate system, then the merge cost can be defined as the minimum or maximum distance among all pairs of points in the two clusters, admissibility being determined by a threshold that must not be exceeded; the merge operation can simply concatenate lists of data points.

class Cluster

Bases: abc.ABC

Base class for clusters that can be merged via GreedyAgglomerativeClustering

abstract merge_cost(other) float

Computes the cost of merging the given cluster with this cluster

Returns

the (non-negative) merge cost or math.inf if a merge is inadmissible

abstract merge(other)

Merges the given cluster into this cluster”

Parameters

other – the cluster that is to be merged into this cluster

__init__(clusters: Sequence[sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.Cluster], merge_candidate_determination_strategy: Optional[sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.MergeCandidateDeterminationStrategy] = None)
Parameters

clusters – the initial clusters, which are to be agglomerated into larger clusters

apply_clustering() List[sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.Cluster]

Applies greedy agglomerative clustering to the clusters given at construction, merging clusters until no further merges are admissible

Returns

the list of agglomerated clusters (subset of the original clusters, which may have had other clusters merged into them)

class WrappedCluster(cluster, idx, clusterer: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering)

Bases: object

Wrapper for clusters which stores additional data required for clustering (internal use only)

__init__(cluster, idx, clusterer: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering)
is_merged() bool
get_cluster_association() sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.WrappedCluster

Gets the wrapped cluster that this cluster’s points have ultimately been merged into (which may be the cluster itself)

Returns

the wrapped cluster this cluster’s points are associated with

remove_merges()
compute_merges(initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None)
class ClusterMerge(c1: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.WrappedCluster, c2: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.WrappedCluster, merge_cost)

Bases: object

Represents a potential merge

__init__(c1: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.WrappedCluster, c2: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.WrappedCluster, merge_cost)
apply()
class MergeCandidateDeterminationStrategy

Bases: abc.ABC

__init__()
set_clusterer(clusterer: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering)

Initialises the clusterer the strategy is applied to :param clusterer: the clusterer

abstract iter_candidate_indices(wc: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.WrappedCluster, initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None) Iterator[Union[int, sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.ClusterMerge]]
Parameters
  • wc – the wrapped cluster: the cluster for which to determine the cluster indices that are to be considered for a potential merge

  • initial – whether we are computing the initial candidates (at the start of the clustering algorithm)

  • merged_cluster_indices – [for initial=False] the pair of cluster indices that were just joined to form the updated cluster wc

Returns

an iterator of cluster indices that should be evaluated as potential merge partners for wc (it may contain the index of wc, which will be ignored)

class MergeCandidateDeterminationStrategyDefault

Bases: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.MergeCandidateDeterminationStrategy

iter_candidate_indices(wc: sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.WrappedCluster, initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None) Iterator[Union[int, sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.ClusterMerge]]
Parameters
  • wc – the wrapped cluster: the cluster for which to determine the cluster indices that are to be considered for a potential merge

  • initial – whether we are computing the initial candidates (at the start of the clustering algorithm)

  • merged_cluster_indices – [for initial=False] the pair of cluster indices that were just joined to form the updated cluster wc

Returns

an iterator of cluster indices that should be evaluated as potential merge partners for wc (it may contain the index of wc, which will be ignored)