brokilon.ccd.domain.topology.ccd module

Implementation of the Conditional Clade Distribution (CCD), using maps of clade counts and clade split counts.

brokilon.ccd.domain.topology.ccd.calc_entropy(m1, m2)

Calculates the entropy of the CCD given by the maps m1 and m2, using the formula from Lewis et al.

Parameters:
  • m1 – Map of clade counts

  • m2 – Map of clade split counts

Returns:

Entropy as a float
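
As an illustration, the entropy of a CCD can be written recursively: for a clade C with splits s into children C1 and C2, H(C) is the sum over s of p(s|C) * (-log p(s|C) + H(C1) + H(C2)). The sketch below implements that recursion. It does not call calc_entropy; the key layout (frozenset clades, (parent, left, right) split tuples) and the name ccd_entropy are assumptions made for the example, and whether this matches the exact formula the module takes from Lewis et al. is also an assumption.

```python
import math


def ccd_entropy(clade_counts, split_counts, clade):
    """Entropy in nats of the sub-CCD rooted at `clade` (hypothetical key layout)."""
    if len(clade) <= 2:
        # A leaf or a two-taxon clade admits exactly one topology.
        return 0.0
    entropy = 0.0
    for (parent, left, right), count in split_counts.items():
        if parent != clade:
            continue
        # Conditional probability of this split given its parent clade.
        p = count / clade_counts[parent]
        entropy -= p * math.log(p)
        entropy += p * (ccd_entropy(clade_counts, split_counts, left)
                        + ccd_entropy(clade_counts, split_counts, right))
    return entropy


# Toy CCD over three taxa: the root splits as AB|C or AC|B, twice each.
A, B, C = (frozenset(x) for x in "ABC")
ABC = A | B | C
clade_counts = {ABC: 4, A | B: 2, A | C: 2}
split_counts = {(ABC, A | B, C): 2, (ABC, A | C, B): 2}

entropy = ccd_entropy(clade_counts, split_counts, ABC)
```

With two equally likely topologies, the result is ln 2 (about 0.693 nats).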

brokilon.ccd.domain.topology.ccd.get_ccd_tree_bottom_up(m1, m2)

Runs a bottom-up dynamic program over the maps of clade counts and clade split counts to calculate the CCD MAP tree (the most probable tree under the CCD).

Parameters:
  • m1 – Map for clade counts

  • m2 – Map for clade split counts

Returns:

The CCD MAP tree
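
A bottom-up dynamic program of this kind can be sketched as follows: clades are processed from smallest to largest, and each clade keeps the split that maximizes the split probability times the best probabilities of its two children. This is a minimal sketch, not the module's code; ccd_map_tree, the frozenset keys, and the (parent, left, right) split tuples are hypothetical, and the result is returned as a Newick string for convenience.

```python
def ccd_map_tree(clade_counts, split_counts, root):
    """Return (probability, newick) of the most probable tree (hypothetical layout)."""
    splits_by_parent = {}
    for (parent, left, right), count in split_counts.items():
        splits_by_parent.setdefault(parent, []).append((left, right, count))

    # Base case: every single taxon is a solved subtree with probability 1.
    best = {frozenset([taxon]): (1.0, taxon) for taxon in root}

    # Smaller clades first, so both children are solved before their parent.
    for clade in sorted(splits_by_parent, key=len):
        best_prob, best_newick = 0.0, None
        for left, right, count in splits_by_parent[clade]:
            prob = count / clade_counts[clade] * best[left][0] * best[right][0]
            if prob > best_prob:
                best_prob = prob
                best_newick = f"({best[left][1]},{best[right][1]})"
        best[clade] = (best_prob, best_newick)

    prob, newick = best[root]
    return prob, newick + ";"


# Toy counts from four trees: ((A,B),(C,D)) three times, (((A,B),C),D) once.
A, B, C, D = (frozenset(x) for x in "ABCD")
AB, CD, ABC, ABCD = A | B, C | D, A | B | C, A | B | C | D
clade_counts = {ABCD: 4, AB: 4, CD: 3, ABC: 1}
split_counts = {
    (ABCD, AB, CD): 3,
    (ABCD, ABC, D): 1,
    (AB, A, B): 4,
    (CD, C, D): 3,
    (ABC, AB, C): 1,
}

map_prob, map_newick = ccd_map_tree(clade_counts, split_counts, ABCD)
```

In this toy input the balanced tree wins with probability 3/4; ties are broken by whichever split is seen first.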

brokilon.ccd.domain.topology.ccd.get_clades(tree: TreeNode) → set[frozenset[str]]

Returns all clades of a given tree.

Parameters:

tree – an input tree

Returns:

set of clades
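
To make the return type concrete, here is a small sketch that collects clades as frozensets of leaf names. It uses nested tuples as a stand-in for TreeNode, and the name clades_of is hypothetical. Whether the module's function includes single-leaf clades is not stated; this sketch keeps them.

```python
def clades_of(tree):
    """Collect the leaf set below every node of a nested-tuple tree."""
    clades = set()

    def visit(node):
        if isinstance(node, str):
            # A leaf is its own one-element clade.
            leaves = frozenset([node])
        else:
            # An internal node's clade is the union of its children's clades.
            leaves = frozenset().union(*(visit(child) for child in node))
        clades.add(leaves)
        return leaves

    visit(tree)
    return clades


example_clades = clades_of((("A", "B"), "C"))
```

For the tree ((A,B),C) this yields five clades: {A}, {B}, {C}, {A,B}, and {A,B,C}.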

brokilon.ccd.domain.topology.ccd.get_maps(trees: list[TreeNode]) → tuple[defaultdict[str, int], defaultdict[str, int], dict[int, list]]

From a list of trees, return relevant CCD maps from clades/clade splits to counts.

Parameters:

trees – list of input trees

Returns:

A tuple of CCD maps: clade occurrence counts (m1), clade split counts (m2), and the unique trees
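
The counting step can be illustrated with a short sketch that walks each tree once and tallies every clade and every clade split. It assumes binary trees given as nested tuples rather than TreeNode objects, uses frozenset-based keys instead of the str keys in the signature above, and omits the third return value (the unique trees); count_clades_and_splits is a hypothetical name.

```python
from collections import defaultdict


def count_clades_and_splits(trees):
    """Tally clades and clade splits over binary nested-tuple trees."""
    clade_counts = defaultdict(int)
    split_counts = defaultdict(int)

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        left, right = (visit(child) for child in node)
        parent = left | right
        clade_counts[parent] += 1
        # Order the children canonically so (A,B) and (B,A) share one key.
        first, second = sorted((left, right), key=sorted)
        split_counts[(parent, first, second)] += 1
        return parent

    for tree in trees:
        visit(tree)
    return clade_counts, split_counts


t1 = (("A", "B"), "C")
t2 = (("C", "A"), "B")
clade_counts, split_counts = count_clades_and_splits([t1, t1, t2])
```

Only internal clades are counted here, since a leaf always occurs in every tree.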

brokilon.ccd.domain.topology.ccd.get_tree_from_list_of_splits(splits) → str

Creates the tree corresponding to a list of splits and returns it as a Newick string.

Parameters:

splits – list of splits

Returns:

Newick string of the tree
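
One way to turn a set of splits into a Newick string is to index the splits by parent clade and render recursively from the largest clade. The sketch below assumes each split is a (parent, left, right) tuple of frozensets and that the list describes one complete binary tree; the module's actual split format is not documented here, and newick_from_splits is a hypothetical name.

```python
def newick_from_splits(splits):
    """Render a complete set of (parent, left, right) splits as Newick."""
    children = {parent: (left, right) for parent, left, right in splits}
    # The root is the clade that contains every taxon.
    root = max(children, key=len)

    def render(clade):
        if clade not in children:
            # Without a recorded split the clade must be a single leaf;
            # this unpacking raises ValueError otherwise.
            (name,) = clade
            return name
        left, right = children[clade]
        return f"({render(left)},{render(right)})"

    return render(root) + ";"


A, B, C, D = (frozenset(x) for x in "ABCD")
splits = [(A | B | C | D, A | B, C | D), (A | B, A, B), (C | D, C, D)]
newick = newick_from_splits(splits)
```

The order of the splits in the input list does not matter, because the tree is rebuilt from the parent index.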

brokilon.ccd.domain.topology.ccd.get_tree_probability(tree, m1, m2, use_log=False)

Calculates the probability of a tree given the occurrences of its clades and clade splits.

Parameters:
  • tree – input tree

  • m1 – CCD map for clades

  • m2 – CCD map for clade splits

  • use_log – Whether to use log-transformed probabilities

Returns:

Probability of a tree
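
Under a CCD, the probability of a tree is the product, over its clade splits, of the split count divided by the count of the parent clade. The sketch below shows that product, accumulated in log space. It takes the tree as a list of its splits rather than a TreeNode, and the function name, key layout, and the reading of use_log as "return the log probability" are all assumptions for illustration.

```python
import math


def tree_probability(tree_splits, clade_counts, split_counts, use_log=False):
    """CCD probability of a tree given as (parent, left, right) splits."""
    log_p = 0.0
    for split in tree_splits:
        parent = split[0]
        count = split_counts.get(split, 0)
        if count == 0:
            # A split never observed gives the whole tree probability zero.
            return -math.inf if use_log else 0.0
        log_p += math.log(count / clade_counts[parent])
    return log_p if use_log else math.exp(log_p)


# Toy counts from four trees: ((A,B),(C,D)) three times, (((A,B),C),D) once.
A, B, C, D = (frozenset(x) for x in "ABCD")
AB, CD, ABC, ABCD = A | B, C | D, A | B | C, A | B | C | D
clade_counts = {ABCD: 4, AB: 4, CD: 3, ABC: 1}
split_counts = {
    (ABCD, AB, CD): 3,
    (ABCD, ABC, D): 1,
    (AB, A, B): 4,
    (CD, C, D): 3,
    (ABC, AB, C): 1,
}

balanced = [(ABCD, AB, CD), (AB, A, B), (CD, C, D)]
caterpillar = [(ABCD, ABC, D), (ABC, AB, C), (AB, A, B)]
p_balanced = tree_probability(balanced, clade_counts, split_counts)
p_caterpillar = tree_probability(caterpillar, clade_counts, split_counts)
```

Here the balanced tree gets 3/4 and the caterpillar tree 1/4. Summing logs instead of multiplying probabilities avoids underflow on trees with many taxa.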

brokilon.ccd.domain.topology.ccd.sample_tree_from_ccd(m1, m2, n=1) → list[TreeNode]

Samples n trees from the CCD defined by m1 and m2, drawing each tree according to its probability under the CCD.

Parameters:
  • m1 – Map of clade counts

  • m2 – Map of clade split counts

  • n – number of trees to sample

Returns:

List of sampled trees
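
Sampling from a CCD can be sketched as a top-down walk: starting at the root clade, pick one of its splits with weight equal to its count, then recurse into both children. The code below is an illustrative sketch only. It returns Newick strings instead of TreeNode objects, needs only the split counts (random.choices normalizes the weights, so the clade counts are not used), and sample_trees, the seed parameter, and the key layout are hypothetical.

```python
import random


def sample_trees(split_counts, root, n=1, seed=None):
    """Draw n Newick strings from a CCD given (parent, left, right) split counts."""
    rng = random.Random(seed)
    by_parent = {}
    for (parent, left, right), count in split_counts.items():
        by_parent.setdefault(parent, []).append(((left, right), count))

    def draw(clade):
        if len(clade) == 1:
            (name,) = clade
            return name
        options, weights = zip(*by_parent[clade])
        # Choose a split with probability proportional to its count.
        left, right = rng.choices(options, weights=weights, k=1)[0]
        return f"({draw(left)},{draw(right)})"

    return [draw(root) + ";" for _ in range(n)]


# Toy counts from four trees: ((A,B),(C,D)) three times, (((A,B),C),D) once.
A, B, C, D = (frozenset(x) for x in "ABCD")
AB, CD, ABC, ABCD = A | B, C | D, A | B | C, A | B | C | D
split_counts = {
    (ABCD, AB, CD): 3,
    (ABCD, ABC, D): 1,
    (AB, A, B): 4,
    (CD, C, D): 3,
    (ABC, AB, C): 1,
}

samples = sample_trees(split_counts, ABCD, n=200, seed=0)
```

Because each split is chosen independently given its parent clade, a tree is drawn with exactly the product-of-splits probability the CCD assigns to it.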