Labeling a tree with transmission history

This module provides functions that label the nodes of a Tree object with transmission ancestry identifiers, based on their blockcounts.

It traverses the tree and assigns labels to nodes where blockcounts indicate unknown transmission ancestry. Unlabeled nodes are tracked for further processing.

pyccd.label_transmission_history._label_all_nodes(tree: TreeNode, unlabeled_nodes_list: List = None, unknown_count: int = 0) → Tuple[List, int]

Labels all nodes by iterating over the entire tree.

Parameters:
  • tree – The tree to label.

  • unlabeled_nodes_list – List of unlabeled nodes. If None, a new empty list is used.

  • unknown_count – Number of labeled unknown nodes (unknown transmission ancestors).

Returns:

A tuple containing the updated list of unlabeled nodes and the updated unknown count.
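The page does not show how _label_all_nodes walks the tree. As a rough sketch, the hypothetical collect_unlabeled below illustrates two behaviours the signature implies: the None default for unlabeled_nodes_list (which avoids Python's shared-mutable-default pitfall) and a breadth-first walk that records nodes without a label. Node is a minimal stand-in for TreeNode and transmission_ancestry is an assumed attribute name; the documented function also labels nodes, which this sketch leaves out.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # eq=False: nodes compare by identity
class Node:
    """Minimal stand-in for TreeNode (hypothetical; the real class differs)."""
    name: str
    blockcount: int = 0
    up: Optional[Node] = field(default=None, repr=False)
    children: List[Node] = field(default_factory=list, repr=False)
    transmission_ancestry: Optional[str] = None  # assumed attribute name

    def add_child(self, child: Node) -> Node:
        child.up = self
        self.children.append(child)
        return child


def collect_unlabeled(
    tree: Node,
    unlabeled_nodes_list: Optional[List[Node]] = None,
    unknown_count: int = 0,
) -> Tuple[List[Node], int]:
    """Hypothetical helper: append every unlabeled node, in level order."""
    # A None default avoids one mutable list being shared between calls.
    if unlabeled_nodes_list is None:
        unlabeled_nodes_list = []
    queue = deque([tree])  # breadth-first: nodes come out level by level
    while queue:
        node = queue.popleft()
        if node.transmission_ancestry is None:
            unlabeled_nodes_list.append(node)
        queue.extend(node.children)
    # This sketch assigns no labels, so the counter is returned unchanged.
    return unlabeled_nodes_list, unknown_count


# Example: "b" already carries a label, so it is skipped.
root = Node("root")
a = root.add_child(Node("a"))
root.add_child(Node("b", transmission_ancestry="b"))
a.add_child(Node("c"))
nodes, count = collect_unlabeled(root)
print([n.name for n in nodes], count)  # ['root', 'a', 'c'] 0
```

Because the walk is breadth-first, the collected list is already ordered by level, which is the ordering _label_all_remaining_unknowns expects.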

pyccd.label_transmission_history._label_all_remaining_unknowns(unlabeled_nodes_list: List, unknown_count: int) → None

Labels all nodes in the provided list with an unknown transmission history. IMPORTANT: This function assumes that the list of unlabeled nodes is sorted by level in the tree (root first).

The function modifies the nodes in place and does not return any values. The unknown_count is updated as each node is labeled.
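Because a node may inherit its label from its parent, the level ordering presumably guarantees that every parent is handled before its children. A minimal sketch of one way to produce that ordering from an arbitrary list, assuming only a parent link named up (the same link the labeling rules refer to); depth and sort_by_level are hypothetical helpers, and Node is a stand-in for TreeNode:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Minimal stand-in for TreeNode; only the parent link `up` matters here."""
    name: str
    up: Optional[Node] = field(default=None, repr=False)


def depth(node: Node) -> int:
    """Count the edges between node and the root (the root has depth 0)."""
    d = 0
    while node.up is not None:
        node = node.up
        d += 1
    return d


def sort_by_level(nodes: List[Node]) -> List[Node]:
    """Return nodes ordered root-first; sorted() is stable within a level."""
    return sorted(nodes, key=depth)


# Example: a shuffled list comes back ordered by depth.
root = Node("root")
a = Node("a", up=root)
b = Node("b", up=root)
c = Node("c", up=a)
print([n.name for n in sort_by_level([c, a, root, b])])  # ['root', 'a', 'b', 'c']
```

Computing depth by walking up costs O(depth) per node; a breadth-first traversal from the root yields a level order directly, without sorting.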

The method works as follows:

  • If a node is the root: it is labeled with the current unknown_count.

  • If a node’s parent (up) has a blockcount of -1: the node inherits the parent’s transmission ancestry.

  • Otherwise, the function checks the node’s sibling: if the sibling already has a transmission ancestry, the node is labeled with the same identifier.

  • If no valid label is found: the node is labeled with a new “Unknown-{unknown_count}” label.

Parameters:
  • unlabeled_nodes_list – List of nodes to be labeled with transmission ancestry identifiers.

  • unknown_count – The current count of labeled “Unknown” nodes.
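The labeling rules above can be sketched as follows. This is an illustration under stated assumptions, not the pyccd implementation: Node is a stand-in for TreeNode, transmission_ancestry is an assumed attribute name, the sibling is taken to be the parent's other child (the tree is binary), and the root is assumed to receive the same “Unknown-{n}” label format as any other fresh unknown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Minimal stand-in for TreeNode (hypothetical)."""
    name: str
    blockcount: int = 0
    up: Optional[Node] = field(default=None, repr=False)
    children: List[Node] = field(default_factory=list, repr=False)
    transmission_ancestry: Optional[str] = None  # assumed attribute name

    def add_child(self, child: Node) -> Node:
        child.up = self
        self.children.append(child)
        return child


def label_remaining_unknowns(unlabeled_nodes_list: List[Node], unknown_count: int) -> None:
    """Label each node in place; the list must be sorted by level (root first)."""
    for node in unlabeled_nodes_list:
        parent = node.up
        if parent is None:
            # Root: assumed to take the same "Unknown-{n}" format as fresh unknowns.
            node.transmission_ancestry = f"Unknown-{unknown_count}"
            unknown_count += 1
        elif parent.blockcount == -1:
            # The parent was labeled earlier thanks to the level ordering.
            node.transmission_ancestry = parent.transmission_ancestry
        else:
            # Binary tree: the sibling is the parent's other child, if labeled.
            sibling_labels = [s.transmission_ancestry for s in parent.children
                              if s is not node and s.transmission_ancestry is not None]
            if sibling_labels:
                node.transmission_ancestry = sibling_labels[0]
            else:
                node.transmission_ancestry = f"Unknown-{unknown_count}"
                unknown_count += 1
    # unknown_count is a local int here, so callers do not see the increments.


# Example tree: "b" is already labeled "x"; "a" has blockcount -1.
root = Node("root")
a = root.add_child(Node("a", blockcount=-1))
b = root.add_child(Node("b", transmission_ancestry="x"))
a1, a2 = a.add_child(Node("a1")), a.add_child(Node("a2"))
b1, b2 = b.add_child(Node("b1")), b.add_child(Node("b2"))
label_remaining_unknowns([root, a, a1, a2, b1, b2], unknown_count=0)
print([n.transmission_ancestry for n in (root, a, a1, a2, b1, b2)])
# ['Unknown-0', 'x', 'x', 'x', 'Unknown-1', 'Unknown-1']
```

In the example, a takes its sibling's label, a1 and a2 inherit from a (whose blockcount is -1), and b2 picks up the fresh label that b1 received one step earlier.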

pyccd.label_transmission_history._label_leaf_and_reachable_nodes(tree, unlabeled_nodes_list, top_infected_nodes_list)

Labels all leaf nodes and recursively propagates the transmission ancestry to all reachable nodes.

This function processes every leaf of the provided tree. For each leaf:

  • If the blockcount is -1 (indicating no transmission event), it labels the leaf with its name and propagates this label upwards and to the children, marking all reachable nodes with the same ancestry.

  • Top-infected nodes encountered during propagation are collected into a queue (top_infected_nodes_list).

The function modifies the tree by adding the transmission ancestry feature to nodes and updates the unlabeled_nodes_list by removing nodes that have been labeled.

Parameters:
  • tree – The tree to be processed (its leaf nodes must carry a blockcount).

  • unlabeled_nodes_list – List of nodes that have not yet been labeled with transmission ancestry.

  • top_infected_nodes_list – List of top-infected nodes: nodes reached by transmission propagation whose labels still have to be passed down to their children.

Returns:

The updated unlabeled_nodes_list and top_infected_nodes_list after propagation.
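A sketch of the upward part of this propagation, under loudly stated assumptions: a node with blockcount -1 is taken to share its parent's host (no transmission on the branch above it), so the walk from a leaf climbs while that holds; the topmost node reached is recorded as top-infected, and labeling its descendants is left to the top-infected step. Node, iter_leaves and label_leaves_upward are hypothetical names, and this split of work between the two functions is an assumption, not confirmed by the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Minimal stand-in for TreeNode (hypothetical)."""
    name: str
    blockcount: int = 0
    up: Optional[Node] = field(default=None, repr=False)
    children: List[Node] = field(default_factory=list, repr=False)
    transmission_ancestry: Optional[str] = None  # assumed attribute name

    def add_child(self, child: Node) -> Node:
        child.up = self
        self.children.append(child)
        return child


def iter_leaves(tree: Node) -> Iterator[Node]:
    """Yield leaves left to right using an explicit stack."""
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(reversed(node.children))
        else:
            yield node


def label_leaves_upward(
    tree: Node,
    unlabeled_nodes_list: List[Node],
    top_infected_nodes_list: List[Node],
) -> Tuple[List[Node], List[Node]]:
    # Assumes every unlabeled node appears in unlabeled_nodes_list.
    for leaf in iter_leaves(tree):
        if leaf.blockcount != -1 or leaf.transmission_ancestry is not None:
            continue
        leaf.transmission_ancestry = leaf.name  # a leaf is labeled with its own name
        unlabeled_nodes_list.remove(leaf)
        top = leaf
        # Assumption: blockcount == -1 means no transmission on the branch to the parent.
        while top.blockcount == -1 and top.up is not None and top.up.transmission_ancestry is None:
            top = top.up
            top.transmission_ancestry = leaf.name
            unlabeled_nodes_list.remove(top)
        if top not in top_infected_nodes_list:
            top_infected_nodes_list.append(top)
    return unlabeled_nodes_list, top_infected_nodes_list


# Example: only L1 has blockcount -1, so the walk climbs from L1 to x and stops there.
root = Node("root")
x = root.add_child(Node("x", blockcount=2))
l3 = root.add_child(Node("L3"))
l1 = x.add_child(Node("L1", blockcount=-1))
l2 = x.add_child(Node("L2"))
unlabeled, tops = label_leaves_upward(root, [root, x, l1, l2, l3], [])
print([n.name for n in unlabeled], [n.name for n in tops])  # ['root', 'L2', 'L3'] ['x']
```

Note that x receives the label L1 even though its own blockcount is 2: under this sketch's assumption, a node's blockcount describes the branch above it, and the climb stops at x because that branch is not -1.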

pyccd.label_transmission_history._label_top_infected_nodes(top_infected_nodes_list, unlabeled_nodes_list) → list[TreeNode]

Labels transmission ancestry for top-infected nodes and their children.

This function iterates over the list of top-infected nodes and propagates the transmission ancestry down to their children. If a child node is unlabeled and has a blockcount of 0 or -1, it inherits the transmission ancestry from its parent (the current node). If the child has a blockcount of -1, it is added to the list of top-infected nodes to process further.

The function ensures that:

  • Each node in the top_infected_nodes_list is properly labeled.

  • Non-binary trees are not processed (only binary trees are supported).

  • Child nodes are removed from the unlabeled_nodes_list once labeled.

Parameters:
  • top_infected_nodes_list – List of nodes with transmission ancestry that need to propagate labels to their children.

  • unlabeled_nodes_list – List of nodes that need to be labeled with transmission ancestry.

Returns:

Updated unlabeled_nodes_list after labeling top-infected nodes’ children.

Raises:

AssertionError – If a node in top_infected_nodes_list is unlabeled, or if a non-binary tree is encountered.
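The loop described above maps naturally onto a worklist. This sketch follows the stated rules (unlabeled children with blockcount 0 or -1 inherit the parent's label, children with -1 are queued for further processing, only binary trees are accepted), using a hypothetical Node stand-in and an assumed transmission_ancestry attribute name. The list is walked by index so that nodes appended during the loop are processed too.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Minimal stand-in for TreeNode (hypothetical)."""
    name: str
    blockcount: int = 0
    up: Optional[Node] = field(default=None, repr=False)
    children: List[Node] = field(default_factory=list, repr=False)
    transmission_ancestry: Optional[str] = None  # assumed attribute name

    def add_child(self, child: Node) -> Node:
        child.up = self
        self.children.append(child)
        return child


def label_top_infected(top_infected_nodes_list: List[Node],
                       unlabeled_nodes_list: List[Node]) -> List[Node]:
    i = 0
    # Walk by index: children appended below are processed in later iterations.
    while i < len(top_infected_nodes_list):
        node = top_infected_nodes_list[i]
        i += 1
        assert node.transmission_ancestry is not None, f"{node.name} is unlabeled"
        # Strictly binary: leaves have 0 children, internal nodes exactly 2.
        assert len(node.children) in (0, 2), f"{node.name} is not binary"
        for child in node.children:
            if child.transmission_ancestry is None and child.blockcount in (0, -1):
                child.transmission_ancestry = node.transmission_ancestry
                unlabeled_nodes_list.remove(child)
                if child.blockcount == -1:
                    top_infected_nodes_list.append(child)  # propagate further down
    return unlabeled_nodes_list


# Example: y (blockcount -1) is queued; y1 (0) inherits; y2 (5) is left alone.
x = Node("x", blockcount=2, transmission_ancestry="L1")
x.add_child(Node("L1", blockcount=-1, transmission_ancestry="L1"))
y = x.add_child(Node("y", blockcount=-1))
y1 = y.add_child(Node("y1"))
y2 = y.add_child(Node("y2", blockcount=5))
tops = [x]
remaining = label_top_infected(tops, [y, y1, y2])
print([n.name for n in remaining], [n.name for n in tops])  # ['y2'] ['x', 'y']
```

Appending to the caller's list mirrors the description ("added to the list of top-infected nodes"); iterating over a copy would be an alternative if the caller's list should stay untouched.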

pyccd.label_transmission_history.label_transmission_tree(tree)

Labels a Tree object with its transmission history, based on blockcounts.

This function modifies the tree in place by assigning transmission ancestry labels to nodes where blockcounts indicate unknown transmission ancestry. It does not return any values.

Parameters:
  • tree – A Tree object annotated with blockcounts.