The clustering method used in BioModule is named CAM (Clique Aggregation Method).
CAM is based on two properties of biological modules: connectivity and core-attachment structure.
The subgraph of a biological network induced by a module should be connected because the members of the
module work in tight cooperation; a module is connected if, for any two genes (or proteins) in the module,
there exists a reliable path, that is, a path whose existence probability is greater than a
"confidence score threshold".
On the other hand, according to Gavin et al. (2006), a complex consists of a core and an attachment.
Therefore, a biological module should have a core-attachment structure.
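
The connectivity property can be illustrated with a minimal sketch. It rests on an assumption that the text does not state: the existence probability of a path is taken to be the product of its edge weights (edges treated as independent). The function names and the dictionary-based graph layout are hypothetical and are not part of BioModule.

```python
import heapq

def best_path_probability(graph, source, target):
    """Highest existence probability over all paths from source to target.

    graph: dict mapping node -> dict of neighbor -> edge weight in (0, 1].
    Assumption (not from the CAM description): a path's probability is the
    product of its edge weights.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated with negated probabilities
    while heap:
        neg_p, node = heapq.heappop(heap)
        p = -neg_p
        if node == target:
            return p
        if p < best.get(node, 0.0):
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            q = p * w
            if q > best.get(nbr, 0.0):
                best[nbr] = q
                heapq.heappush(heap, (-q, nbr))
    return 0.0  # no path at all

def is_connected_module(graph, members, threshold):
    """True if every pair of members is joined by a reliable path that
    stays inside the subgraph induced by the module."""
    members = list(members)
    inside = set(members)
    sub = {u: {v: w for v, w in graph.get(u, {}).items() if v in inside}
           for u in members}
    return all(
        best_path_probability(sub, members[i], members[j]) > threshold
        for i in range(len(members))
        for j in range(i + 1, len(members))
    )

# Toy network: a - b (0.9), b - c (0.8)
toy = {"a": {"b": 0.9}, "b": {"a": 0.9, "c": 0.8}, "c": {"b": 0.8}}
connected_at_05 = is_connected_module(toy, ["a", "b", "c"], 0.5)
connected_at_075 = is_connected_module(toy, ["a", "b", "c"], 0.75)
```

Under this assumption the best path from a to c has probability 0.9 x 0.8 = 0.72, so the toy module counts as connected at a threshold of 0.5 but not at 0.75.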
CAM finds biological modules by merging maximal reliable cliques.
A reliable clique is a clique in which the weight of every edge is greater than the confidence score threshold,
and a maximal reliable clique is a reliable clique that is not contained in any other reliable clique.
The pseudocode for CAM is shown as follows. It has two major parts:
lines 4-47 describe how CAM generates candidate modules, and lines 49-60 show how CAM computes the final modules by removing overlapping modules.
The measures that CAM uses in lines 13 and 50 are the clique score and the module score, respectively.
Given a clique C, the score of the clique is given by the following equation.
Given a module M, the score of the module is given by the following equation.
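
The definition of a maximal reliable clique can be sketched as follows. This is only an illustration of that definition, not CAM's own pseudocode: it drops every edge whose weight does not exceed the confidence score threshold and then enumerates maximal cliques with the standard Bron-Kerbosch algorithm with pivoting. The function name and the edge-list input format are hypothetical.

```python
def maximal_reliable_cliques(weighted_edges, threshold):
    """Enumerate maximal cliques in which every edge weight exceeds threshold.

    weighted_edges: iterable of (u, v, weight) tuples of an undirected network.
    Nodes left without any reliable edge are returned as singleton cliques.
    """
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v and w > threshold:  # keep only reliable edges
            adj[u].add(v)
            adj[v].add(u)
    if not adj:
        return []

    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed nodes
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the node with the most neighbors among the candidates
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

edges = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
         ("c", "d", 0.6), ("a", "d", 0.3)]
found = set(maximal_reliable_cliques(edges, 0.5))
```

With a threshold of 0.5 the edge a-d (weight 0.3) is discarded, so the toy network has two maximal reliable cliques: {a, b, c} and {c, d}.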
- Confidence score threshold:
When the input network is weighted and this threshold is increased,
the number of reliable paths decreases and the modules shrink.
We recommend using a higher value for this threshold
if the network contains many false positives or false negatives.
CAM sets this threshold to 0.5 when a user submits a job without setting it.
- Maximum module size:
This parameter sets the maximum size of the generated modules.
The suggested value for this parameter is 150, because most protein complexes have fewer than 150 members.
- Module overlapping threshold:
The threshold used to remove highly overlapping modules.
When the threshold is increased, more overlapping modules remain in the output.
We consider 0.5 a safe choice for this parameter.
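
The effect of the module overlapping threshold can be illustrated with a small sketch. It rests on assumptions, because the text above does not give the overlap measure or the exact filtering order: overlap is taken here as the size of the intersection divided by the size of the smaller module, and modules are visited from highest to lowest score. The scores are supplied by the caller, and all names are hypothetical.

```python
def overlap(a, b):
    """Assumed overlap measure: shared members relative to the smaller module."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def remove_overlapping(modules, scores, overlap_threshold=0.5):
    """Greedy filter: keep a module only if its overlap with every
    higher-scoring module already kept is at most overlap_threshold."""
    order = sorted(range(len(modules)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        candidate = set(modules[i])
        if all(overlap(candidate, k) <= overlap_threshold for k in kept):
            kept.append(candidate)
    return kept

candidates = [{1, 2, 3, 4}, {3, 4, 5}, {6, 7}]
candidate_scores = [3.0, 2.0, 1.0]  # placeholder scores, not CAM's module score
strict = remove_overlapping(candidates, candidate_scores, 0.5)
loose = remove_overlapping(candidates, candidate_scores, 0.7)
```

In this toy case the second module shares 2 of its 3 members with the first (overlap of about 0.67), so it is removed at a threshold of 0.5 and kept at 0.7, which matches the behaviour described for this parameter.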