Prune

After residues have been selected, one may wish to remove some of them if they do not fit a given criterion. The functions below allow one to do so.

All functions are available in both C++ and Python. In Python, the namespace qualifier is replaced by a prune_ prefix: for example, lemon::prune::cofactors is called as lemon.prune_cofactors.

Provided functions

namespace lemon::prune

Prune selected residues by removing them based on a criterion.

Functions

template<typename Container>
Container identical_residues(const chemfiles::Frame &frame, Container &residue_ids)

Remove residues which are biological copies of one another in a crystal

Many crystal structures in the PDB contain two or more identical copies of a biological macromolecule. Since these copies are functionally identical, users who wish to analyze only a unique set of protein chains may want to remove the duplicates. This function does so for a given set of residue IDs by comparing the residues in all of the frame's biological assemblies: if a residue in one assembly has the same ID as a residue in a different assembly, the copy is removed.

Parameters
  • [in] frame: The frame containing residues of interest.

  • [inout] residue_ids: The residue IDs to be pruned.
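To make the pruning rule concrete, here is a minimal pure-Python model of it. It is not Lemon's implementation: the Residue tuple and drop_identical_copies are hypothetical, residue_ids are assumed to be indices into the frame's residue list, two residues are assumed to be copies when they share a name and residue number, and the first occurrence in residue_ids is assumed to be the one that is kept.

```python
from typing import Dict, List, NamedTuple, Tuple


class Residue(NamedTuple):
    """Hypothetical stand-in for a residue in a frame's topology."""
    name: str      # residue name, e.g. "HEM"
    number: int    # residue number from the structure file
    assembly: int  # index of the biological assembly containing the residue


def drop_identical_copies(residues: List[Residue], residue_ids: List[int]) -> List[int]:
    """Remove residues that repeat one already seen in a different assembly.

    residue_ids holds indices into residues. Like the [inout] parameter of
    the documented function, it is modified in place and also returned.
    """
    first_assembly: Dict[Tuple[str, int], int] = {}  # (name, number) -> assembly of first hit
    kept: List[int] = []
    for idx in residue_ids:
        res = residues[idx]
        key = (res.name, res.number)
        if key in first_assembly and first_assembly[key] != res.assembly:
            continue  # same residue found in another assembly: treat it as a copy
        first_assembly.setdefault(key, res.assembly)
        kept.append(idx)
    residue_ids[:] = kept
    return residue_ids


residues = [
    Residue("HEM", 500, 0), Residue("NAG", 601, 0),
    Residue("HEM", 500, 1), Residue("NAG", 601, 1),
    Residue("ATP", 700, 1),
]
ids = [0, 1, 2, 3, 4]
drop_identical_copies(residues, ids)
print(ids)  # [0, 1, 4]
```

Here the heme and NAG residues of the second assembly are dropped as copies, while the ATP residue, which appears only once, survives.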

template<typename Container>
Container cofactors(const chemfiles::Frame &frame, Container &residue_ids, const ResidueNameSet &rns)

Remove residues which are commonly present in crystal structures

There is a common set of cofactors present in many crystal structures, such as sugars and fatty acids used to induce crystallization. Because these residues may also match other criteria set by the user (such as being a small molecule), users may wish to remove them.

Parameters
  • [in] frame: The frame containing residues of interest.

  • [inout] residue_ids: The residue IDs to be pruned.

  • [in] rns: The residue names that one wishes to remove from residue_ids.
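The filtering itself is a name lookup against rns. The sketch below models it in plain Python with no Lemon dependency; the names list standing in for the frame and the drop_named_residues function are hypothetical, and the residue names in the example set are illustrative only (they are not the contents of lemon::common_cofactors).

```python
from typing import List, Set


def drop_named_residues(names: List[str], residue_ids: List[int], rns: Set[str]) -> List[int]:
    """Remove every residue whose name is in rns.

    names[i] is the name of residue i in the (hypothetical) frame;
    residue_ids is filtered in place and also returned.
    """
    residue_ids[:] = [idx for idx in residue_ids if names[idx] not in rns]
    return residue_ids


names = ["HEM", "NAG", "GOL", "STI"]
ids = [0, 1, 2, 3]
drop_named_residues(names, ids, {"NAG", "GOL"})  # illustrative cofactor names
print(ids)  # [0, 3]
```

Because residue_ids is modified in place, several calls with different name sets can be applied one after another to the same selection.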

template<typename Container1, typename Container2 = Container1>
Container1 interactions(const chemfiles::Frame &frame, Container1 &residue_ids, const Container2 &interaction_ids, double distance_cutoff = DEFAULT_DISTANCE, bool keep = true)
template<typename Container1, typename Container2 = Container1>
Container1 keep_interactions(const chemfiles::Frame &frame, Container1 &residue_ids, const Container2 &interaction_ids, double distance_cutoff = DEFAULT_DISTANCE)

Remove residues which do not interact with a given set of other residues

This function is designed to remove residues which do not have a desired interaction with the surrounding protein environment. For example, if a user is interested in small molecules that interact with a heme group, they can use this function to remove all residues that do not have this interaction.

Parameters
  • [in] frame: The frame containing residues of interest.

  • [inout] residue_ids: The residue IDs to be pruned.

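  • [in] interaction_ids: The residue IDs that residue_ids must interact with.

  • [in] distance_cutoff: The maximum distance between a residue in residue_ids and a residue in interaction_ids for the former to be kept.

  • [in] keep: Used by interactions only. If true (the default), interacting residues are kept; if false, they are removed.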

template<typename Container1, typename Container2 = Container1>
Container1 remove_interactions(const chemfiles::Frame &frame, Container1 &residue_ids, const Container2 &interaction_ids, double distance_cutoff = DEFAULT_DISTANCE)

Remove residues which do interact with a given set of other residues

This function is designed to remove residues which have an undesirable interaction with the surrounding protein environment. For example, if a user is interested in small molecules that do not interact with water, they can use this function to remove all residues that interact with water.

Parameters
  • [in] frame: The frame containing residues of interest.

  • [inout] residue_ids: The residue IDs to be pruned.

  • [in] interaction_ids: The residue IDs that residue_ids must not interact with.

  • [in] distance_cutoff: Residues in residue_ids that lie within this distance of a residue in interaction_ids are removed.
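keep_interactions and remove_interactions apply the same distance test and differ only in which residues survive it, which is what the keep flag of interactions selects. The pure-Python sketch below models this under stated assumptions: each residue is a list of atom coordinates, two residues are taken to interact when any pair of their atoms is at most distance_cutoff apart, and a residue is never compared with itself. filter_by_interaction and the coordinate layout are hypothetical, and Lemon's own distance test may differ in detail.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

DEFAULT_DISTANCE = 6.0  # same value as lemon::prune::DEFAULT_DISTANCE


def residues_interact(a: Sequence[Point], b: Sequence[Point], cutoff: float) -> bool:
    """True if any atom of a lies within cutoff of any atom of b (assumed criterion)."""
    return any(math.dist(p, q) <= cutoff for p in a for q in b)


def filter_by_interaction(atoms: List[List[Point]], residue_ids: List[int],
                          interaction_ids: Sequence[int],
                          distance_cutoff: float = DEFAULT_DISTANCE,
                          keep: bool = True) -> List[int]:
    """keep=True models keep_interactions; keep=False models remove_interactions."""
    kept = []
    for idx in residue_ids:
        interacts = any(
            residues_interact(atoms[idx], atoms[other], distance_cutoff)
            for other in interaction_ids
            if other != idx  # skip self-comparison
        )
        if interacts == keep:
            kept.append(idx)
    residue_ids[:] = kept  # modify in place, like the [inout] parameter
    return residue_ids


atoms = [
    [(0.0, 0.0, 0.0)],                     # 0: e.g. a heme group
    [(3.0, 0.0, 0.0)],                     # 1: 3.0 from residue 0
    [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0)],  # 2: at least 10.0 away
    [(0.0, 4.0, 3.0)],                     # 3: 5.0 from residue 0
]
print(filter_by_interaction(atoms, [1, 2, 3], [0]))              # [1, 3]
print(filter_by_interaction(atoms, [1, 2, 3], [0], keep=False))  # [2]
```

Running both modes on copies of the same selection yields disjoint lists whose union is the original selection, since each residue either passes the distance test or does not.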

template<typename Container1, typename Container2 = Container1>
Container1 intersection(Container1 &residue_ids, const Container2 &intersection_ids)

Replace residue_ids with its intersection with intersection_ids

This function is designed to keep only the residues that are also present in another set of residue IDs.

Parameters
  • [inout] residue_ids: The residue IDs to be pruned.

  • [in] intersection_ids: The residue IDs in which residue_ids must also appear.
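In terms of plain Python lists, the operation amounts to a filter against a set. This sketch assumes that the original order of residue_ids is retained; intersect_ids is a hypothetical name, not part of Lemon.

```python
from typing import Iterable, List


def intersect_ids(residue_ids: List[int], intersection_ids: Iterable[int]) -> List[int]:
    """Keep only IDs also present in intersection_ids, modifying residue_ids in place."""
    wanted = set(intersection_ids)  # set lookup keeps the filter linear in len(residue_ids)
    residue_ids[:] = [idx for idx in residue_ids if idx in wanted]
    return residue_ids


ids = [5, 2, 9, 4]
intersect_ids(ids, [4, 5, 7])
print(ids)  # [5, 4]
```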

template<typename Container>
Container has_property(const chemfiles::Frame &frame, Container &residue_ids, const std::string &property_name, const chemfiles::Property &property)

Keep residues with a given property

This function is designed to keep only the residues that have a desired property.

Parameters
  • [in] frame: The frame containing residues of interest.

  • [inout] residue_ids: The residue IDs to be pruned.

  • [in] property_name: The name of the property to check.

  • [in] property: The property value that residues must have to be kept.
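A pure-Python sketch of the check, assuming a residue is kept only when it carries property_name and that property's value equals property. Each residue's properties are modelled as a dictionary; keep_with_property is hypothetical, and "chainname" is used only as an illustrative property name.

```python
from typing import Any, Dict, List


def keep_with_property(properties: List[Dict[str, Any]], residue_ids: List[int],
                       property_name: str, value: Any) -> List[int]:
    """Keep residues whose property_name equals value; residues lacking it are dropped."""
    residue_ids[:] = [
        idx for idx in residue_ids
        if property_name in properties[idx] and properties[idx][property_name] == value
    ]
    return residue_ids


properties = [{"chainname": "A"}, {"chainname": "B"}, {}, {"chainname": "A"}]
ids = [0, 1, 2, 3]
keep_with_property(properties, ids, "chainname", "A")
print(ids)  # [0, 3]
```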

Variables

auto constexpr DEFAULT_DISTANCE = 6.0

The default distance cutoff used by the interaction-based pruning functions, in Ångströms (the length unit used by chemfiles).

Example

C++

The following example demonstrates how to remove cofactors and other ‘common’ residues from a selection.

auto worker = [](const chemfiles::Frame& entry,
                 const std::string& pdbid) -> std::string {

    // Selection phase
    auto smallm = lemon::select::small_molecules(entry);
    if (smallm.empty()) {
        return std::string("");
    }

    // Pruning phase
    lemon::prune::identical_residues(entry, smallm);
    lemon::prune::cofactors(entry, smallm, lemon::common_cofactors);
    lemon::prune::cofactors(entry, smallm, lemon::common_fatty_acids);

    // Output phase
    return pdbid + lemon::count::print_residue_names(entry, smallm);
};

This example extends the previous one to show how to find only the small molecules interacting with a heme group.

double distance = lemon::prune::DEFAULT_DISTANCE; // cutoff captured by the lambda below
auto worker = [distance](const chemfiles::Frame& entry,
                         const std::string& pdbid) -> std::string {

    // Selection phase
    auto hemegs = lemon::select::specific_residues(
        entry, {"HEM", "HEA", "HEB", "HEC"});
    auto smallm = lemon::select::small_molecules(entry);

    // Pruning phase
    lemon::prune::identical_residues(entry, smallm);
    lemon::prune::cofactors(entry, smallm, lemon::common_cofactors);
    lemon::prune::cofactors(entry, smallm, lemon::common_fatty_acids);

    lemon::prune::keep_interactions(entry, smallm, hemegs, distance);

    // Output phase
    return pdbid + lemon::count::print_residue_names(entry, smallm);
};

Python

These examples are available in Python as follows:

import lemon

class MyWorkflow(lemon.Workflow):
    def worker(self, entry, pdbid):
        import lemon
        smallm = lemon.select_small_molecules(entry, lemon.small_molecule_types, 10)

        # Pruning phase
        lemon.prune_identical_residues(entry, smallm)
        lemon.prune_cofactors(entry, smallm, lemon.common_cofactors)
        lemon.prune_cofactors(entry, smallm, lemon.common_fatty_acids)

        # Output phase
        return pdbid + lemon.count_print_residue_names(entry, smallm)

import lemon

class MyWorkflow(lemon.Workflow):
    def worker(self, entry, pdbid):
        import lemon
        heme_names = set()
        heme_names.add(lemon.ResidueName("HEM"))
        heme_names.add(lemon.ResidueName("HEA"))
        heme_names.add(lemon.ResidueName("HEB"))
        heme_names.add(lemon.ResidueName("HEC"))

        hemegs = lemon.select_specific_residues(entry, heme_names)
        smallm = lemon.select_small_molecules(entry, lemon.small_molecule_types, 10)

        # Pruning phase
        lemon.prune_identical_residues(entry, smallm)
        lemon.prune_cofactors(entry, smallm, lemon.common_cofactors)
        lemon.prune_cofactors(entry, smallm, lemon.common_fatty_acids)

        lemon.prune_keep_interactions(entry, smallm, hemegs, 6.0)

        # Output phase
        return pdbid + lemon.count_print_residue_names(entry, smallm) + '\n'

    def finalize(self):
        pass