Select

Most Lemon workflows start by selecting one or more sets of residues and performing operations on them. The functions below are available in both C++ and Python, but users should note a few implementation differences.

First, the C++ versions of all functions are implemented as templates, allowing the user to choose their preferred container for storing the resulting residues. Second, each function has two overloads. One takes a container initialized by the user as an argument and populates it using the container’s insert method. No template arguments are required, as these can be deduced at compile time. The other overload initializes and populates a new container specified by a template argument (the signatures below default to std::vector<uint64_t>). The choice of the correct container is left to the user.

In Python, both overloads are available as well. However, due to restrictions imposed by Python generics, the user must use the ResidueIDs container. All Python function names are prefixed with select_.

Provided selectors

namespace lemon::select

Functions to select various residues based on a given criterion.

Functions

template<typename Container = std::vector<uint64_t>>
Container small_molecules(const chemfiles::Frame &frame, const std::unordered_set<std::string> &types = small_molecule_types, size_t min_heavy_atoms = 10)

Select small molecules in a given frame

Use this function to find small molecules in a given frame. A small molecule is defined as an entity that has a given chemical composition. In addition, the selected entity must have at least a specified number of heavy atoms (default 10), so that common residues such as water and metal ions are not selected.

Return

The selected residue locations

Parameters
  • [in] frame: The entry containing molecules of interest.

  • [in] types: A set of std::string containing the accepted chemical compositions. Defaults are NON-POLYMER, OTHER, and PEPTIDE-LIKE.

  • [in] min_heavy_atoms: The minimum number of non-hydrogen atoms for a residue to be classed as a small molecule.
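The selection rule described above can be sketched in plain Python. This is only an illustration of the documented criteria, not Lemon's implementation: the `Residue` class, the `small_molecule_indices` function, and the sample data are hypothetical stand-ins, and the real function operates on a chemfiles frame.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a residue; not a chemfiles or Lemon type.
@dataclass
class Residue:
    name: str
    composition_type: str  # e.g. "NON-POLYMER"
    elements: List[str]    # element symbol of each atom

# Default accepted compositions, as listed for the `types` parameter.
DEFAULT_TYPES = frozenset({"NON-POLYMER", "OTHER", "PEPTIDE-LIKE"})

def small_molecule_indices(residues, types=DEFAULT_TYPES, min_heavy_atoms=10):
    """Return the positions of residues that pass both documented checks."""
    selected = []
    for index, residue in enumerate(residues):
        # Check 1: the chemical composition must be an accepted type.
        if residue.composition_type not in types:
            continue
        # Check 2: count non-hydrogen atoms and compare to the minimum.
        heavy = sum(1 for element in residue.elements if element != "H")
        if heavy >= min_heavy_atoms:
            selected.append(index)
    return selected

residues = [
    Residue("HOH", "NON-POLYMER", ["O", "H", "H"]),
    Residue("ZN", "NON-POLYMER", ["Zn"]),
    Residue("ALA", "L-PEPTIDE LINKING", ["N", "C", "C", "O", "C"]),
    Residue("LIG", "NON-POLYMER", ["C"] * 12 + ["H"] * 8),
]
print(small_molecule_indices(residues))  # [3]
```

In this sketch, water and the zinc ion share the NON-POLYMER composition with the ligand, so only the heavy-atom threshold keeps them out of the result.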

template<typename Container = std::vector<uint64_t>>
Container metal_ions(const chemfiles::Frame &frame)

Select metal ions in a given frame

This function populates the residue IDs of metal ions. We define a metal ion as a residue consisting of a single, positively charged atom.

Return

The selected residue locations

Parameters
  • [in] frame: The entry containing metal ions of interest.
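As a rough sketch of the metal ion definition given above, the check can be written in plain Python. The `Atom` and `Residue` classes and the `metal_ion_indices` function are hypothetical stand-ins for illustration only, and the sketch assumes that each atom carries a numeric charge.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins; not chemfiles or Lemon types.
@dataclass
class Atom:
    element: str
    charge: float

@dataclass
class Residue:
    name: str
    atoms: List[Atom]

def metal_ion_indices(residues):
    """Return positions of residues made of one positively charged atom."""
    return [
        index
        for index, residue in enumerate(residues)
        if len(residue.atoms) == 1 and residue.atoms[0].charge > 0
    ]

residues = [
    Residue("ZN", [Atom("Zn", 2.0)]),
    Residue("CL", [Atom("Cl", -1.0)]),
    Residue("HOH", [Atom("O", 0.0), Atom("H", 0.0), Atom("H", 0.0)]),
]
print(metal_ion_indices(residues))  # [0]
```

Under this reading, a chloride ion is rejected because of its negative charge, and water is rejected because it has more than one atom.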

template<typename Container = std::vector<uint64_t>>
Container nucleic_acids(const chemfiles::Frame &frame)

Select nucleic acid residues in a given frame

This function populates the residue IDs of nucleic acid residues. We define a nucleic acid as a residue with a chemical composition containing the RNA or DNA substring.

Return

The selected residue locations

Parameters
  • [in] frame: The entry containing nucleic acid residues.

template<typename Container = std::vector<uint64_t>>
Container peptides(const chemfiles::Frame &frame)

Select peptide residues in a given frame

This function populates the residue IDs of peptide residues. We define a peptide as a residue with a chemical composition that contains the PEPTIDE substring and is not PEPTIDE-LIKE.

Return

The selected residue locations

Parameters
  • [in] frame: The entry containing peptide residues.
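The nucleic acid and peptide definitions both rest on substring tests against a residue's chemical composition. A minimal Python sketch of those tests follows. The helper names are hypothetical, the comparison is assumed to be case-sensitive, and "is not PEPTIDE-LIKE" is read here as an exact match against that string; Lemon's own implementation may differ.

```python
def is_nucleic_acid(composition_type: str) -> bool:
    """True when the composition contains the RNA or DNA substring."""
    return "RNA" in composition_type or "DNA" in composition_type

def is_peptide(composition_type: str) -> bool:
    """True when the composition contains PEPTIDE but is not PEPTIDE-LIKE."""
    return "PEPTIDE" in composition_type and composition_type != "PEPTIDE-LIKE"

# Sample composition strings chosen for illustration.
for composition in ["L-PEPTIDE LINKING", "PEPTIDE-LIKE", "RNA LINKING", "NON-POLYMER"]:
    print(composition, is_peptide(composition), is_nucleic_acid(composition))
```

The PEPTIDE-LIKE exclusion matters because that composition is one of the default types accepted by small_molecules, so the two selections do not overlap by default.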

template<typename Container = std::vector<uint64_t>>
Container residue_ids(const chemfiles::Frame &frame, const std::set<uint64_t> &resis)

Select residues with a given ID in a given frame

This function populates the residue IDs of residues matching a given set of IDs.

Return

The selected residue locations

Parameters
  • [in] frame: The entry containing residues of interest.

  • [in] resis: The set of residue IDs of interest.

template<typename Container = std::vector<uint64_t>>
Container specific_residues(const chemfiles::Frame &frame, const ResidueNameSet &resnames)

Select residues with a given name in a given frame

This function returns the locations of residues whose names are in a given name set.

Return

The selected residue locations

Parameters
  • [in] frame: The entry containing residues of interest.

  • [in] resnames: The set of residue names of interest.

template<typename Container = std::vector<uint64_t>>
Container residue_property(const chemfiles::Frame &frame, const std::string &property_name, const chemfiles::Property &property)

Select residues with a property

This function returns the locations of residues that have the given property.

Return

The selected residue locations

Parameters
  • [in] frame: The entry containing residues of interest.

  • [in] property_name: The name of the property to select.

  • [in] property: The property of interest.

Example

C++

auto worker = [](const chemfiles::Frame& entry,
                 const std::string& pdbid) -> std::string {

    // Selection phase
    auto metal_ids = lemon::select::metal_ions(entry);

    // No pruning, straight to the output phase
    return pdbid + lemon::count::print_residue_names(entry, metal_ids);
};

auto collector = lemon::print_combine(std::cout);

Python

import lemon

distance = 6.0

class MyWorkflow(lemon.Workflow):
    def worker(self, entry, pdbid):
        import lemon
        # Selection phase
        metals = lemon.select_metal_ions(entry)

        # Output phase
        return pdbid + lemon.count_print_residue_names(entry, metals) + '\n'
    def finalize(self):
        pass