Launch a Lemon workflow¶
Since Lemon is a header-only library, users of the C++ API have complete control over how they launch their workflows. The lemon::Options class allows users to read command-line options in a consistent manner; see the previous section for a more in-depth discussion of this class. The following function is used to launch a Lemon workflow.
launch(const Options &o, Function &&worker, Collector &collect)¶
Launch a Lemon workflow.
This function reads Lemon options and passes them to the appropriate run_parallel function. This is the main entry point of a C++ Lemon program.
Return: 0 on success or a non-zero integer on error.
Parameters:
[in] o: An instance of the Options class used to pass arguments to Lemon.
worker: Function object representing the body of the workflow.
collect: Function object for collecting the results of the workflow.
Submitting Lemon jobs¶
If a Lemon workflow does not require any custom features (i.e. custom options passed to the main function from the shell), users can use the script called launch_lemon.pbs to launch their workflow on PBS-based submission systems. An example of how to use this is given below for a sample workflow.
wget -N https://mmtf.rcsb.org/v1.0/hadoopfiles/full.tar
qsub launch_lemon.pbs -v LEMON_PROG=protein_angle
Note: The provided script will not work on all systems and may need to be edited to work in a given computing environment.
Python Workflows using lemon_python¶
The program lemon_python is secretly a Lemon workflow! This workflow searches for a class called MyWorkflow (which must subclass lemon.Workflow). It then instantiates this class and runs the member function Workflow.worker for every entry in the PDB. The member function Workflow.finalize is called when the workflow completes. An example of using this workflow is given below:
wget -N https://mmtf.rcsb.org/v1.0/hadoopfiles/full.tar
tar xf full.tar
lemon_python -p tmscore.py -w full/
Note: The entire Python script passed to lemon_python is evaluated before the MyWorkflow class is instantiated. Control is not passed back to the script after the Workflow.finalize method is called.
Using the Python Interpreter¶
For users of the candiy_lemon PyPI package, the derived Lemon workflow must be submitted manually using the lemon.launch function. The arguments to this function are an instance of the lemon.Workflow daughter class, the path to the RCSB Hadoop files, and the number of cores to use. An example is given below.
from candiy_lemon.lemon import *

class MyWorkflow(Workflow):
    def worker(self, entry, pdbid):
        return entry.topology().residue(1).get("chainname").get().as_string() + '\n'

    def finalize(self):
        pass

work = MyWorkflow()
launch(work, "full", 2)
Prefiltering the PDB with searches originating on RCSB¶
The advanced search features on the RCSB website can be used to prefilter the PDB. First, one performs a search on the website. Then, the query details must be obtained by clicking the blue button in the bottom-left corner of the search results, which generates an XML version of the search. This XML must be given as input to the script obtain_entries_from_search.pl provided with Lemon, which produces an entry file that can be passed to a workflow. An example XML result and the commands that use it are given below:
<orgPdbQuery>
  <version>head</version>
  <queryType>org.pdb.query.simple.AdvancedKeywordQuery</queryType>
  <description>Text Search for: hiv</description>
  <queryId>AC7A2BC3</queryId>
  <resultCount>2875</resultCount>
  <runtimeStart>2019-01-28T00:51:01Z</runtimeStart>
  <runtimeMilliseconds>1686</runtimeMilliseconds>
  <keywords>HIV</keywords>
</orgPdbQuery>
perl obtain_entries_from_search.pl hiv_search.xml > hiv_prots.lst
tar xf full.tar
./small_molecules -w full -e hiv_prots.lst
Danger Zone: Internal documentation!¶
These functions are internal to Lemon and not meant to be used as part of the external API. They are documented here for the interested reader and future Lemon developers.
run_parallel(Function &&worker, const std::string &p, Collector &collector, size_t ncpu = 1, const Entries &entries = Entries(), const Entries &skip_entries = Entries())¶
The run_parallel function launches jobs which do return data.
Use this function to run the worker on ncpu threads. The worker should accept two arguments, an entry and a std::string. It must return a value, as this value will be appended, using the combine function object, to the collector. See the Lemon Workflow documentation for more details.
Parameters:
worker: A function object (C++11 lambda, struct with operator() overloaded, or std::function object) that the user wishes to apply.
[in] p: A path to the Hadoop sequence file directory.
collector: A function object that handles the values returned by the worker object, which are appended with combine.
[in] ncpu: The number of threads to use.
[in] entries: Which entries to use. Not used if blank.
[in] skip_entries: Which entries to skip. Not used if blank.
The Hadoop class is used to read input sequence files.
Construct a Hadoop class using a binary stream.
This Hadoop constructor takes a binary stream as input. This stream must be open and contain data from a sequence file obtained from RCSB.
Returns whether a sequence file has remaining MMTF records in it.
Use this function to check if the sequence file has any remaining MMTF records stored in it.
Return: True if another MMTF record is present; false otherwise.
Returns the next MMTF record.
This function reads the next MMTF record from the underlying stream. Be warned that this function does minimal error checking and should only be used after has_next() has returned true.
Return: A pair of std::vector<char>s. The first member contains the PDB ID and the second contains the GZ-compressed MMTF file.
Public Static Attributes
The size of the starting header.
read_hadoop_dir(const std::string &p)¶
Read a directory containing Hadoop sequence files.