EnsembleGen: RNA Ensemble Structure Selection Based on NMR Data
EnsembleGen (also named "RESSD") is an efficient algorithm for selecting structural ensembles that integrates residual dipolar coupling (RDC) data from nuclear magnetic resonance (NMR) experiments. This page provides instructions for using the program. The whole package can be downloaded here. If you use this tool, please cite our publication: Chang AT, Chen L, Song L, Zhang S, Nikonowicz EP. Biochemistry. 2020 Sep 8;59(35):3225-3234. doi: 10.1021/acs.biochem.0c00369.
Prerequisites:
1. EnsembleGen has no operating system requirement. However, you will need a Python interpreter to run the program. Python 2.7 is recommended.
2. Please download and install relax-4.0.3 for your platform from this website: https://www.nmr-relax.com/download.html#series_4.0.
Usage:
*** EnsembleGen ***
EnsembleGen [-c/--config <config file>]
[-o/--output <output file>]
[-n/--nproc <num cpu>]
[-r/--restart <restart file>]
[-h/--help]
e.g., EnsembleGen -c config
EnsembleGen -c config -o output -n 8 >> somefile
EnsembleGen -c config -r restartfile
Description:
This script generates a reasonable structural ensemble from residual dipolar coupling (RDC) restraints and a pool of conformers (e.g., generated by MD simulations). There are many ways to build a structural ensemble from MD simulation snapshots (e.g., random selection, clustering analysis). However, combining the selection with NMR restraints is more accurate, especially when a fixed number of conformers (or states) is used to represent dynamics over a microsecond timescale.
EnsembleGen uses the relax package (www.nmr-relax.com) to calculate the N-state RDC values and compare them with the experimental values. To select the best N-state model from a large pool of conformers, a clustering simulated annealing algorithm is used. This algorithm parallelizes N-state model selection, which has been shown to significantly improve RMSD convergence.
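The selection idea described above can be illustrated with a minimal Python sketch. It is not the EnsembleGen implementation: it assumes each conformer already has back-calculated RDC values (in the real tool, relax computes the N-state RDCs), averages them with equal weights, and uses a plain geometric cooling schedule. The function names (rmsd, ensemble_rdc, anneal) and the schedule are hypothetical choices made for illustration only.

```python
import math
import random


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length RDC lists (Hz)."""
    n = len(experimental)
    total = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(total / float(n))


def ensemble_rdc(pool, members):
    """Equal-weight average of per-conformer RDCs (a simplifying assumption)."""
    n_rdc = len(pool[members[0]])
    return [sum(pool[m][k] for m in members) / float(len(members))
            for k in range(n_rdc)]


def anneal(pool, experimental, n_state, n_steps, t_start, t_end,
           n_mutation=1, seed=0):
    """Pick n_state conformers from pool whose averaged RDCs fit the data."""
    if not 0 < n_state < len(pool):
        raise ValueError("n_state must be between 1 and len(pool) - 1")
    rng = random.Random(seed)
    current = rng.sample(range(len(pool)), n_state)
    current_score = rmsd(ensemble_rdc(pool, current), experimental)
    best, best_score = list(current), current_score
    for step in range(n_steps):
        # Geometric cooling from t_start to t_end (assumed schedule).
        frac = step / float(max(n_steps - 1, 1))
        temperature = t_start * (t_end / t_start) ** frac
        # Mutate: swap n_mutation selected conformers for unused ones.
        trial = list(current)
        unused = [i for i in range(len(pool)) if i not in trial]
        for _ in range(n_mutation):
            slot = rng.randrange(n_state)
            swap_in = unused.pop(rng.randrange(len(unused)))
            unused.append(trial[slot])
            trial[slot] = swap_in
        trial_score = rmsd(ensemble_rdc(pool, trial), experimental)
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse trials depending on the current temperature.
        delta = trial_score - current_score
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
        if current_score < best_score:
            best, best_score = list(current), current_score
    return sorted(best), best_score
```

Here pool is a list with one RDC list per conformer, and the function returns the indices of the selected conformers together with their RMSD to the experimental values. The restart and multiprocessing behavior of EnsembleGen is not modeled in this sketch.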
Example Input Files:
[config file]
// note: The current implementation cannot handle paths that contain delimiters such as spaces, tabs, or commas. Please avoid them.
// n_steps_rst option is useful when combined with multiprocessing
// n_steps_rst must be smaller than n_steps_ttl
e.g.,
name test // project name
pdb_dir selex_test // path to the pdb repository folder
rdc_file rdc_test.txt // rdc file
verbosity 0 // verbosity of the stdout
n_state 20 // number of models to select
n_mutation 1 // number of mutations per simulated annealing step
n_steps_ttl 100 // total number of steps
n_steps_rst 10 // restart from the best solution every n_steps_rst steps
T_start 1e-3 // starting temperature
T_max 500 // maximum temperature
T_end 1e-3 // ending temperature
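As a rough illustration of the config layout shown above, the following Python sketch reads "key value" lines, drops everything after "//", and checks the documented rule that n_steps_rst must be smaller than n_steps_ttl. The parser itself (parse_config, INT_KEYS, FLOAT_KEYS) is hypothetical and is not taken from the EnsembleGen source; whether the real reader accepts inline "//" comments is an assumption based on the example.

```python
# Keys converted to numbers; the grouping is inferred from the example values.
INT_KEYS = ("verbosity", "n_state", "n_mutation", "n_steps_ttl", "n_steps_rst")
FLOAT_KEYS = ("T_start", "T_max", "T_end")


def parse_config(text):
    """Parse 'key value // comment' lines into a dictionary."""
    settings = {}
    for raw in text.splitlines():
        # Assumption: '//' starts a comment that runs to the end of the line.
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            # Values with spaces or tabs are not supported, as noted above.
            raise ValueError("expected 'key value', got: %r" % raw)
        key, value = fields
        if key in INT_KEYS:
            value = int(value)
        elif key in FLOAT_KEYS:
            value = float(value)
        settings[key] = value
    if "n_steps_rst" in settings and "n_steps_ttl" in settings:
        if settings["n_steps_rst"] >= settings["n_steps_ttl"]:
            raise ValueError("n_steps_rst must be smaller than n_steps_ttl")
    return settings


EXAMPLE = """
name test // project name
pdb_dir selex_test
rdc_file rdc_test.txt
verbosity 0
n_state 20
n_mutation 1
n_steps_ttl 100
n_steps_rst 10
T_start 1e-3
T_max 500
T_end 1e-3
"""

if __name__ == "__main__":
    print(parse_config(EXAMPLE))
```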
[rdc file]
// column 1 is the spin ID of spin 1 (the number after : is the residue ID)
// column 2 is the spin ID of spin 2 (the text after @ is the atom name)
// column 3 is the rdc value in Hz
// please refer to the relax manual for more on the atom naming syntax
e.g., an rdc file containing two rdc values
:12@C6 :12@H6 13.7
:13@C1' :13@H1' -8.0
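The three-column layout above can be read with a short Python sketch like the one below. It only handles the ":residue@atom" form used in the example; the relax spin ID syntax is richer, and the helper names (parse_spin_id, parse_rdc_file) are hypothetical. Treating "//" as a comment marker inside the rdc file is an assumption.

```python
import re

# Matches the ':<residue number>@<atom name>' form used in the example.
SPIN_ID = re.compile(r"^:(\d+)@(\S+)$")


def parse_spin_id(spin_id):
    """Split ':12@C6' into (12, 'C6')."""
    match = SPIN_ID.match(spin_id)
    if match is None:
        raise ValueError("unsupported spin ID: %r" % spin_id)
    return int(match.group(1)), match.group(2)


def parse_rdc_file(text):
    """Return a list of (spin1, spin2, rdc_in_hz) tuples."""
    rdcs = []
    for raw in text.splitlines():
        # Assumption: '//' comments and blank lines can be skipped.
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("expected three columns, got: %r" % raw)
        rdcs.append((parse_spin_id(fields[0]),
                     parse_spin_id(fields[1]),
                     float(fields[2])))
    return rdcs


if __name__ == "__main__":
    example = ":12@C6 :12@H6 13.7\n:13@C1' :13@H1' -8.0\n"
    for spin1, spin2, value in parse_rdc_file(example):
        print(spin1, spin2, value)
```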
Options:
-h, --help show this help message and exit
-c CONFIGFILE, --config=CONFIGFILE
SA_relax config file (default = config)
-o OUTPUTFILE, --output=OUTPUTFILE
output file (default = sa_result)
-n NPROC, --nproc=NPROC
number of processors (default = 1)
-r RESTARTFILE, --restart=RESTARTFILE
restart file (default = __SA.RESTART__)
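For readers who want to wrap or script the tool, the documented options and defaults can be mirrored in a small Python parser. This is only a sketch built on the standard argparse module; it is not the parser shipped with EnsembleGen, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Mirror the documented EnsembleGen options and their defaults."""
    parser = argparse.ArgumentParser(prog="EnsembleGen")
    parser.add_argument("-c", "--config", default="config",
                        help="SA_relax config file (default = config)")
    parser.add_argument("-o", "--output", default="sa_result",
                        help="output file (default = sa_result)")
    parser.add_argument("-n", "--nproc", type=int, default=1,
                        help="number of processors (default = 1)")
    parser.add_argument("-r", "--restart", default="__SA.RESTART__",
                        help="restart file (default = __SA.RESTART__)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.config, args.output, args.nproc, args.restart)
```

With the second usage example, parsing ["-c", "config", "-o", "output", "-n", "8"] yields nproc as the integer 8 and leaves restart at its default.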