Showing posts from 2016

Cool pandas hack - get random rows in a multi-column dataframe

#
# load inputs
#
import pandas as pd

actives = pd.read_pickle("actives_final.pkl")
decoys = pd.read_pickle("decoys_final.pkl")

#
# stack tables
#
df = pd.concat([actives, decoys])

#
# remove duplicate indices
#
df = df.reset_index()


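# highest-tc row per (category, molId): sort by tc, keep the last row of each group
ordered = (
    df.sort_values(by='tc')
    .groupby(['category', 'molId'])
    .last()
    .reset_index()
)

# random row per (category, molId): shuffle all rows, keep the last row of each group
shuffled = (
    df.sample(frac=1, random_state=123456)
    .groupby(['category', 'molId'])
    .last()
    .reset_index()
)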
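
The hack relies on two things: sample(frac=1) returns every row in shuffled order, and groupby(...).last() keeps whichever row comes last within each group. Here is a minimal, self-contained sketch on made-up data; the toy table and its values are assumptions for illustration, only the column names category, molId and tc come from the snippet above.

```python
import pandas as pd

# Toy stand-in for the stacked actives/decoys table (all values are invented).
df = pd.DataFrame({
    "category": ["active"] * 3 + ["decoy"] * 3,
    "molId": [1, 1, 2, 7, 7, 7],
    "tc": [0.2, 0.9, 0.5, 0.1, 0.4, 0.3],
})

keys = ["category", "molId"]

# Highest-tc row per (category, molId): sort ascending, keep the last row.
ordered = df.sort_values(by="tc").groupby(keys).last().reset_index()

# Random row per (category, molId): shuffle all rows, keep the last row.
shuffled = (
    df.sample(frac=1, random_state=123456)
    .groupby(keys)
    .last()
    .reset_index()
)

print(ordered)
print(shuffled)
```

One caveat: last() takes the last non-null value of each column separately, so if the table contains NaNs the result can mix values from different rows of the same group.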

Example slurm cluster on your laptop (multiple VMs via vagrant)

To anybody who's considering a switch to a modern queuing system like slurm, this will be a useful guide. Rather than doing a rollout on production nodes, we'll try things out in vagrant. This text assumes you have vagrant installed on your local machine (it's very easy).

The ultimate aim will be to set up 2 servers (worker nodes) and one master/controller, and run a simple job on them.

There are multiple great guides out there, but most of them are out of date and won't work without some tweaking on Ubuntu 16.04 Xenial.

The entirety of the code is hosted at
https://github.com/jandom/gromacs-slurm-openmpi-vagrant

The README.md file will show you how to set it up, with checks on every step.

References
https://mussolblog.wordpress.com/2013/07/17/setting-up-a-testing-slurm-cluster/
https://github.com/gabrieleiannetti/slurm_cluster_wiki/wiki/Installing-a-Slurm-Cluster
http://philipwfowler.me/2016/04/14/how-to-setup-a-gramble/
https://github.com/dakl/slurm-cluster-vagrant/blob/ma…

Making DESMOND easier to run on a cluster

So, some of you probably wanted to try out this DESMOND code from DESRES. It's shipped with the Maestro suite by Schrodinger, which is free for academics and a little annoying to use if you wanna *just* use the binary.

This tutorial essentially describes how to run DESMOND without any of the annoying wrappers shipped by Schrodinger. It makes it possible to run the code in a sane fashion, similar to how you launch any other MD code on the cluster. We'll cover both the CPU DESMOND and the GPU GDESMOND.

The required packages can be downloaded from https://www.deshawresearch.com/downloads/download_desmond.cgi/

Desmond_Maestro_2016.3.tar
Desmond-3.6.1.1.tar.gz

The 'Desmond-3.6.1.1.tar.gz' is just the source code and some sample systems - we won't be compiling that, we're just gonna use a sample system to see if the code runs.

Install 'Desmond_Maestro_2016.3.tar' following their instructions.

Here is an example module file to accompany the installation

Then t…

What was that password again?

A few months ago I bought a new hard-drive. When you encrypt a drive, you need to set a password – but the trick here is to remember the password. Your hard-drive won't send you a "forgot password?" email. So you set that password, enter it, the drive mounts and then you work away happily for weeks without rebooting your machine.

Then you reboot... and you need to re-enter that password.

You slowly realize that those weeks ago you set a new unique password for that bloody hard-drive that you don't remember for the love of god... What to do? Well, one: don't be a complete idiot like me, remember the password. But if you do forget it, here is what to do. [Spoiler alert: I failed to recover the password but it's a pretty interesting ride...]

The new Ubuntu 16.04 comes with a handy tool called 'bruteforce-luks'. Here are some scenarios. Note you have to use it as root.

"I remember the start/end of the password and I sort of know what's in the middle"

Benchmarking gromacs – 2 quick questions

With gromacs there are all these things you have to do when benchmarking, and it's a little bit of a mess. One question that I always wondered about was, well, how long do you have to run to get a reliable performance number, in ns/day? 1e0 MD steps is certainly too short but 1e7 steps seems unnecessarily excessive.
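
For reference, the ns/day figure is just simulated time divided by wall-clock time. Here is a small sketch of that arithmetic, assuming a 2 fs time step (the step size used for these benchmarks is not stated here) and a helper name made up for illustration; the timing in the example call is invented.

```python
def ns_per_day(n_steps, wall_seconds, dt_fs=2.0):
    """Simulated nanoseconds per day of wall-clock time.

    dt_fs is the MD time step in femtoseconds; 2 fs is an assumed default.
    """
    simulated_ns = n_steps * dt_fs * 1e-6  # 1 fs = 1e-6 ns
    wall_days = wall_seconds / 86400.0     # 86400 seconds in a day
    return simulated_ns / wall_days


# Made-up timing: 1e4 steps finishing in 120 s of wall-clock time.
print(ns_per_day(1e4, 120.0))
```

For very short runs the wall-clock time is dominated by start-up overhead rather than by the MD steps themselves, which is presumably why a handful of steps gives an unreliable number.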

The second question was, could one get away with benchmarking just a box of water? Currently people use systems like DHFR or APOA1 (protein in water) to assess performance, but those are arbitrary. A box of water has a dramatically simpler topology than a protein, but maybe it doesn't matter?

One thing that this does not assess is the absolute performance of gromacs: the ns/day below are low, but that's because an old workstation is used, without GPU acceleration.

The answers are pretty simple: 1e4–1e5 steps are required to 'converge' the performance estimate, for my test system. Also, a water box of very similar dimensions (and # of particles) but without the protein…

Pharma, you're not a good place to go

Just visited an old-favourite blog of mine – the excellent "In the pipeline" by Derek Lowe.

By going through the last 5 or so pages, it's clear what the hiring state is in the industry that once was

Merck Cuts Chemistry
Layoffs at Takeda
AstraZeneca Cutting Back Again
Layoffs at Medimmune

To balance that, there was one piece of positive news
Merck Expanding on the West Coast

The blog posts are for the span between July 2016 and Sept 2016.

Now, these are not hard numbers, but they give me an "anecdotal nudge" that the field/industry is in atrophy. When is the last time you saw any of the computer-science shops with similar headlines? "Google cuts down on software engineers", "Facebook re-organizes and lays-off data scientists", anybody? It simply doesn't happen, not at this scale.

If you're a young student, grad or post grad in life sciences, consider your options carefully. Academia is marred by fierce competition over small resources. There is a saying i…

VNC connect from Mac OSX El Capitan to Ubuntu Linux

Edit 2016-11-05

This is actually trivial, here are the steps.


On the remote Ubuntu "workstation", run
$ vncserver
On the home Mac machine, run your favorite SSH tunnel
$ ssh -t -L 5901:localhost:5901 workstation
Open up the vnc screen in a Mac window
$ open vnc://localhost:5901

Testing modern Symfony apps

Introduction
The words "modern" and "PHP framework" would usually be considered an oxymoron, and for good reason. However, there is no point in denying that a large number of things on the internet rely on Symfony or some other PHP framework to run. These things will be gradually phased out or upgraded, but for now they can't just remain untested because they're somewhat (out)dated...

API testing
The more rigorous part is RESTful API testing, which relies on sending json to the server and getting json back from the server. Let's create a boilerplate class that's going to help us send POST/GET/DELETE requests wrapped as methods:


And now we're going to use that ancestor class to facilitate some real-life testing of user profile reads and updates.

To run the tests do the following

Frontend functional/flow testing
While valuable, the API testing will not cover certain aspects of the application flow. For example you changed some <script> d…

Gromacs 5.x on TITAN Cray machine

For compilation instructions check out https://groups.google.com/d/msg/plumed-users/Tx29XNNRq8o/xeAu7RNaBAAJ

For a while we've been preparing to run some simulations on this machine, hosted by Oak Ridge National Lab. Every cluster is a bit different and that's definitely true for TITAN: each box has 16 CPUs (arranged in 2 "numas") and 1 K20 nVidia GPU. There is no usual MPI or Infiniband; instead there is some other Cray-specific beast.

Here is an example submission script for a non-replica exchange simulation:

#!/bin/bash
module add gromacs/5.0.2
cd $PBS_O_WORKDIR
# important - allow the GPU to be shared between MPI ranks
export CRAY_CUDA_MPS=1
mpirun=`which aprun`
application=`which mdrun_mpi`
options="-v -maxh 0.2 -s tpr/topol0.tpr "
gpu_id=000000000000 # only 12, discard last '0000'
$mpirun -n 32 -N 16 $application -gpu_id $gpu_id  $options
Submit with


$ qsub -l walltime=1:00:00 -l nodes=2 submit.sh
This requests 2 boxes and starts 32 MPI processes/ranks, 16 per box.…

Symfony cumulative ACL isGranted

The default use of Access Control Lists in Symfony can be a little awkward. That's because the $securityContext isGranted method performs a cumulative permissions check, while the typical implementations of the ACL component's isGranted perform a different kind of check. Here is what I mean; let's start with a simple security token.

$em = $this->getContainer()->get('doctrine')->getManager();
$aclManager = $this->getContainer()->get('myapp_user.acl_manager');

$repository = $em->getRepository('MyAppUserBundle:User');
$user = $repository->find(1);

$output->writeln(sprintf("user:%d %s", $user->getId(), $user->getUsername() ) );
$token = new UsernamePasswordToken($user, $user->getPassword(), "firewallname", $user->getRoles());
$securityContext = $this->getContainer()->get('security.token_storage');
$authorizationChecker = $this->getContainer()->get('security.authorization_checker'); 
$secur…

Force-field parametrization, multiple birds with multiple stones – with REST2

This blog post is an idea for a force-field parametrization strategy, which takes advantage of replica exchanges between different parametrizations of the same system.

One of the problems with force-field parametrization is that a large region of parameter space has to be searched. A series of parameter values is often evaluated to try and match an experimental observable. This evaluation is usually done sequentially, slowing down the process. Additionally, if the parameter choice introduces an energy barrier, it may not be possible to sample efficiently and obtain convergence of a numerical quantity, to be compared with experiment.

To enhance sampling, temperature replica exchange can be employed. Replicas of the same Hamiltonian are run at different temperatures and exchanges are attempted according to the Boltzmann criterion. Importantly, the energies of the neighboring replicas are compared.
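
For temperature replica exchange this test is usually written as a Metropolis criterion: a swap between replicas i and j is accepted with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), where beta = 1/(k_B T). Here is a minimal sketch of that test; the function and argument names are made up for illustration, and energies are assumed to be potential energies in kJ/mol.

```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i, e_j are potential energies (kJ/mol); t_i, t_j are temperatures (K).
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, t_i, e_j, t_j, rng=random):
    """Return True if the swap is accepted, drawing one uniform random number."""
    return rng.random() < exchange_probability(e_i, t_i, e_j, t_j)


# The colder replica holds the higher energy: the swap is always accepted.
print(exchange_probability(-100.0, 300.0, -120.0, 310.0))
# The colder replica holds the lower energy: the swap is accepted only sometimes.
print(exchange_probability(-120.0, 300.0, -100.0, 310.0))
```

Only the energy difference and the two temperatures enter the test, which is why the overlap between the energy distributions of neighboring replicas controls the exchange rate.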

It should be advantageous, in force-field parametrization, to take advantage of this idea…

Modeller - quick start, filling loops in the DUDE protein receptor structures

The aim here was getting simulations of the DUDE receptor proteins up and running. The main challenge is a large number of missing sidechains and loops in these structures - this is not a problem for docking codes perhaps, but certainly is for MD simulation codes.

Modeller from the Sali lab seemed like the tool for the job, so let's grab that

wget https://salilab.org/modeller/9.16/modeller_9.16-1_amd64.deb
sudo env KEY_MODELLER=MODELIRANJE dpkg -i modeller_9.16-1_amd64.deb

The only useful reference I was able to find on filling loops and missing sidechains was:

https://salilab.org/modeller/wiki/Missing%20residues

However, it assumes the alignment.ali file is constructed by hand. Maybe that's pleasurable to some, but when I have ~20 receptor structures to complete, there is no time for this. Let's automate how the alignment.ali file is generated:

# alignment.py file
from MDAnalysis import Universe
from MDAnalysis.lib.util import convert_aa_code
import textwrap, glob

u = Universe("…

React/Redux first contact - it's not that different at all, a pocket guide for ex-PHP dev

It's my first encounter with Redux (which is an implementation of Flux). Coming from a PHP/Symfony2 background, things appear pretty different at first. The core ideas of Redux are Actions, ActionCreator and the Store - all very different vocab from what the PHP-folks are used to.

However, things are not as different as they appear to be. Inspired by this video, here is a pocket translator for what these things are in PHP-speak.

Actions
The closest match is Requests: they carry an action type (equivalent of a URI) and possibly some payload (Request post/get params)

ActionTypes
One-to-one match with 'Routes'

Store
Only one per application. It's a bit like a Routing component that gets requests and, based on their URI types, decides which Controller to send them to.

Reducers
Sort of like controllers, they receive Actions from the Store and perform the following function

newState = reducer(oldState, action)
So it's all quite simple really!
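
To make the reducer signature concrete, here is a toy sketch. It is written in Python rather than JavaScript, it is not the Redux API, and the action type and state shape are made up for illustration.

```python
def counter_reducer(old_state, action):
    """Return a new state for the given action; never modify old_state in place."""
    if action["type"] == "INCREMENT":
        step = action.get("payload", 1)
        return {**old_state, "count": old_state["count"] + step}
    # Unknown action types leave the state unchanged.
    return old_state


state = {"count": 0}
actions = [
    {"type": "INCREMENT"},
    {"type": "INCREMENT", "payload": 5},
    {"type": "UNKNOWN"},
]
for action in actions:
    state = counter_reducer(state, action)

print(state)
```

In PHP-speak, the if branch plays the role of a controller action matched by a route, and the returned dictionary is the response.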

M(e)LT build for SC2

This is an odd post for this blog, as it concerns a StarCraft 2 strategy rather than something related to my work. But hey, it's pretty cool.

The standard Zerg opener in the LotV SC2 game is Roach/Ravager. It's incredibly strong: within 8 min Zerg is at your base with a maxed-out army. So you gotta push before then.

Zerg is pretty weak initially, while building an economy. The build order is roughly

Supply
Refinery
Barracks
Refinery
Supply
Command Center
Factory with Reactor
Starport, move the Reactor, Factory builds a tech lab.

The double CC bit is very important: it will give you a strong economy and allow you to reinforce your push. Also build an engineering bay and some turrets along the way.

At around the 4-5 minute mark, you should have ~10 marines, 1-2 tanks and 1-2 liberators. Head for the zerg base. Siege one liberator on the ramp to their base, one liberator above your army. Siege the tank up so that it benefits from the vision of the liberators. Start poking with mar…

gromacs and plumed2 with intel compilers

For all you performance junkies, no guarantees tho!

https://gist.github.com/jandom/4bd1610e1892c61fccb0