
Torque batch queue system for mentat

I have installed the Torque batch queue system on our 50-node (~300-core) mentat cluster. Here are some useful PBS commands for working with Torque:

qsub script
Submit a job script for execution.
qstat
Show status of running and pending jobs.
tracejob
Display historical information about your jobs.
qdel
Kill a job.
qhold
Hold a job.
qstat -Q
Show a summary of the status of all queues.
qstat -Qf
Show the full configuration of all queues.
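The script passed to qsub is an ordinary shell script whose #PBS comment lines carry resource requests for Torque. As a minimal sketch, assuming you want to generate such a script from MATLAB, the code below writes one to a temporary file. The job name, the resource requests and the program ./myprogram are hypothetical placeholders; the directives themselves (-N, -l nodes/walltime, -j oe) are standard Torque options. In practice you would write the script to your working directory on the shared disk.

```matlab
% Minimal sketch: write a Torque/PBS job script from MATLAB.
% The job name, resources and ./myprogram are hypothetical placeholders.
jobfile = [tempname '.sh'];
fid = fopen(jobfile, 'w');
if fid < 0
  error('could not open %s for writing', jobfile);
end
fprintf(fid, '#!/bin/bash\n');
fprintf(fid, '#PBS -N myjob\n');              % job name, as listed by qstat
fprintf(fid, '#PBS -l nodes=1:ppn=1\n');      % one core on one node
fprintf(fid, '#PBS -l walltime=01:00:00\n');  % maximum run time (hh:mm:ss)
fprintf(fid, '#PBS -j oe\n');                 % join stderr into the stdout file
fprintf(fid, 'cd $PBS_O_WORKDIR\n');          % start in the directory qsub was called from
fprintf(fid, './myprogram\n');                % hypothetical program to run
fclose(fid);

% On the head node the script would then be submitted with
%   [status, jobid] = system(['qsub ' jobfile]);
```

qsub prints the identifier of the new job, which you can then pass to qstat, tracejob, qhold or qdel.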

Peer-to-peer distributed Matlab computing – update

After detailed discussions with colleagues at the Donders and at the FIL, I have implemented a peer-to-peer distributed computing toolbox for MATLAB. Most of the desired functionality is now in place, and it seems to work robustly and efficiently.

The peer toolbox lets you do something like this in MATLAB:

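a = randn(400,400);
% evaluate pinv five times in the local MATLAB session
tic; cellfun('pinv', {a, a, a, a, a}, 'UniformOutput', false); toc
% distribute the same five evaluations over the available peers
tic; peercellfun('pinv', {a, a, a, a, a}, 'UniformOutput', false); toc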


Peer-to-peer distributed Matlab computing

In a recent meeting with the SPM developers, we discussed parallel computing using the Matlab Distributed Computing Toolbox, Star-P, Sun Grid Engine, and other batch systems that can be linked to Matlab. For the typical neuroimaging research setting, these are all of limited use, because they are based on a centralized job distribution system. That may work fine on a large cluster with centralized configuration and system administration, but even then their usefulness is limited, because all input and output data (which are typically large) have to be sent over the network twice: first to the job manager, then to the compute node (and vice versa for the results).

To resolve some of these problems, I came up with the idea of peer-to-peer distributed computing in Matlab. The full description can be found at http://fieldtrip.fcdonders.nl/development/peer
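The network cost of the centralized approach can be written out. Let D be the size of the input data of a job and R the size of its results. Routing both through a job manager sends each over the network twice, whereas a direct transfer between the submitting computer and the compute node, as a peer-to-peer scheme would allow, sends each once. With hypothetical sizes D = 500 MB and R = 100 MB:

```latex
\begin{aligned}
V_{\text{central}} &= 2D + 2R = 2(500) + 2(100) = 1200~\text{MB} \\
V_{\text{direct}}  &= D + R = 500 + 100 = 600~\text{MB}
\end{aligned}
```

Whatever the sizes, the direct route halves the volume sent over the network.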

Mentat: parallel computing using the Matlab compiler

The Mentat toolbox is a collection of Matlab functions that enables you to perform parallel computations from within Matlab on a Beowulf-style cluster. The toolbox was developed on Linux with Matlab 6.5, but will probably also work on other platforms.

I have evaluated various open-source parallel computing toolboxes for Matlab, but found that none of them suited my specific needs. Therefore I decided to implement one myself…

The most important problem that I faced is that the parallel computations are performed in separate Matlab sessions. That means that each node in the cluster has to run its own Matlab session, which requires a Matlab license for each node. Furthermore, when specialized Matlab toolboxes are used in the computation (e.g., signal processing, image processing, optimization, statistics), a separate license for each of these toolboxes is also required on every node.

The MathWorks recently released its commercial Distributed Computing Toolbox. I have no experience with it, but it appears to me that it would not solve my license problem either.

The goals of the Mentat toolbox are to:

  • evaluate Matlab code, not low-level C code
  • work from within the Matlab environment, so that ordinary users can use it
  • keep the Matlab code “unaware” of it being evaluated in parallel

Furthermore, I adopted the following restrictions when designing the toolbox:

  • the computational problem (in our case data processing) should be separable into chunks
  • each chunk is evaluated in a separate job, independently of all other chunks
  • the chunks should be computationally large enough to justify the overhead of sending the data over the network
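The last restriction can be made concrete with a simple cost model. This is an assumption for illustration, not part of Mentat: chunks are taken to run in rounds of one chunk per node, and each chunk is taken to carry a fixed overhead (data transfer and job start-up) on top of its computation time. All numbers are hypothetical.

```matlab
% Simplified cost model (assumed): chunks run in rounds of one chunk per
% node, and every chunk pays a fixed overhead on top of its compute time.
nchunks    = 30;   % number of independent chunks
nnodes     = 10;   % number of cluster nodes
t_compute  = 60;   % seconds of computation per chunk (hypothetical)
t_overhead = 5;    % seconds of data transfer and job start-up per chunk (hypothetical)

t_serial   = nchunks * t_compute;
t_parallel = ceil(nchunks / nnodes) * (t_compute + t_overhead);
speedup    = t_serial / t_parallel;
fprintf('serial %g s, parallel %g s, speedup %.2f\n', t_serial, t_parallel, speedup);
% prints: serial 1800 s, parallel 195 s, speedup 9.23
```

With t_compute = 1 s instead, the same model gives 30 s serially versus 3 × 6 = 18 s in parallel, a speedup of only about 1.7, which is why small chunks are not worth distributing.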

Since I want my computations to scale simply with the number of available cluster nodes, without having to buy additional licenses, I implemented a solution based on the Matlab Compiler. Here is an example: assume that you are running an interactive Matlab session on the master node of the cluster; you can then type something like

a = rand(1000,1000,30);
pfor(1:30, 'b(:,:,%d) = fft(a(:,:,%d))');

which is equivalent to executing

for i = 1:30
  b(:,:,i) = fft(a(:,:,i));
end
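How the string argument of pfor maps onto that loop can be sketched serially, assuming that every %d in the template stands for the current index (the actual pfor may handle the substitution differently). Each expanded string is what would become one parallel job.

```matlab
% Serial sketch: expand the pfor template once per index and evaluate it.
% Assumes every %d in the template stands for the current index.
a = rand(100, 100, 5);                      % smaller than the example above
template = 'b(:,:,%d) = fft(a(:,:,%d));';
b = zeros(size(a));
for k = 1:size(a, 3)
  cmd = sprintf(template, k, k);            % e.g. 'b(:,:,3) = fft(a(:,:,3));'
  eval(cmd);                                % pfor would instead run each cmd as a separate job
end
```

Note that sprintf needs the index once per %d; with a single argument it would stop at the second conversion.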

The pfor function is the main interface to my toolbox and takes care of the parallelization. The fft function, or any other function in its place, is wrapped into an m-function that is compiled into a standalone executable. Subsequently, the data for each job are written to a network drive that is common to all nodes, and all jobs are executed remotely on the cluster. There is also a peval function, which takes multiple strings as input and evaluates them in parallel.
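The data flow through the shared disk can be sketched in plain MATLAB. This is not Mentat's own code: the file names are hypothetical, a temporary directory stands in for the common network disk, and the worker step runs locally, whereas in Mentat each job would be a compiled standalone executable (built with the Matlab Compiler, e.g. via mcc -m) started remotely on a node.

```matlab
% Sketch of job exchange through a shared directory (hypothetical file names).
shared = tempname;          % stands in for the common network disk
mkdir(shared);
a = rand(64, 64, 4);
njobs = size(a, 3);

% master: write the input data of every job to the shared disk
for k = 1:njobs
  chunk = a(:,:,k);
  save(fullfile(shared, sprintf('job%d_input.mat', k)), 'chunk');
end

% workers: on the cluster each job would be a compiled executable started
% remotely on a node; here the jobs simply run one after another
for k = 1:njobs
  in = load(fullfile(shared, sprintf('job%d_input.mat', k)));
  result = fft(in.chunk);
  save(fullfile(shared, sprintf('job%d_output.mat', k)), 'result');
end

% master: collect the results once all jobs have finished
b = zeros(size(a));
for k = 1:njobs
  out = load(fullfile(shared, sprintf('job%d_output.mat', k)));
  b(:,:,k) = out.result;
end

% clean up the shared directory
delete(fullfile(shared, '*.mat'));
rmdir(shared);
```

Because every job only reads its own input file and writes its own output file, the jobs need no communication with each other, which matches the restriction that chunks are evaluated independently.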

The only requirements are that Matlab and the Matlab Compiler are installed on the master node, that jobs can be started remotely (e.g., via ssh or rsh), and that all nodes share a common network disk.

The Mentat toolbox is released under the GPL; you can download it here: mentat_050418.tgz.

The toolbox is still at a very experimental stage, as is this webpage. I hope to develop it further and to improve the documentation, so that it becomes more generally usable. Please contact me if you have any questions, remarks or suggestions.