Overview
ACCRE is the premier resource for the high-performance computing needs of researchers throughout Vanderbilt University. With over 600 multi-core systems in a 4,000-square-foot facility, the ACCRE cluster is used for research in a wide variety of fields, including genetics, particle physics, and astronomy.
We maintain an overview of what we support and don’t support at ACCRE. Many services that are unsupported at ACCRE are provided instead by the Vanderbilt Data Science Institute and VUIT.
What is a cluster?
To put it simply, a cluster is a group of computers that are networked together to perform intensive computing jobs. Users of the cluster write programs that would normally take a long time to run, then schedule a job (or group of jobs) to run them on the ACCRE cluster. This way, the program can run faster and have access to more memory, and users can get more done in less time.
Where is the ACCRE cluster?
The cluster is located in Hill Center on Peabody Campus.
You don’t need to be physically at the cluster to operate it. In fact, only ACCRE and certain VUIT staff can physically access the cluster, and we only go there to do occasional maintenance. Instead, you access the cluster with a program called a secure shell (SSH) client, which lets you work on the cluster from anywhere.
Being in front of the SSH client is like being in front of the cluster. As you type, the keystrokes get sent to the cluster, and any results get sent back in real time as well. Because of the way it is designed, many ACCRE users can log in at once.
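As a rough illustration of what starting an SSH session involves, the sketch below assembles the command line for the OpenSSH `ssh` program. The hostname `gateway.example.edu` and the username `vunetid` are hypothetical placeholders, not ACCRE's actual login details.

```python
import shlex

# Hypothetical placeholder: this overview does not give the gateway address.
PLACEHOLDER_GATEWAY = "gateway.example.edu"


def ssh_command(user: str, host: str = PLACEHOLDER_GATEWAY) -> list[str]:
    """Return the argument list for opening an interactive shell on a gateway."""
    return ["ssh", f"{user}@{host}"]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(ssh_command("vunetid")))
```

Running the printed command in a terminal would prompt for credentials and then open a shell on the remote machine.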
There are other tools that can be used to interact with the ACCRE cluster. You can transfer files back and forth using a Secure Copy (SCP) client, and run graphical programs using an X Window (X11) server. In theory, you could run a web browser on ACCRE; it appears as though it’s running on your desktop, but it’s actually running on the gateway. (We don’t recommend doing this for everyday use, by the way. It’s very slow!)
The Firefox browser running through ACCRE, as it looks on an X Window server.
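The file transfer and graphical access described above can be sketched in the same way. This is a minimal example that assumes the OpenSSH `scp` and `ssh` command-line clients; the hostname, username, and file paths are hypothetical placeholders.

```python
import shlex

# Hypothetical placeholder: substitute the real gateway address.
PLACEHOLDER_GATEWAY = "gateway.example.edu"


def scp_upload(user: str, local_path: str, remote_path: str,
               host: str = PLACEHOLDER_GATEWAY) -> list[str]:
    """Argument list for copying a local file to the cluster with scp."""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


def scp_download(user: str, remote_path: str, local_path: str,
                 host: str = PLACEHOLDER_GATEWAY) -> list[str]:
    """Argument list for copying a file from the cluster to the local machine."""
    return ["scp", f"{user}@{host}:{remote_path}", local_path]


def ssh_with_x11(user: str, host: str = PLACEHOLDER_GATEWAY) -> list[str]:
    """Argument list for an SSH session with X11 forwarding enabled (-X)."""
    return ["ssh", "-X", f"{user}@{host}"]


if __name__ == "__main__":
    # Commands are printed, not executed.
    print(shlex.join(scp_upload("vunetid", "data.csv", "project/data.csv")))
    print(shlex.join(scp_download("vunetid", "project/results.txt", "results.txt")))
    print(shlex.join(ssh_with_x11("vunetid")))
```

X11 forwarding only works if an X server is running on the local machine, which is why graphical programs launched this way display locally while running remotely.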
10 reasons to use the ACCRE cluster
Researchers at Vanderbilt run their computing jobs on ACCRE for a wide variety of reasons.
- Accessing larger processor or memory resources. ACCRE CPUs are large multi-core processors, and a single compute node can be equipped with as much as 512 GB of RAM, which is ideal for big data processing.
- Getting more done in less time. The ACCRE environment is a shared resource: researchers around campus purchase compute nodes that are incorporated into the environment, and in return they can harvest idle cycles on other researchers’ hardware, performing many more computations than would be possible on a single machine.
- It’s reliable. You don’t have to worry about your computer turning off in the middle of your computation job or restarting because your computer “needs to update”. With very rare interruptions, ACCRE is available 24 hours a day, 7 days a week.
- All the software is installed. ACCRE comes with many software packages pre-installed, including Python, R, and MATLAB, as well as popular high-performance computing libraries like MPI and CUDA. We provide MATLAB free to use on the ACCRE cluster for all users, and for ACCRE users affiliated with the College of Arts and Science, we provide access to Scala as well.
- We’re here to help. You can open a help ticket anytime you have difficulties using the cluster so that you can focus more time on your research.
- It’s available anywhere. You can submit a job from your lab station and check on your results from your office.
- You can share files and results with others. Since ACCRE is a shared system, collaborating with others is easy.
- It’s reasonably priced. New research labs and collaborations can try ACCRE free with a guest account. Our group rates are subsidized by VU and VUMC, and are lower than many commercial cloud computing services.
- It’s backed up. Most data on the cluster is backed up nightly to a tape library.
- We can handle sensitive data. Our system is designed for proprietary and export-controlled data as well as protected health information (PHI), research health information (RHI), and human genotype and phenotype data.
How does the ACCRE cluster work?
The ACCRE cluster has many different types of computers, all of which work together:
- The compute nodes perform the actual work. Jobs (i.e. instances of running programs) are executed on compute nodes. ACCRE consists of over 600 compute nodes.
- The gateways are what you see when you log in to ACCRE. These gateways are accessed interactively from a remote “shell” session using a Linux tool called ssh. Gateways allow users to submit jobs, edit files, and get the results from jobs. Some gateways are specific to a certain research group; these are called custom gateways.
- The job scheduler server takes jobs that have been submitted and assigns them to a particular compute node. The job scheduler software that ACCRE uses is called SLURM. SLURM tracks and manages compute and memory resources on the compute nodes, and decides when and where to run your job. SLURM will email you updates about your job.
- The file servers (L-Store and GPFS) store all the code and data so they can be used by all the different nodes and gateways. ACCRE uses a tool called GPFS to store users’ data. You can think of GPFS as Dropbox for the cluster. It’s available on each gateway and compute node, eliminating the need to copy data between the various compute resources on the cluster.
Some of the nodes contain NVIDIA graphics processing units, or GPUs. GPUs were originally designed to perform the calculations needed for video game graphics quickly. Because of the nature of their design, GPUs are being used more and more for non-graphics applications as well (e.g. for deep learning applications, molecular dynamics, image processing, and much more).
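To make the scheduler's role more concrete, here is a minimal sketch that generates a SLURM batch script of the kind submitted from a gateway with `sbatch`. The `#SBATCH` options are standard SLURM directives, but the function name, default values, email address, and example command are hypothetical, and any site-specific settings (such as an account or partition) are omitted.

```python
from typing import Optional


def slurm_script(command: str,
                 job_name: str = "example",
                 nodes: int = 1,
                 ntasks: int = 1,
                 mem: str = "4G",
                 time_limit: str = "01:00:00",
                 mail_user: Optional[str] = None,
                 gpus: int = 0) -> str:
    """Build the text of a SLURM batch script (illustrative defaults only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --mem={mem}",          # memory requested per node
        f"#SBATCH --time={time_limit}",  # wall-clock limit, HH:MM:SS
    ]
    if mail_user:
        # Ask SLURM to email job status updates to this address.
        lines.append(f"#SBATCH --mail-user={mail_user}")
        lines.append("#SBATCH --mail-type=ALL")
    if gpus > 0:
        # Request GPUs as a generic resource on the allocated node.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.extend(["", command])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: run a Python script with one GPU and email updates.
    print(slurm_script("python analysis.py",
                       mail_user="user@example.edu", gpus=1))
```

The resource requests in the header are what the scheduler uses to decide when and where the job can run; the final line is the program that executes on the assigned compute node.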
When was the ACCRE cluster set up?
ACCRE has existed in its current form since 2003, while predecessors to ACCRE date back as early as 1994. You can read more about our history here.
Who runs ACCRE?
Prof. Paul Sheldon is the faculty director of ACCRE. Our other executive directors are Hunter Hagewood, our director of research computing operations, and Alan Tackett, our technical director.
The ACCRE cluster is maintained by roughly a dozen staff who report to a steering committee of three faculty: Paul Sheldon (chair, A&S), Doug Schmidt (Engineering/Associate Provost), and Brett Byram (Engineering). The steering committee in turn reports to the Vice Provost for Research at Vanderbilt, Padma Raghavan. ACCRE also maintains a faculty advisory committee for advice and recommendations. You can read more about our organizational structure here.
We operate independently from VUIT, although our offices are in the same building, and we collaborate with them on many projects.
What other services does ACCRE provide?
- Jupyter at ACCRE: We recently set up a Jupyter cluster for big data computing. This environment is well suited for applications that involve large volumes of data (multiple terabytes) and make use of mature and emerging tools and practices from data science.
- Open Grid Computing: ACCRE is a member of the Open Science Grid, which allows researchers to run complex computing jobs throughout the OSG consortium of over 100 universities and research labs.
- Tape Backup: We provide remote tape backup services to Vanderbilt departments and laboratories regardless of whether they use the cluster. Most data stored on the cluster are also backed up nightly to a tape library.
- LStore: We manage LStore, a logistical storage system that’s used for managing large amounts of data from projects such as the Vanderbilt TV News Archive and the CMS experiment for the Large Hadron Collider.
- REDDnet: Vanderbilt is a member of the Research and Education Data Depot Network (REDDnet, pronounced “ready net”). The REDDnet coalition facilitates data intensive collaboration among its member institutions by providing a facility for large, high-bandwidth distributed storage. In addition to the aforementioned CMS experiment, REDDnet is used for geospatial image storage as well as research in supernovas, structural biology and diabetic eye diseases.
- TNCRED Servers: The Tennessee Consortium on Research, Evaluation, and Development (TNCRED) uses state education data to understand the effects of class size, teacher compensation, low-performing school interventions, and other factors that may affect student performance. ACCRE maintains two servers for education researchers to perform statistical research on these data.