Competition details

Total score: 100%
• HPC competition: 50%, including:
1. Performance: 35%
2. Interview: 15%
• AI competition: 50%, including:
1. Performance: 35%
2. Interview: 15%

The competition includes two parts:

Part 1 – Artificial Intelligence

TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs with a single API.

TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well. For more information about TensorFlow, please visit: https://www.tensorflow.org/. The deep learning AI task will be to train an image-recognition model using TensorFlow, and teams will be asked to achieve the highest possible accuracy.

 

GPU Cluster (during the optimization period)
2 DGX-1 nodes for distributed TensorFlow (max. 16 GPUs per job)
• Special rules:
o /home quota: 50 GB
o ~/scratch as the working directory
o ~/project/PID (quota: 10 TB) as the shared folder for datasets and examples
o Allocation period: 15 April to 31 August 2019
o Accounts (including VPN access and login) will expire on September 30, 2019. All data under ~/scratch and ~/project will be removed on that date as well.
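As a rough illustration of how these directories are intended to be used (the dataset directory name and layout below are assumptions, not part of the rules), keep large shared data under ~/project/PID and work from ~/scratch:

# Illustration only: the dataset directory name is an assumption
cd ~/scratch
ln -s ~/project/PID/imagenet ./imagenet
du -sh ~/project/PID    # check usage against the 10 TB quota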

Competition Criteria
1) Competition software requirements:
Framework: TensorFlow
Model: ResNet-50
Dataset: ImageNet 2012

2) Training time: 90 minutes

3) Output:

Standardized output for epoch count, images/sec, accuracy and loss

4) Optimization direction:
Leverage a distributed scale-out solution across 16/32 GPUs using the NSCC Singapore cluster resources
o Tuning phase: 16 GPUs
o Final testing: 32 GPUs

Options:
o Native TensorFlow
o TensorFlow + Keras
o TensorFlow + Horovod

5) Judgement:
Presentation: 15 minutes
Q&A: 15 minutes

Baseline
• TensorFlow 1.12 + Horovod without any optimizations
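For reference, a baseline run of this kind could be launched with Horovod under MPI. The sketch below is illustrative only: the hostnames (dgx01, dgx02), dataset path, and batch size are assumptions, and the flags follow the conventions of the public tensorflow/benchmarks tf_cnn_benchmarks scripts rather than an officially prescribed command.

# Hedged sketch of a 16-GPU baseline (2 DGX-1 nodes x 8 GPUs each)
# Hostnames, data path, and batch size are assumptions for illustration
mpirun -np 16 -H dgx01:8,dgx02:8 \
    python tf_cnn_benchmarks.py \
        --model=resnet50 \
        --data_name=imagenet \
        --data_dir=$HOME/project/PID/imagenet \
        --variable_update=horovod \
        --batch_size=256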

 

Part 2 – High-Performance Computing

The HPC part of the competition will focus on the SWIFT application. SWIFT is a hydrodynamics and gravity code for astrophysics and cosmology. It is designed to run on supercomputers and simulates the forces acting on matter due to two main effects: gravity and hydrodynamics (forces that arise in fluids, such as viscosity). SWIFT makes it possible to run simulations of astrophysical objects such as planets, galaxies, or even the whole universe, and it is being used to test theories about what the universe is made of and how it has evolved from the Big Bang to the present day.

 

CPU Cluster:
32 CPU nodes for SWIFT
o /home quota: 50 GB
o ~/scratch as the working directory
o ~/project/PID (quota: 10 TB) as the shared folder for datasets and examples
o Allocation period: 15 April to 31 August 2019
o Accounts (including VPN access and login) will expire on September 30, 2019. All data under ~/scratch and ~/project will be removed on that date as well.

Baseline

Each SWIFT run will produce a timesteps_XXX.txt file, where ‘XXX’ is the number of cores used by the run (i.e., the total number of MPI tasks times the number of threads per task). The wall clock time of interest, which excludes initialization, is obtained by summing the next-to-last column of that file (reported in milliseconds) and dividing the total by 1000 to obtain the wall clock time in seconds. This is easily done with a command such as the following:

awk 'BEGIN{tot=0} {tot += $11} END {print tot/1000}' < timesteps_XXX.txt

The performance, in steps/(wall clock hour), is then computed as 128*3600/(wall clock in s), in other words:

awk 'BEGIN{tot=0} {tot += $11} END {print 128*3600000/tot}' < timesteps_XXX.txt

We suggest you rename the ‘timesteps_XXX.txt’ file to add a timestamp or some other means of identifying which of your runs produced it. For each of your runs, save this file and the job standard output (with a name that suitably distinguishes it as well), as the judges may require you to provide these files for each run you include in the performance results you submit.
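For example (the core count of 512 and the file names below are purely illustrative), something like the following keeps each run's files distinguishable:

# Illustration only: tag the timesteps file and job output with a timestamp
stamp=$(date +%Y%m%d_%H%M%S)
mv timesteps_512.txt timesteps_512_${stamp}.txt
cp job.out job_512_${stamp}.out    # "job.out" stands in for your scheduler's stdout file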

Instructions for SWIFT

SWIFT is an acronym for SPH With Inter-dependent Fine-grained Tasking, where SPH stands for Smoothed Particle Hydrodynamics. It is a gravity and SPH solver designed to run cosmological simulations. For the competition, we have settled on a specific commit known to work well with the target dataset.

  1. Get the Wed Mar 6 16:09:06 2019 commit of the code from the SWIFT GitLab site:
    git clone https://gitlab.cosma.dur.ac.uk/swift/swiftsim.git && cd swiftsim
    git checkout 3d44fb65ea39b9f7a2a99525f15c4cd464045c38
  2. In the main directory of the repository you will find an INSTALL.swift file with instructions for building.
  3. Edit src/engine.c and examples/main.c to comment out the calls to engine_dump_snapshot() (two lines in engine.c, one line in main.c). This disables two large snapshot dumps (one at the beginning of the run and another at the end), because for the purposes of the competition we are interested in the computational portion of the code, not the large file outputs it would otherwise perform.
  4. Before attempting to build SWIFT, you will need to have the following installed somewhere:
    • Compilers
    • MPI
    • HDF5 library (to read the large input file; version 1.8.x with x ≥ 17 is fine).
    • GSL (GNU Scientific Library, without which SWIFT will not be able to perform cosmological time integration).
    • FFTW (3.3.x, x ≥ 6 is fine).
    • METIS or ParMETIS (to optimize the load balance between MPI ranks).
  5. Configure and build SWIFT:
    • Run ./autogen.sh (only the first time)
    • Configure (we used ./configure --with-metis --with-tbbmalloc --with-fftw=/path/to/fftw; the --with-tbbmalloc option is recommended on Xeon-based clusters).
    • Build with make.
    • If make succeeds, there will be two binaries: examples/swift (for single-node runs) and examples/swift_mpi (for multi-node runs).
  6. Run SWIFT:
    • Change to the examples/EAGLE_low_z/EAGLE_50 subdirectory.
    • Get the initial conditions, a ~30 GB file named EAGLE_ICs_50.hdf5, by running ./getICs.sh (this will only need to be done once, of course).
    • Edit eagle_50.yml to change the value of dt_max from 1.e-2 to 1.e-5. This is done to increase the computational load and provide better scaling.
    • Run SWIFT. Note that SWIFT runs best with only a few MPI tasks per node. You should test a few different numbers of tasks per node such as 1, 2 and 4.
    • The threading model used by SWIFT is NOT OpenMP. You will need to tell it explicitly how many threads to use per MPI task, through the --threads=N option, where N is the number of desired threads.
    • Your command line must include the --cosmology, --hydro, --self-gravity, and --stars options, all of which relate to the physics aspects of the simulation.
    • You should run exactly 128 time steps, which means that your command line must also include "-n 128".
      For example, a minimal command line to run SWIFT on eight nodes with two tasks per node and 16 threads per task would be:
      mpirun -np 16 ../../swift_mpi --cosmology --hydro --self-gravity --stars --threads=16 -n 128 eagle_50.yml
    • There are other SWIFT options you may find useful. Run "examples/swift --help" to find out what other options are available.
    • Run on 2, 4, 8, 16, ... nodes (a sketch of a possible batch script follows these instructions).
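To tie the build and run steps together, the following is a hedged sketch of what a multi-node batch job might look like. It assumes a PBS-style scheduler, hypothetical module names, and that the job is submitted from the top-level swiftsim directory; adapt the resource requests, modules, and paths to the actual cluster environment.

#!/bin/bash
# Sketch only: queue/resource syntax and module names are assumptions
#PBS -l select=8:ncpus=32:mpiprocs=2
#PBS -l walltime=02:00:00

module load hdf5 gsl fftw parmetis    # hypothetical module names

# Assumes the job was submitted from the swiftsim top-level directory
cd $PBS_O_WORKDIR/examples/EAGLE_low_z/EAGLE_50

# 8 nodes x 2 MPI tasks per node = 16 tasks, 16 threads per task, 128 steps
mpirun -np 16 ../../swift_mpi --cosmology --hydro --self-gravity --stars \
    --threads=16 -n 128 eagle_50.yml

# 16 tasks x 16 threads = 256 cores, hence timesteps_256.txt
awk 'BEGIN{tot=0} {tot += $11} END {print "wall clock (s):", tot/1000}' < timesteps_256.txt
awk 'BEGIN{tot=0} {tot += $11} END {print "steps/hour:", 128*3600000/tot}' < timesteps_256.txt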