Competition Details

The competition includes two parts:

Part I – Artificial Intelligence

Mission: The mission of Part I is to achieve the highest distributed training performance. The AI framework options are:

  • TensorFlow over RDMA
  • Caffe2 over RDMA

The teams can choose their preferred framework from the two options. Information on the RDMA implementation code is listed on the Training Material page.

Models to Use: Inception v3, ResNet 152, and VGG16.

Benchmark: Distributed training based on the ImageNet dataset (an example launch sketch follows the lists below):

  1. TensorFlow with Inception v3
  2. TensorFlow with ResNet 152
  3. TensorFlow with VGG16

Or

  1. Caffe2 with Inception v3
  2. Caffe2 with ResNet 152
  3. Caffe2 with VGG16
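For TensorFlow, one common way to run these benchmarks is the tf_cnn_benchmarks script from the tensorflow/benchmarks repository, which accepts the three models above by name and can select an RDMA (verbs) transport. This is only a sketch; the scripts and flags listed on the Training Material page take precedence, and the hostnames, paths, GPU count, and batch size below are placeholders:

  # Hypothetical launch of one worker in a two-node run over RDMA verbs; repeat on each
  # node (and for any parameter servers) with the matching --job_name and --task_index.
  # --model can also be inception3 or vgg16; --server_protocol=grpc+verbs selects RDMA.
  % python tf_cnn_benchmarks.py \
      --model=resnet152 \
      --data_name=imagenet --data_dir=/path/to/imagenet \
      --num_gpus=4 --batch_size=64 \
      --variable_update=distributed_replicated \
      --server_protocol=grpc+verbs \
      --job_name=worker --task_index=0 \
      --ps_hosts=node1:50000,node2:50000 \
      --worker_hosts=node1:50001,node2:50001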

Test Criteria:

  1. Inception v3: convergence time
  2. Inception v3, ResNet 152, VGG16: images per second

Benchmark Baseline: The HPC-AI competition committee has tested the benchmark on a particular system to set baseline performance criteria. The participating teams are asked to run the benchmark to define a baseline on their own cluster and on the supercomputer available at NSCC. Details on how to access the NSCC supercomputer are provided on the Training Material page. The teams are challenged to achieve the highest performance.

Score: Total of 70 points

  1. Score: maximum of 35 points

    Performance improvement based on the HPC-AI competition committee results. The committee will re-test each team's code on the same system that was used to set the initial baseline.

    • Inception v3: 50% based on images/second improvement and 50% on convergence-time improvement
    • ResNet 152: 100% based on images/second improvement
    • VGG16: 100% based on images/second improvement
  2. Score: maximum of 15 points.

    Performance improvement based on the team's own baseline (own cluster or NSCC cluster).

    • Inception v3: 50% based on images/second improvement and 50% on convergence-time improvement
    • ResNet 152: 100% based on images/second improvement
    • VGG16: 100% based on images/second improvement

  3. Score: maximum of 20 points.

    The team's presentation of their achievements and their answers to the judging committee's questions.

Part II – High Performance Computing

The teams are asked to benchmark the Weather Research and Forecasting (WRF) Model. WRF is the next-generation mesoscale numerical weather prediction system, designed to serve both operational forecasting and atmospheric research needs. More info can be found at: https://www.mmm.ucar.edu/weather-research-and-forecasting-model

The code for WRF v3.9.1.1 can be obtained from: http://www2.mmm.ucar.edu/wrf/users/download/get_sources.html

The teams are asked to demonstrate the highest performance and scalability utilizing the NSCC cluster, based on the standard WRF 2.5 km CONUS benchmark (http://www2.mmm.ucar.edu/WG2bench/conus_2.5_v3/). Teams can use any MPI library of their choice (Open MPI, MVAPICH, HPC-X, etc.).

Teams should build WRF according to the instructions provided with the program, using one of the ‘dmpar’ options of the configure script. The resulting configure.wrf file may be edited before compiling to make any adjustments needed for the MPI version used. To execute the program, the restart and boundary data files need to be downloaded and rebuilt according to steps 1(a), (b), (d), and (e) of http://www2.mmm.ucar.edu/WG2bench/conus_2.5_v3/READ-ME.txt. The namelist.input file to be used is [here], since the namelist.input file mentioned in step 1(c) of the above READ-ME.txt is not valid for recent versions of WRF.
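As a rough build-and-run sketch (module names, paths, compiler choices, and process count below are placeholders; the instructions shipped with WRF and the READ-ME.txt steps above are authoritative):

  # Assumes NetCDF and an MPI toolchain are already available in the environment.
  % export NETCDF=/path/to/netcdf
  % ./configure                 # pick one of the (dmpar) options when prompted
  % vi configure.wrf            # optional: adjust compiler/MPI settings before compiling
  % ./compile em_real >& compile.log
  # Place the rebuilt restart/boundary files and the provided namelist.input in the run
  # directory, then launch with the chosen MPI library, e.g.:
  % cd run
  % mpirun -np 128 ./wrf.exe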

The measure of performance will be the Simulation Speed, which is computed by means of the stats.awk script as follows:
% grep 'Timing for main' rsl.error.0000 | tail -1439 | awk '{print $9}' | awk -f stats.awk
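Here grep selects the per-time-step timing lines written by MPI rank 0, the first awk extracts the elapsed wall-clock seconds for each main-model step, and stats.awk reports statistics (including the mean) over those values. As a rough equivalent without stats.awk, and assuming the standard 15-second model time step of the CONUS 2.5 km case, the simulation speed (model seconds simulated per wall-clock second) can be estimated as the time step divided by the mean step time:

  # Sketch only: mean step time and approximate simulation speed (15 s time step assumed).
  % grep 'Timing for main' rsl.error.0000 | tail -1439 | \
      awk '{ sum += $9; n++ } END { if (n) printf "mean %.4f s/step, simulation speed %.2f\n", sum/n, 15/(sum/n) }'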

Score: total of 30 points:

  1. Score: maximum of 20 points. Achieving the highest performance at scale.
  2. Score: maximum of 10 points. The team's presentation of their achievements and their answers to the judging committee's questions.