Cluster news

 

24.04.2018 - The chuck cluster is starting.

The electrical work and the physical reconfiguration of the cluster racks are finished. We are starting the new cluster - CHUCK.
The schedule is as follows:

  • on 25.04.2018, in the afternoon, /work/chuck starts
    • it should be an exact copy of /work/psk - please verify your data; remember that /work/psk will not be back,
    • files that were open during the shutdown (in jobs or on desktops) may be damaged or incomplete,
    • links were not modified - if you have links pointing to /work/psk, they will no longer work.
  • on 26.04.2018, before noon, the chuck frontend will be opened to users and the queueing system will start working (the old psk nodes will be gradually migrated to chuck)
  • on 27.04.2018 all nodes should be available
  • later on, additional software will be installed (PGI compilers, CUDA, MPI implementations compiled with devtoolsets, ...)
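
Since links to /work/psk were not modified, it may help to list the ones that need updating. The following is only a minimal sketch in Python, not an official tool: the function name find_links_into and its parameters are made up for illustration, and it only detects links whose stored target is an absolute path under /work/psk (relative links are not resolved).

```python
import os


def find_links_into(root, old_prefix="/work/psk"):
    """Yield (link_path, target) for symlinks under `root` that point into `old_prefix`.

    Hypothetical helper: only the raw link target is inspected, so relative
    links that happen to resolve into the old filesystem are not reported.
    """
    old_prefix = old_prefix.rstrip("/")
    # followlinks=False: symlinked directories are listed but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            # Match the prefix itself or anything below it, but not e.g. /work/pskother.
            if target == old_prefix or target.startswith(old_prefix + "/"):
                yield path, target
```

For example, looping over find_links_into(os.path.expanduser("~")) and printing each pair lists the affected links in a home directory. Whether a link should then be pointed at the same path under /work/chuck is something to check by hand.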

 

18.03.2018 - The new cluster is ready for user testing.

The recent hardware purchases allow for a major upgrade of the computational cluster at CAMK. We bought a complete new core cluster infrastructure, which enabled us to build a new cluster while keeping the old PSK operational. It is now mostly finished and ready for user testing. The name is chuck :)

Major changes and features:

  • new high performance, cluster filesystem: /work/chuck (do not use it for now!)
    • 550 TB of space (and more than 200 million files) + backup
    • ~5 GB/s aggregate throughput
    • BeeGFS instead of Lustre - should be more responsive and faster, especially with small files
  • new queueing system - SLURM (replaces Torque+Moab)
    • free, open source, widely used in PL-Grid
  • 16 new nodes with 320 CPU cores in total
  • 100 Gb/s InfiniBand network in the new nodes
  • Scientific Linux 7 (PSK was based on SL6)
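
For users coming from Torque+Moab, the commonly cited SLURM counterparts of the basic commands can be summarised in a small lookup table. This is only an illustrative sketch in Python: the helper slurm_equivalent is a made-up name, the table covers command names only, and the options differ between the two systems (for example, -N means the job name for qsub but the number of nodes for sbatch), so arguments must be checked against the SLURM man pages.

```python
# Commonly cited Torque -> SLURM command equivalents (command names only).
TORQUE_TO_SLURM = {
    "qsub": "sbatch",     # submit a batch job
    "qstat": "squeue",    # list queued and running jobs
    "qdel": "scancel",    # cancel a job
    "pbsnodes": "sinfo",  # show node/partition state
}


def slurm_equivalent(torque_command):
    """Return the SLURM command name for a Torque command, or None if unknown.

    Hypothetical helper: it deliberately does not translate options,
    because the same flag can mean different things in the two systems.
    """
    return TORQUE_TO_SLURM.get(torque_command)
```

Calling slurm_equivalent("qsub") returns "sbatch"; an unknown command returns None rather than a guess.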

The cluster can already be used, but only for testing (expect reconfigurations and reboots). The goal is to migrate the psk resources into chuck and make it ready for production before the 'big network shutdown' on April 13th. The rough roadmap:

  • finish chuck configuration and copy /work/psk to /work/chuck before 30 March
  • testing, software installation and documentation :)
  • switch off psk including /work/psk for final data synchronization (1-2 days without access to data)
  • migrate psk nodes to chuck
  • open chuck for production...