Course: MSc Cyber Security Host: Bristol Mathematics Lecturer: Dr Daniel Lawson

Data Science Toolbox


Students on the MSc Mathematics of Cybersecurity have access to bluecrystal phase 4. The process is as follows:

In Data Science Toolbox, we will be using this primarily for:
  • Large compute jobs;
  • GPU (Graphics Processing Unit) jobs, specifically for learning neural networks.

Project details:

Getting started

See my HPC notes on Github for code.

There are a couple of gotchas:

Additional thoughts on the HPC

Bluecrystal Keras and Tensorflow

  1. To get a version of anaconda that works with Tensorflow on BC4:
    module load languages/anaconda2/5.3.1.tensorflow-1.12
    

    You can add this to your .bashrc file so that this is always loaded for you.

  2. To install tensorflow and all dependencies, we need to make a conda environment for it. Note that you need to do these commands separately as some require interactive confirmation.
    conda init ## Required to make conda happy on the nodes
    conda create -y -n tf-env
    conda activate tf-env
    conda install tensorflow keras ipython pandas scikit-learn
    ## NB: by default there is no interactive python, which is why ipython is installed here.
    ## You can install anything else and conda will place it in the environment.
    
  3. You will then need to write a script that will complete your desired task. However, note that bluecrystal phase 4 is required to run Tensorflow GPU jobs.
    • You can do this interactively by using srun -I as noted in my HPC notes; see also the GPU Jobs documentation. To request an interactive session with 16 cores for 60 hours, the appropriate command is srun --nodes=1 --ntasks-per-node=16 --time=60:00:00 --pty bash -i. Test first with one core for one hour: srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i. In my interactive session, the following got things working:
      conda init ## Required to make conda happy on the nodes
      source ~/.bashrc ## Required to load what conda init just did
      conda activate tf-env ## Gets us into our GPU environment
      ipython3
      
    • I was then able to run ipython interactively on the compute node:
      from keras.models import Sequential
      from keras.layers import Dense
      import numpy as np
      np.random.seed(7)
      import requests
      url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv?raw=true'
      r = requests.get(url, allow_redirects=True)
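      with open('pima-indians-diabetes.data.csv', 'wb') as f:  # save the download to disk
          f.write(r.content)
      dataset = np.loadtxt("pima-indians-diabetes.data.csv", delimiter=",")
      X = dataset[:,0:8]  # first eight columns are the input features
      Y = dataset[:,8]    # last column is the binary class label
      # create model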
      model = Sequential()
      model.add(Dense(12, input_dim=8, activation='relu'))
      model.add(Dense(8, activation='relu'))
      model.add(Dense(1, activation='sigmoid'))
      model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
      model.fit(X, Y, epochs=150, batch_size=10)
      
    • Remember that to exit this environment you use conda deactivate.
    • You can even configure jupyter notebook to allow you to access it remotely, but this is non-trivial.
  4. Some further thoughts on conda:
    • If you followed the instructions above, the environment content was placed in ~/.conda/envs/tf-env. You can set this location manually.
    • We can ensure that we all get the same environment by creating a file that describes it completely.
      conda env export > tf-env.yml
      
    • Others can then recreate the environment from this file using conda env create -f tf-env.yml
    • You can easily run the provided python script as a job, which is the recommended way to get large runs done.
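
Before committing to a long run, it can be worth confirming that TensorFlow actually sees a GPU on the node you were given. The following is a minimal sketch and not part of the course scripts. The function name visible_gpu_names is hypothetical, and it relies on tensorflow.python.client.device_lib, a long-standing but internal TensorFlow module, so treat its availability in your installed version as an assumption.

```python
def visible_gpu_names():
    """Return the names of the GPU devices TensorFlow can see.

    Returns an empty list if no GPU is visible, or if TensorFlow
    cannot be imported in the current environment.
    """
    try:
        # Assumption: this internal module is present in the installed TensorFlow.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = visible_gpu_names()
    if gpus:
        print("TensorFlow can see: " + ", ".join(gpus))
    else:
        print("No GPU visible to TensorFlow; training would run on the CPU.")
```

Running this inside the activated tf-env environment, either in the interactive session or as the first step of a job, gives a quick indication of whether the GPU is being picked up.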