01 Introduction
Welcome to Data Science Toolbox!
This week we will prepare you for your Data Science Journey. It is essential that you prepare before contact time. That means:
- Watch and reflect on the Lectures;
- Look at the worksheets and think about the questions; as a minimum, make notes on how you might go about answering them;
- Most importantly, look at the Workshop content and do the preparation it requires.
The first two blocks demand the most work so that you can hit the ground running. Later blocks have less content, leaving correspondingly more time for group assessments.
Overview:
In Block 01, we cover:
- What is Data Science Toolbox?
- Use of Group Assessments.
- What is Data Science?
- An overview of Exploratory Data Analysis (EDA).
- Exploratory Data Analysis with R.
- Setting up a basic Data Science Environment with RStudio.
- NB: We cover Python starting in Block 06.
- Using Git (via GitHub Desktop) for collaborative projects.
- How to work with cyber security data.
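Exploratory Data Analysis with R usually starts with a quick look at the size, column types, missing values and value ranges of a data set before any plotting or modelling. A minimal sketch in base R, using the built-in iris data set; the helper name quick_eda is hypothetical and is not part of the course's reference code:

```r
# Hypothetical helper: a first look at a data frame.
# Returns its size, the class of each column, the number of missing
# values per column, and five-number summaries of the numeric columns.
quick_eda <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    types = vapply(df, function(col) class(col)[1], character(1)),
    missing = colSums(is.na(df)),
    # fivenum gives min, lower hinge, median, upper hinge, max
    numeric_summary = sapply(df[numeric_cols], fivenum)
  )
}

# Example using the built-in iris data set.
eda <- quick_eda(iris)
print(eda$types)
print(eda$numeric_summary)
```

From here, plots such as hist() or boxplot() on individual columns are the natural next step.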
Lectures:
- 1.0 - Introduction to the course (22.16)
- 1.1 - Introduction to Data Science (26.59)
- 1.2 - Exploratory Data Analysis (26.45)
- Slides
- Reference R code (NB: See 1.3.1 for explanation)
Worksheets:
Preparation:
- Everyone needs to have followed the Block 01 preparation given in Appendix 1.
- Specifically, you must have installed RStudio and GitHub Desktop, and watched the appropriate training content.
- You cannot properly use the interaction time unless you have done this preparation in advance!
Workshop:
In the workshop, we will discuss how to collaborate and work together remotely, and then Exploratory Data Analysis in practice. This section uses the command line; see the GitHub appendix for details and additional tutorials.
- 1.3.1 - Workshop Lecture for RStudio (29.05)
- 1.3.2 - Workshop Lecture for Exploratory Data Analysis (18.13)
- 1.3.3 - Workshop Lecture on Assessments, split into the following parts:
Workshop Activity:
Before the workshop, you should have attempted the Workshop content. The workshop will discuss any difficulties you encountered with it.
Assessments:
- The Example Assessment should be carefully examined.
- Assessment 0 will be set this week; see Assessments. This is a formative assessment (i.e. it does not contribute to your grade) and will be due in Week 3.
References:
The main references are:
- The GitHub Appendix
- The Beginning Git tutorial for command-line Git.
Data Sources:
We use the following Cyber Security Data Sources:
- Bro log data from SecRepo.
  - This can be loaded into R in a convenient form with a script (raw), which can be run directly from R using:
    source("https://raw.githubusercontent.com/dsbristol/dst/master/code/loadconndata.R")
- The KDD99 dataset, which was created for the KDD Cup 1999 competition (see its task specification). We normally use the 10% subset and the column names file, both of which you can download directly.
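The KDD99 data file has no header row, so the column names have to be attached from the separate names file. A minimal sketch in base R, assuming the names file follows the kddcup.names layout (a first line listing the attack types, then one "feature: type." line per feature) and that the last field of each record is the connection label ending in a full stop (e.g. "normal."). The helper names parse_kdd_names and load_kdd are hypothetical:

```r
# Hypothetical helper: build column names from the KDD99 names file.
# Assumes line 1 lists the attack types and every later non-blank line
# looks like "duration: continuous."; the label column is appended last.
parse_kdd_names <- function(lines) {
  feature_lines <- lines[-1]
  feature_lines <- feature_lines[nzchar(trimws(feature_lines))]
  c(trimws(sub(":.*$", "", feature_lines)), "label")
}

# Hypothetical helper: read the header-less data file with those names.
load_kdd <- function(data_path, names_path) {
  col_names <- parse_kdd_names(readLines(names_path))
  kdd <- read.csv(data_path, header = FALSE, col.names = col_names,
                  stringsAsFactors = FALSE)
  # Labels end in a full stop (e.g. "normal."); drop it.
  kdd$label <- sub("\\.$", "", kdd$label)
  kdd
}
```

With the downloaded (and, if necessary, decompressed) files in the working directory, a call such as load_kdd("kddcup.data_10_percent", "kddcup.names") followed by table(kdd$label) would show how the records are spread across attack types; the file names here are assumptions.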