01 Introduction
Welcome to Data Science Toolbox!
This week we will prepare you for your Data Science Journey.
Overview:
In Block 01, we cover:
- What is Data Science Toolbox?
- Use of Group Assessments.
- What is Data Science?
- An overview of Exploratory Data Analysis (EDA).
- Exploratory Data Analysis with R.
- Setting up a basic Data Science Environment with RStudio.
- NB: We cover Python starting Block 06.
- Using Git (via GitHub Desktop) for collaborative projects.
Lectures:
- 01.1-Intro
- 01.2-EDA
- Reference R code (NB: See 1.3.1 for explanation)
Workshop:
In the workshop, we will be focussing on remote collaboration and remote working. This section uses the command line; see the GitHub appendix for details and additional tutorials.
Videos are optional, but make sure you understand why we are discussing elementary content.
- 1.3.1 - Workshop Lecture for RStudio (29:05)
- Rmd for 1.3.1 Introduction to R Studio
- See Appendix 1: Installing and working with RStudio
- 1.3.2 - Workshop Lecture for Exploratory Data Analysis (18:13)
- Rmd for 1.3.2 Exploratory Data Analysis R markdown
- HTML for 1.3.2 Exploratory Data Analysis R markdown
- (Warning: video uses a slightly out-of-date Rmd document.)
- 1.3.3 Workshop Lecture on Assessments
- See Appendix 3: Installing and working with GitHub Desktop
- Split into the following parts:
- 1.3.3.1 - GitHub Workshop Intro (4:33)
- 1.3.3.2 - GitHub Workshop Git Setup (8:10)
- 1.3.3.3 - GitHub Workshop Git Repositories for Projects (21:32)
- 1.3.3.4 discusses the Assignment, which we cover in class.
- (Example Assessment link)
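The Git setup and repository steps discussed in 1.3.3.2 and 1.3.3.3 can be sketched on the command line roughly as follows. This is a minimal illustration and not the workshop's own script: the scratch directory, the repository name `dst-example`, the name and email, the branch name `analysis` and the file names are all hypothetical placeholders. It works entirely locally; linking the repository to GitHub is covered in Appendix 3.

```shell
# Hypothetical scratch location, so the sketch does not touch real projects.
workdir="$(mktemp -d)"
repo="$workdir/dst-example"

# Create an empty repository.
mkdir -p "$repo"
cd "$repo"
git init -q

# Identity recorded in each commit, set for this repository only
# (placeholder values; use your own name and email).
git config user.name "Example Student"
git config user.email "student@example.com"

# First commit: a README describing the project.
echo "# Example project" > README.md
git add README.md
git commit -q -m "Add README"

# Remember the default branch name (it may be "master" or "main").
default_branch="$(git rev-parse --abbrev-ref HEAD)"

# Do new work on a separate branch, as a collaborator would.
git checkout -q -b analysis
echo 'summary(cars)' > eda.R
git add eda.R
git commit -q -m "Add first EDA script"

# Bring the work back into the default branch and inspect the history.
git checkout -q "$default_branch"
git merge -q analysis
git log --oneline
```

Working on a branch and merging it back mirrors what happens in a group project, where each member's changes are kept separate until they are ready to be combined.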
Assessments (Formative):
- Portfolio 01 long form worksheet.
- Block01 on Noteable via Blackboard:
- Go to the Data Science Toolbox Blackboard page
- Select Noteable
- Select R with stan as your “personal notebook server” and press “Start”
- Go to “Assignments”
- Select Block01 and press “Fetch”
- Click “Block01 >”, which opens a drop-down containing the only assignment, Block01. Select this.
- The assignment opens in Jupyter. Complete the worksheet. When you are done, save and return to the Assignments tab. Press “Validate” and, once validation succeeds, press “Submit”.
Assessments:
- The Example Assessment should be carefully examined.
- Assessment 0 will be set this week; see Assessments. This is a formative assessment (i.e. it does not contribute to your grade) and will be due in Week 3.
- Portfolio 0 will be set this week; see Assessments. This is a formative assessment (i.e. it does not contribute to your grade) and will be due in Week 3.
References:
The main references are:
- Appendix 1 RStudio
- Appendix 2 Replicability
- Appendix 3 GitHub
- The Beginning Git tutorial for Command Line Git.
- R for Data Science by Hadley Wickham and Garrett Grolemund
- For an overview of how Data Science fits into other disciplines, see my mathcareers article on What to know before studying data science.
Data Sources:
This section used Cyber Security Data Sources:
- Bro log data from Secrepo
- This can be loaded into R in a nice form with a script (raw) that can be run directly from R using:
source("https://raw.githubusercontent.com/dsbristol/dst/master/code/loadconndata.R")
- The KDD99 dataset, which was created for a competition with a task specification. We normally use the 10% and column names files, which you can download directly.
Worksheets (unassessed)
Navigation:
Previous: About the Course (Block 00). Next: Block 02.