Assessments are month-long group projects, allowing a deep dive into a specific area of Data Science applied to Cyber Security. They make up a total of 50% of the course mark (best 4 of 5) and are the single most important part of the course.
Undertaking a group project online is a difficult process that requires care and planning. Help for planning your project is given in Block 1, and includes:
- 1.3.3 Workshop Lecture on Assessments, listed completely in Block 01.
- The Example Assessment, which you should go over carefully.
- Appendix 1: Preparation List.
- Appendix 3: Replicability, which explains how to make your project run reliably on others’ computers.
- Appendix 4: GitHub, which explains how to use GitHub.
- Appendix 5: Bluecrystal, which is our High Performance Computing Infrastructure, essential for later Assessments.
- The Equity Formula for redistributing marks where a different equity is agreed.
The individual assessment instructions contain significant guidance. What follows are additional thoughts that are less directly relevant but give context.
Comment on Markdown reflections:
The PDF versions of the example reflections are created using Pandoc, which takes a single command:
pandoc -o RachelR_Reflection.pdf RachelR_Reflection.md
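If a group has several reflections to convert, the same Pandoc call can be wrapped in a small script. The following is a minimal sketch rather than a course requirement: the function names and the `*_Reflection.md` file pattern are assumptions chosen to match the example file name above, and it assumes `pandoc` (plus a LaTeX engine, which Pandoc uses by default for PDF output) is installed.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def pandoc_command(md_path: Path) -> list[str]:
    """Build the Pandoc command that converts one Markdown file to a PDF beside it."""
    pdf_path = md_path.with_suffix(".pdf")
    return ["pandoc", "-o", str(pdf_path), str(md_path)]


def convert_reflections(folder: Path, pattern: str = "*_Reflection.md") -> list[Path]:
    """Run Pandoc on every matching Markdown file in `folder`.

    Returns the PDF paths that were requested. The file pattern is a
    hypothetical naming convention, not one mandated by the course.
    """
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    outputs = []
    for md_path in sorted(folder.glob(pattern)):
        # check=True raises CalledProcessError if Pandoc fails on a file.
        subprocess.run(pandoc_command(md_path), check=True)
        outputs.append(md_path.with_suffix(".pdf"))
    return outputs
```

Calling `convert_reflections(Path("."))` from the folder containing the reflections would then produce one PDF per Markdown file.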
Markdown is an acceptable format, though PDF looks nicer. Referencing is important, but don’t overdo it; you might use footnotes[^ref1], or just place simple labels without worrying about Markdown format at all (label2).
[^ref1]: Lawson D, An Example Reference, 2020.
(label2): Lawson D, A Second Example Reference without Markup, 2020.
Comment on Report formats:
It is completely fine to present a well-commented Rmd or ipynb file. You are welcome to try to generate a beautiful PDF in which all of the results are knitted together, but it can be awkward if content is fundamentally separated. Yes, you can create a PDF from each file and merge the PDFs, and doing so once is educational, but it isn’t the point of DST.
Please commit your final output. It is generally considered bad practice to commit transient content to your repository. This would include the Jupyter Notebook with all of the content completed, and the HTML output of Rmd. However, for the purposes of generating a one-off assessed report, it is safest to do this, though best only for your final commit.
This is because it is possible that I cannot run your code, for a good reason or a bad, and therefore I want to see what the output should be.
Why is transient content bad? Your repository will get bigger and take longer to process, as the whole history of everything that you’ve generated is stored. Text files compress very nicely for this content, but binary objects such as images and data, hidden inside HTML or ipynb files, compress badly.
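One way to keep intermediate commits small is to commit an output-free copy of a notebook and save the fully executed version for the final commit. The sketch below is illustrative only: it assumes the standard version 4 notebook JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), and the function names are hypothetical.

```python
import json
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a v4 notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop embedded images, tables, text
            cell["execution_count"] = None  # reset the run counter
    return cleaned


def write_stripped_copy(src: Path, dst: Path) -> int:
    """Write an output-free copy of `src` to `dst`.

    Returns the size difference in bytes (source minus copy).
    """
    notebook = json.loads(src.read_text(encoding="utf-8"))
    dst.write_text(json.dumps(strip_outputs(notebook), indent=1), encoding="utf-8")
    return src.stat().st_size - dst.stat().st_size
```

The original notebook is left untouched, so the executed version is still available to commit at the end.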
Comment on data:
Don’t commit very large datasets to GitHub, and don’t commit modestly large ones unless necessary (and try not to duplicate them). GitHub enforces file size limits, and even below those limits large files are inefficient to store in a repository. Try to use a different data sharing solution, such as OneDrive, for such data.
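Before committing, it can help to check whether any large files have crept into the project folder. The following is a rough sketch under stated assumptions: the function name is hypothetical, and the 50 MB default threshold is an arbitrary illustrative value rather than a limit set by the course.

```python
from __future__ import annotations

from pathlib import Path


def find_large_files(root: Path, limit_mb: float = 50.0) -> list[tuple[Path, float]]:
    """List files under `root` larger than `limit_mb`, biggest first.

    Anything inside a `.git` directory is skipped, since that is
    Git's own storage rather than project content.
    """
    results = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            size_mb = path.stat().st_size / (1024 * 1024)
            if size_mb > limit_mb:
                results.append((path, size_mb))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size_mb in find_large_files(Path(".")):
        print(f"{size_mb:8.1f} MB  {path}")
```

Files that show up here are candidates for a shared folder such as OneDrive instead of the repository.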