{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Random Forests (Part 2; python models)\n", "\n", "```\n", "date: \"Block 06\"\n", "author: \"Daniel Lawson\"\n", "email: dan.lawson@bristol.ac.uk\n", "output: html_document\n", "version: 1.0.1\n", "```\n", "\n", "Here we get a random forest classifier running on the kddcup data. We start with the standard boilerplate imports, then import the data that we saved from R." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Activity 1: Read in the data that we saved from R. \n", "\n", "This requires telling Python that the first column is the \"index column\" (like row names in R). We do this by passing `index_col=0` to the function `pd.read_csv`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "r_train=pd.read_csv('https://raw.githubusercontent.com/dsbristol/dst/master/data/conndataC_train.csv',index_col=0) \n", "r_test=pd.read_csv('https://raw.githubusercontent.com/dsbristol/dst/master/data/conndataC_test.csv',index_col=0) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need the output of the Random Forest that was run in R (**block06-TreesAndForests_Part1.Rmd**).\n", "\n", "You should really save this locally, but for convenience I've added it to the GitHub repo." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "r_rf_roc=pd.read_csv('https://raw.githubusercontent.com/dsbristol/dst/master/data/conndataC_RFroc.csv',index_col=0) # EDIT" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | duration | orig_bytes | resp_bytes | orig_ip_bytes | resp_ip_bytes | http |
|---|---|---|---|---|---|---|
| 64203 | 0.173953 | 5.723585 | 6.198479 | 6.352629 | 6.486161 | 0 |
| 208055 | 0.029559 | 5.894403 | 8.194229 | 6.599870 | 8.278936 | 0 |
| 72988 | 0.058269 | 6.232448 | 9.003808 | 6.734592 | 9.053219 | 0 |
| 222960 | 0.779325 | 6.848005 | 7.561122 | 7.239933 | 7.803843 | 1 |
| 71198 | 0.019803 | 6.202536 | 9.005896 | 6.716595 | 9.050524 | 0 |
| | duration | protocol_type | service | flag | src_bytes | dst_bytes | land | wrong_fragment | urgent | hot | ... | dst_host_srv_count | dst_host_same_srv_rate | dst_host_diff_srv_rate | dst_host_same_src_port_rate | dst_host_srv_diff_host_rate | dst_host_serror_rate | dst_host_srv_serror_rate | dst_host_rerror_rate | dst_host_srv_rerror_rate | normal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | tcp | http | SF | 181 | 5450 | 0 | 0 | 0 | 0 | ... | 9 | 1.0 | 0.0 | 0.11 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | normal. |
| 1 | 0 | tcp | http | SF | 239 | 486 | 0 | 0 | 0 | 0 | ... | 19 | 1.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | normal. |
| 2 | 0 | tcp | http | SF | 235 | 1337 | 0 | 0 | 0 | 0 | ... | 29 | 1.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | normal. |
| 3 | 0 | tcp | http | SF | 219 | 1337 | 0 | 0 | 0 | 0 | ... | 39 | 1.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | normal. |
| 4 | 0 | tcp | http | SF | 217 | 2032 | 0 | 0 | 0 | 0 | ... | 49 | 1.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | normal. |

5 rows × 42 columns