{ "metadata": { "name": "", "signature": "sha256:4b84ff0db6e80640371f7213daeeb61897b24c124dd7b303b8997006627af949" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Organizing your data\n", "\n", "Because spreadsheets are very flexable, people use \n", "them in very different ways and often organize their \n", "data in a form that is easy for humans to understand,\n", "but non-trivial for computers to parse. \n", "\n", "We will take two sheets from one spreadsheet (`stats_data.xlsx`)\n", "to illustrate this point (`dirty_sheet` and `clean_sheet`).\n", "The sheet we call `clean_sheet` is a reorganized form of the \n", "original data in `dirty_sheet`. If you open up this file in \n", "a spreadsheet program, you will note that the original user \n", "had re-organized this data to plot it, so the original form \n", "was not even a good choice for use by the spreadsheet program\n", "itself.\n", "\n", "The general rule to organizing data is to keep data of the same \n", "nature together. Usually, this means data with the same units \n", "and measured in the same way.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas \n", "fn = '../../data/stats_data.xlsx'\n", "dirty_df = pandas.read_excel(fn, 'dirty_sheet')\n", "clean_df = pandas.read_excel(fn, 'clean_sheet')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "clean_df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", " | exposure_time_h | \n", "dose_ug | \n", "mouse | \n", "day | \n", "mouse_weight_g | \n", "lung_weight_g | \n", "
---|---|---|---|---|---|---|
0 | \n", "0 | \n", "0 | \n", "0 | \n", "22 | \n", "22.15 | \n", "0.78 | \n", "
1 | \n", "0 | \n", "0 | \n", "1 | \n", "25 | \n", "27.76 | \n", "0.70 | \n", "
2 | \n", "0 | \n", "0 | \n", "2 | \n", "22 | \n", "24.00 | \n", "0.97 | \n", "
3 | \n", "0 | \n", "0 | \n", "3 | \n", "24 | \n", "22.74 | \n", "1.32 | \n", "
4 | \n", "0 | \n", "0 | \n", "4 | \n", "22 | \n", "23.02 | \n", "1.08 | \n", "
5 | \n", "0 | \n", "0 | \n", "5 | \n", "25 | \n", "23.68 | \n", "1.27 | \n", "
6 | \n", "0 | \n", "0 | \n", "6 | \n", "25 | \n", "20.22 | \n", "0.74 | \n", "
7 | \n", "0 | \n", "0 | \n", "7 | \n", "22 | \n", "22.17 | \n", "1.25 | \n", "
8 | \n", "6 | \n", "200 | \n", "0 | \n", "25 | \n", "18.46 | \n", "1.12 | \n", "
9 | \n", "6 | \n", "200 | \n", "1 | \n", "25 | \n", "19.16 | \n", "0.88 | \n", "
10 | \n", "6 | \n", "200 | \n", "2 | \n", "25 | \n", "19.69 | \n", "1.04 | \n", "
11 | \n", "6 | \n", "200 | \n", "3 | \n", "25 | \n", "18.95 | \n", "0.97 | \n", "
12 | \n", "6 | \n", "200 | \n", "4 | \n", "25 | \n", "26.83 | \n", "0.85 | \n", "
13 | \n", "6 | \n", "200 | \n", "5 | \n", "25 | \n", "26.24 | \n", "0.81 | \n", "
14 | \n", "6 | \n", "200 | \n", "6 | \n", "25 | \n", "30.65 | \n", "0.86 | \n", "
15 | \n", "6 | \n", "200 | \n", "7 | \n", "24 | \n", "22.07 | \n", "1.14 | \n", "
16 | \n", "24 | \n", "200 | \n", "0 | \n", "25 | \n", "25.54 | \n", "1.19 | \n", "
17 | \n", "24 | \n", "200 | \n", "2 | \n", "25 | \n", "19.04 | \n", "0.15 | \n", "
18 | \n", "24 | \n", "200 | \n", "3 | \n", "25 | \n", "20.65 | \n", "1.15 | \n", "
19 | \n", "24 | \n", "200 | \n", "4 | \n", "25 | \n", "20.42 | \n", "0.99 | \n", "
20 | \n", "24 | \n", "200 | \n", "5 | \n", "24 | \n", "20.07 | \n", "1.26 | \n", "
21 | \n", "24 | \n", "200 | \n", "6 | \n", "24 | \n", "25.04 | \n", "1.29 | \n", "
22 | \n", "24 | \n", "200 | \n", "7 | \n", "25 | \n", "24.80 | \n", "1.07 | \n", "
23 | \n", "72 | \n", "200 | \n", "0 | \n", "25 | \n", "18.51 | \n", "1.31 | \n", "
24 | \n", "72 | \n", "200 | \n", "1 | \n", "24 | \n", "19.14 | \n", "1.15 | \n", "
25 | \n", "72 | \n", "200 | \n", "2 | \n", "24 | \n", "16.97 | \n", "1.02 | \n", "
26 | \n", "72 | \n", "200 | \n", "3 | \n", "23 | \n", "19.78 | \n", "1.57 | \n", "
27 | \n", "72 | \n", "200 | \n", "4 | \n", "25 | \n", "16.97 | \n", "1.42 | \n", "
28 | \n", "72 | \n", "200 | \n", "5 | \n", "25 | \n", "17.74 | \n", "1.31 | \n", "
29 | \n", "72 | \n", "200 | \n", "6 | \n", "25 | \n", "18.61 | \n", "1.33 | \n", "
30 | \n", "72 | \n", "200 | \n", "7 | \n", "24 | \n", "18.31 | \n", "1.30 | \n", "
31 | \n", "72 | \n", "400 | \n", "0 | \n", "25 | \n", "27.29 | \n", "0.57 | \n", "
32 | \n", "72 | \n", "400 | \n", "1 | \n", "25 | \n", "30.03 | \n", "1.09 | \n", "
33 | \n", "72 | \n", "400 | \n", "2 | \n", "25 | \n", "27.47 | \n", "0.35 | \n", "
34 | \n", "72 | \n", "400 | \n", "3 | \n", "25 | \n", "23.62 | \n", "0.80 | \n", "
35 | \n", "72 | \n", "400 | \n", "4 | \n", "25 | \n", "23.28 | \n", "0.38 | \n", "
36 | \n", "72 | \n", "400 | \n", "5 | \n", "25 | \n", "28.21 | \n", "0.91 | \n", "
37 | \n", "72 | \n", "400 | \n", "6 | \n", "25 | \n", "29.40 | \n", "0.25 | \n", "
38 | \n", "72 | \n", "400 | \n", "7 | \n", "25 | \n", "30.37 | \n", "0.18 | \n", "