Proofread #16

Open
wants to merge 1 commit into
base: master
27 changes: 19 additions & 8 deletions notebooks/1_table_oriented.ipynb
@@ -134,14 +134,25 @@
"source": [
"A `DataFrame` is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the `data.frame` in R. \n",
"\n",
"- The table has 3 columns, each of them with a column label. The column labels are respectively `Name`, `Age` and `Sex`.\n",
"- The table above has 3 columns, each of them with a column label. The column labels are `Name`, `Age` and `Sex`, respectively.\n",
"- The column `Name` consists of textual data with each value a string, the column `Age` are numbers and the column `Sex` is textual data.\n",
"\n",
"In spreadsheet software, the table representation of our data would look very similar:\n",
"\n",
"![](../schemas/01_table_spreadsheet.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
" \n",
"__Note__: You probably do not want to manually input the data of a DataFrame! In most situations, data stored in a file format are the starting point of an analysis. We will get to that later!\n",
"\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -199,7 +210,7 @@
"source": [
"<div class=\"alert alert-info\">\n",
" \n",
"If you are familiar to Python :ref:`dictionaries <python:tut-dictionaries>`, the selection of a single column is very similar to selection of dictionary values based on the key.\n",
"If you are familiar to Python :ref:`dictionaries <python:tut-dictionaries>`, the selection of a single column is very similar to the selection of dictionary values based on the key.\n",
"\n",
"</div>"
]
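The dictionary analogy in the hunk above can be sketched with plain Python. This is a minimal illustration, not part of the notebook; the `table` dict and its values are made up for the example.

```python
# A plain dict standing in for a table: keys play the role of column labels.
# The names and values are hypothetical, chosen only for illustration.
table = {
    "Name": ["Braund", "Allen", "Bonnell"],
    "Age": [22, 35, 58],
}

# Selecting by key returns everything stored under that key,
# much like df["Age"] returns the Age column of a DataFrame.
ages = table["Age"]
print(ages)
```

The parallel is in the syntax only: a pandas column selection returns a `Series` with an index, whereas the dict lookup returns whatever object was stored under the key.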
@@ -287,7 +298,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Or to the `Series`:"
"Or on the `Series`:"
]
},
{
@@ -314,7 +325,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionalities each of them a _method_ you can apply to a `DataFrame` or `Series`. As methods are functions, do not forget to use parentheses `()`."
"As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionality for working with `DataFrame` or `Series`, often defined as methods on those objects. As methods are functions, do not forget to use parentheses `()`."
]
},
{
@@ -415,7 +426,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The `describe` method provides quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. The `describe` method is an example of a pandas operation returning a pandas `Series`.\n",
"The `describe` method provides quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. The `describe` method is an example of a pandas operation returning a pandas `DataFrame`.\n",
"\n",
"\n",
"__To user guide:__ check more options on `describe` :ref:`basics.describe`"
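The `Series` to `DataFrame` correction in the hunk above can be checked with a small sketch. It assumes pandas is installed; the three-row table is made up to mirror the `Name`, `Age` and `Sex` columns the notebook describes.

```python
import pandas as pd

# Hypothetical three-row table mirroring the notebook's columns.
df = pd.DataFrame({
    "Name": ["Braund", "Allen", "Bonnell"],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
})

# On a DataFrame, describe() returns a DataFrame that, by default,
# summarises only the numeric columns.
summary = df.describe()
print(type(summary).__name__, list(summary.columns))

# On a single column (a Series), describe() returns a Series.
age_summary = df["Age"].describe()
print(type(age_summary).__name__)
```

So the return type follows the object the method is called on, which is why the notebook text needed `DataFrame` here.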
@@ -438,10 +449,10 @@
"source": [
"## REMEMBER\n",
"\n",
"- Import the package, aka `import Pandas as pd`\n",
"- Import the package, aka `import pandas as pd`\n",
"- A table of data is stored as a pandas `DataFrame`\n",
"- Each column in a `DataFrame` is a `Series`\n",
"- You can do things by applying a method to a `DataFrame` or `Series`"
"- You can do things by calling a method on a `DataFrame` or `Series`"
]
},
{
@@ -472,5 +483,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
14 changes: 7 additions & 7 deletions notebooks/2_read_write.ipynb
@@ -17,14 +17,14 @@
" \n",
"This tutorial uses the titanic data set, stored as CSV. The data consists of the following data columns:\n",
"\n",
"- PassengerId: Id of every passenger.\n",
"- Survived: This feature have value 0 and 1. 0 for not survived and 1 for survived.\n",
"- PassengerId: ID of every passenger.\n",
"- Survived: This feature has value 0 and 1. 0 for not survived and 1 for survived.\n",
"- Pclass: There are 3 classes: Class 1, Class 2 and Class 3.\n",
"- Name: Name of passenger.\n",
"- Sex: Gender of passenger.\n",
"- Age: Age of passenger.\n",
"- SibSp: Indication that passenger have siblings and spouse.\n",
"- Parch: Whether a passenger is alone or have family.\n",
"- Parch: Whether a passenger is alone or has family.\n",
"- Ticket: Ticket number of passenger.\n",
"- Fare: Indicating the fare.\n",
"- Cabin: The cabin of passenger.\n",
@@ -561,7 +561,7 @@
"source": [
"<div class=\"alert alert-info\">\n",
" \n",
"__Note__: Interested in the last N rows instead? Pandas also provides a `tail` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n",
"__Note__: Interested in the last N rows instead? Pandas also provides a `tail()` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n",
"\n",
"</div>"
]
@@ -570,7 +570,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"A check on how Pandas interpreted each of the column data types can be done by requesting the Pandas `dtypes` attribute:"
"A check on how Pandas interpreted each of the column data types can be done by requesting the `dtypes` attribute:"
]
},
{
@@ -643,7 +643,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Whereas `read_*` fucntions are used to read data to Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an excel file. In the example here, the `sheet_name` is named _passengers_ instead of the default _Sheet1_. By setting `index=False` the row index labels are not saved in the spreadsheet."
"Whereas `read_*` functions are used to read data to Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an excel file. In the example here, the `sheet_name` is named _passengers_ instead of the default _Sheet1_. By setting `index=False` the row index labels are not saved in the spreadsheet."
]
},
{
@@ -908,5 +908,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
13 changes: 9 additions & 4 deletions notebooks/3_subset_data.ipynb
@@ -292,7 +292,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parantheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned."
"`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parentheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned."
]
},
{
@@ -389,7 +389,12 @@
"\n",
"<div class=\"alert alert-info\">\n",
" \n",
"__Note:__ The inner square brackets define a :ref:`Python list <python:tut-morelists>` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame` as seen in the previous example.\n",
"__Note:__ The inner square brackets define a :ref:`Python list <python:tut-morelists>` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame`. The previous example can therefore also be written as:\n",
"\n",
"```python\n",
"columns_to_select = [\"Age\", \"Sex\"]\n",
"titanic[columns_to_select]\n",
"```\n",
"\n",
"</div>"
]
@@ -1020,7 +1025,7 @@
"source": [
"<div class=\"alert alert-info\">\n",
" \n",
"__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you can not use `or`/`and` but need to use the `or` operator `|` and the `and` operator `&`.\n",
"__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you can not use `or`/`and` but need to use the \"or\" operator `|` and the \"and\" operator `&`.\n",
"\n",
"</div>"
]
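The rule in the note above (parentheses around each condition, `|` and `&` instead of `or`/`and`) can be sketched as follows. It assumes pandas is installed; the four-row `titanic` frame is a made-up stand-in for the real data set, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data, with two of its columns.
titanic = pd.DataFrame({"Age": [22, 35, 58, 4], "Pclass": [3, 1, 1, 2]})

# Each condition sits in its own parentheses; & combines them element-wise.
adults_first = titanic[(titanic["Age"] > 18) & (titanic["Pclass"] == 1)]

# | is the element-wise "or".
young_or_third = titanic[(titanic["Age"] < 18) | (titanic["Pclass"] == 3)]

# The Python keyword `and` asks for a single truth value of a whole Series,
# which pandas refuses with a ValueError.
try:
    titanic[(titanic["Age"] > 18) and (titanic["Pclass"] == 1)]
    keyword_failed = False
except ValueError:
    keyword_failed = True

print(adults_first)
print(young_or_third)
print(keyword_failed)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `>` and `==`, so without them the expression is grouped differently from what the reader intends.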
@@ -1674,5 +1679,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
2 changes: 1 addition & 1 deletion notebooks/4_plotting.ipynb
@@ -493,5 +493,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}