
The Ultimate Beginner’s Guide to Jupyter Notebooks

Jupyter Notebooks offer a great way to write and iterate on your Python code. Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects: it lets you showcase code and its output in a single place, combined with narrative text, visualizations and other rich media. The intuitive workflow promotes iterative and rapid development, which has made notebooks an increasingly popular choice at the heart of contemporary data science, analysis and, more and more, science at large. Jupyter Notebook is completely free, as it is part of the open source Project Jupyter.

Project Jupyter is the successor to the earlier IPython Notebook project, which was first published as a prototype in 2010. Jupyter Notebook is built on top of IPython, an interactive tool for executing Python code in the terminal using the REPL (Read-Eval-Print Loop) model. The IPython kernel executes the Python code and communicates with the Jupyter Notebook front-end interface. By extending IPython, Jupyter Notebooks add features such as storing your code, its output and your Markdown text together in one document.

Although Jupyter Notebooks support using various programming languages, we will focus on Python and its application in this article.

Getting Started with Jupyter Notebooks!

Installation

Prerequisites

As you may have surmised from the introduction above, you need Python installed on your machine. Current releases of Jupyter require Python 3; Python 2.7 is no longer supported.

Install Using Anaconda

The easiest way for a beginner to get started with Jupyter Notebooks is to install it using Anaconda. Anaconda installs both Python 3 and Jupyter, and also includes many packages commonly used in the data science and machine learning community. You can follow the latest installation guidelines from here.

Install Using Pip

If, for some reason, you decide not to use Anaconda, you can install Jupyter manually with pip, Python's package installer, using the command below:

CODE: https://gist.github.com/ORION6194/1f021bdddeecf8974e95f545f106e32e.js

Launching Your First Notebook

Open your terminal, navigate to the directory where you would like to store your notebooks, and type the command below. Jupyter will start a local server at http://localhost:8888/tree.

CODE: https://gist.github.com/ORION6194/b90f8daffc9c06100b3a3425efb4dd90.js

A new window with the Jupyter Notebook interface will open in your web browser. As you might have noticed, Jupyter's notebooks and dashboard are web apps, and Jupyter starts up a local Python server to serve these apps to your browser. This makes Jupyter Notebooks platform independent and therefore easier to share with others.

[Image: Launching Jupyter Notebook]

The Files tab lists all the files in the current directory, and the Running tab shows all running processes. The third tab, Clusters, is provided by IPython parallel, IPython's parallel computing framework; it lets you control multiple engines, each an extension of the IPython kernel.

Let's start by making a new notebook. We can easily do this by clicking the New drop-down list in the top-right corner of the dashboard. You'll see options to create a Python 3 notebook as well as a regular text file, a folder, and a terminal. Select the Python 3 notebook option.

[Image: the Python 3 notebook option in the New menu]

Your Jupyter Notebook will open in a new tab, as shown in the image below.

[Image: a new, empty notebook open in the browser]

Each notebook opens in its own tab, so you can have multiple notebooks open simultaneously. If you switch back to the dashboard, you will see the new file Untitled.ipynb, along with green text telling you that the notebook is running.

[Image: Untitled.ipynb listed in the dashboard as running]

Why a .ipynb file?

.ipynb is the standard file format for storing Jupyter Notebooks, hence the file name Untitled.ipynb. Let's take a moment to understand what an .ipynb file is and what it contains. Each .ipynb file is a text file that describes the contents of your notebook in JSON format. Each cell and its contents, whether text, code or image attachments that have been converted into strings of text, is listed there along with some additional metadata. You can edit the metadata yourself by selecting “Edit > Edit Notebook Metadata” from the menu bar in the notebook.
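Because an .ipynb file is just JSON, you can inspect it with Python's standard json module. The sketch below uses a tiny, hand-written notebook in the nbformat 4 layout (the cell contents are made up for illustration) and a hypothetical helper, summarize_cells, that lists each cell's type and source. Note that nbformat allows a cell's "source" to be stored either as one string or as a list of lines.

```python
import json

# A tiny notebook written out by hand in the nbformat 4 JSON layout (illustrative only).
RAW_NOTEBOOK = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis\\n"]},
    {"cell_type": "code", "execution_count": 1, "metadata": {},
     "outputs": [{"output_type": "stream", "name": "stdout", "text": ["Hello World!\\n"]}],
     "source": ["print('Hello World!')"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}
"""


def summarize_cells(notebook_json: str) -> list[tuple[str, str]]:
    """Return (cell_type, source) pairs for every cell in a notebook's JSON text."""
    notebook = json.loads(notebook_json)
    summary = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # "source" may be a single string or a list of lines; join lists into one string.
        if isinstance(source, list):
            source = "".join(source)
        summary.append((cell["cell_type"], source))
    return summary


for cell_type, source in summarize_cells(RAW_NOTEBOOK):
    print(f"{cell_type:>8}: {source!r}")
```

To look at a real notebook instead, you could pass the file's text, for example summarize_cells(open("Untitled.ipynb").read()).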

You can also view the contents of your notebook files by selecting “Edit” from the controls on the dashboard, but there's no reason to do so unless you really want to edit the file manually.

Understanding the Notebook Interface

Now that you have an open notebook in front of you, take a look around. Check out the menus to see the different options and functions available, and take some time to scroll through the list of commands in the command palette, which you can open with the small button with the keyboard icon (or by pressing Ctrl + Shift + P).

Two terms are key both to understanding Jupyter and to what makes it more than just a content writing tool: cells and kernels. Fortunately, these concepts are not difficult to understand.

  • A kernel is a program that interprets and executes the user’s code. The Jupyter Notebook App has a built-in kernel for Python code, but kernels are also available for other programming languages.
  • A cell is a container that holds either executable code or normal text.

Cells

Cells form the body of a notebook. In the screenshot of the new notebook (Untitled.ipynb) in the section above, the box with the green outline is an empty cell. There are four types of cells:

  • Code – This is where you type your code; when the cell is executed, the kernel runs it and its output is displayed below the cell.
  • Markdown – This is where you type text formatted using Markdown; the formatted result is displayed in place when the cell is run.
  • Raw NBConvert – The contents of this cell are neither executed nor rendered; they are passed through unchanged when you convert the notebook into another format (like HTML or PDF) with the nbconvert command-line tool.
  • Heading – This is where you add headings to separate sections and keep your notebook tidy. This cell type has since been merged into the Markdown option: starting a line with ‘#’ turns whatever you type after it into a heading.

Let’s test out how the cells work with a classic hello world example. Type print('Hello World!') in the cell and press Ctrl + Enter or click on the Run button in the toolbar above.

CODE: https://gist.github.com/ORION6194/54031c84058c3ee3a4a242e0ac44ed62.js

Hello World!

When you run the cell, its output is displayed below it and the label to its left changes from In [ ] to In [1]. While the cell is still running, Jupyter shows the label as In [*].

Note that the output of a code cell includes any text printed during the cell's execution, as well as the value of the cell's last line when that line is an expression, whether it is a function call, an individual variable or something else.
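To make the "value of the last line" rule concrete, here is a simplified model written with Python's ast module. The function run_cell is hypothetical and is not IPython's real implementation, which handles many more cases: it executes every statement except the last, then evaluates the last one and returns its value if it is an expression. Assignments produce no value, so nothing would be echoed for them.

```python
import ast


def run_cell(source: str, namespace: dict):
    """Simplified model of how a notebook echoes a cell's last expression."""
    tree = ast.parse(source)
    last = tree.body[-1] if tree.body else None
    if isinstance(last, ast.Expr):
        # Run every statement before the last one...
        head = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(head, "<cell>", "exec"), namespace)
        # ...then evaluate the final expression so its value can be displayed.
        return eval(compile(ast.Expression(last.value), "<cell>", "eval"), namespace)
    # The last statement is not an expression (e.g. an assignment): nothing to echo.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


ns = {}
result = run_cell("x = 2\nprint('computing...')\nx * 21", ns)
if result is not None:
    print(f"Out: {result!r}")
```

Running this prints "computing..." (text printed during execution) followed by "Out: 42" (the value of the last expression), mirroring the two sources of output described above.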

Markdown

Markdown is a lightweight markup language for formatting plain text. Its syntax maps closely onto HTML elements. As this article was written in a Jupyter notebook, all of the narrative text and images you can see were produced with Markdown. Let’s cover the basics with a quick example.

CODE: https://gist.github.com/ORION6194/5192147abfbe0d8adb39c767867bdbd7.js

When attaching images, you have three options:

  • Use a URL to an image on the web.
  • Use a local URL to an image that you will be keeping alongside your notebook, such as in the same git repo.
  • Add an attachment via “Edit > Insert Image”; this converts the image into a string and stores it inside your notebook's .ipynb file.

Note that adding an image as an attachment will make the .ipynb file much larger because it is stored inside the notebook in a string format.
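The size increase comes from how attachments are stored: the image bytes are base64-encoded and kept under the Markdown cell's "attachments" key, and the cell refers to them as attachment:<filename>. The sketch below uses a hypothetical make_attachment helper and placeholder bytes rather than a real image; it shows that base64 emits 4 characters for every 3 bytes, roughly a one-third increase before any JSON overhead.

```python
import base64
import math


def make_attachment(filename: str, data: bytes, mime_type: str = "image/png") -> dict:
    """Build an attachment entry shaped like the one stored in a Markdown cell (sketch)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {filename: {mime_type: encoded}}


fake_image = bytes(range(256)) * 40  # 10,240 placeholder bytes, not a real PNG
attachment = make_attachment("chart.png", fake_image)
encoded_len = len(attachment["chart.png"]["image/png"])

# Base64 output length: 4 characters per 3 input bytes, rounded up to a full group.
expected_len = 4 * math.ceil(len(fake_image) / 3)
print(len(fake_image), "bytes ->", encoded_len, "characters; expected", expected_len)
```

In the notebook, the Markdown cell would then display the image with a reference such as ![chart](attachment:chart.png).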

There are a lot more features available in Markdown. Once you have familiarized yourself with the basics above, you can refer to the official guide from the creator, John Gruber, on his website.

Kernels

Behind every notebook runs a kernel. Whenever you execute a code cell, its content is executed within the kernel and any output is returned to the cell for display. The kernel’s state persists over time and belongs to the notebook as a whole rather than to individual cells.

For example, if you declare a variable or import a library in one cell, it will be available in other cells. Let’s see this in action: first we’ll import some Python modules and then define a function.

CODE: https://gist.github.com/ORION6194/caf103bcd4101468a8224da1fdfd6aa0.js

Once we’ve executed the cell above, we can reference os, binascii and sum in any other cell.

CODE: https://gist.github.com/ORION6194/ece0d58532919670029e6b7ed5150186.js

The output should look something like this:

c84766ca4a3ce52c3602bbf02ad1f7
Sum of 1 and 2 is 3

Most of the time, the flow in your notebook will be top-to-bottom, but it’s common to go back and make changes. In that case, the execution number shown to the left of each cell, such as In [2], tells you the order in which cells were run and helps you spot cells with stale output. If you ever wish to reset, the Kernel menu offers several options that prove incredibly useful:

  • Restart: restarts the kernel, clearing all the variables, imports and functions that were defined.
  • Restart & Clear Output: same as above but will also wipe the output displayed below your code cells.
  • Restart & Run All: same as above but will also run all your cells in order from first to last.
  • Interrupt: If your kernel is ever stuck on a computation and you wish to stop it, you can choose the Interrupt option.
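The behaviour described above (shared state across cells, an execution counter behind In [n], and restarts that wipe everything) can be modelled with a small class. ToyKernel is a hypothetical teaching sketch that runs each cell with exec in one shared dictionary; a real Jupyter kernel is a separate process that talks to the notebook over a messaging protocol.

```python
class ToyKernel:
    """Hypothetical toy model of a kernel: every cell runs in one shared namespace."""

    def __init__(self):
        self.restart()

    def restart(self):
        # Like Kernel > Restart: forget all variables, imports and functions.
        self.namespace = {}
        self.execution_count = 0

    def execute(self, source: str) -> int:
        # Each execution gets the next number, shown in the notebook as In [n].
        self.execution_count += 1
        exec(source, self.namespace)
        return self.execution_count

    def restart_and_run_all(self, cells: list[str]) -> None:
        # Like Kernel > Restart & Run All: fresh state, then every cell top to bottom.
        self.restart()
        for source in cells:
            self.execute(source)


cells = ["import math\ndef area(r):\n    return math.pi * r ** 2",
         "result = round(area(1), 2)"]

kernel = ToyKernel()
kernel.execute(cells[0])            # In [1]
kernel.execute(cells[1])            # In [2]: area() is visible because state is shared
print(kernel.namespace["result"])   # 3.14
kernel.execute(cells[1])            # In [3]: re-running a cell bumps the counter
kernel.restart()
print("result" in kernel.namespace)  # False
```

The jump from In [2] to In [3] for the same cell is exactly the kind of out-of-order numbering that signals possibly stale output in a real notebook.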

Naming Your Notebooks

It is good practice to give your notebooks meaningful names. Besides renaming a notebook from within the notebook app by clicking its title, you can use the dashboard or your file browser to rename the .ipynb file. We’ll head back to the dashboard to rename the file we created earlier, which has the default name Untitled.ipynb.

The dashboard won't let us rename a notebook while it is running, so let's first shut it down. We can shut down a running notebook by selecting “File > Close and Halt” from the notebook menu. Alternatively, we can shut down the kernel either by selecting the notebook in the dashboard and clicking “Shutdown”, or by going to “Kernel > Shutdown” from within the notebook app (see the images below).

Shutdown the kernel from Notebook App:

[Image: Kernel > Shutdown in the notebook app]

Shutdown the kernel from Dashboard:

[Image: shutting down the kernel from the dashboard]

Once the kernel has been shut down, you can select your notebook and click “Rename” in the dashboard controls.

[Image: the Rename button in the dashboard]

Sharing Your Notebooks

Sharing a notebook can mean two different things. In most cases, we want to share the end result of the work: a non-interactive, pre-rendered version of the notebook, much like this article. In other cases, we want to share the code itself and collaborate with others on notebooks, which is also possible with the aid of version control systems such as Git.

Before You Start Sharing

When a notebook is exported to a file, its state, including the output of any code cells, is saved along with it. To make sure a notebook is ready to share, follow the steps below before sharing:


  1. Click “Cell > All Output > Clear”
  2. Click “Kernel > Restart & Run All”
  3. After the code cells have finished executing, validate the output. 


This ensures that your notebooks don’t have a stale state or contain intermediary output.
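Since the saved .ipynb file records each code cell's execution number, you can check this property programmatically. The sketch below works on notebook dictionaries (as returned by json.load on an .ipynb file); both helper names are hypothetical. clear_outputs roughly mirrors what “Cell > All Output > Clear” does to the saved JSON, and ran_top_to_bottom checks that code cells were numbered 1, 2, 3, ... in order, as they would be right after “Kernel > Restart & Run All”. The check assumes the notebook contains no empty code cells.

```python
def clear_outputs(notebook: dict) -> dict:
    """Roughly what "Cell > All Output > Clear" does to the saved JSON (sketch)."""
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def ran_top_to_bottom(notebook: dict) -> bool:
    """True if code cells were run exactly once each, in order, starting at 1.

    Assumes the notebook has no empty code cells.
    """
    counts = [c.get("execution_count") for c in notebook["cells"] if c["cell_type"] == "code"]
    return counts == list(range(1, len(counts) + 1))


fresh = {"cells": [
    {"cell_type": "code", "execution_count": 1, "outputs": [], "source": "x = 1"},
    {"cell_type": "markdown", "source": "Some notes"},
    {"cell_type": "code", "execution_count": 2, "outputs": [], "source": "x + 1"},
]}
stale = {"cells": [
    {"cell_type": "code", "execution_count": 7, "outputs": [], "source": "x = 1"},
    {"cell_type": "code", "execution_count": 3, "outputs": [], "source": "x + 1"},
]}
print(ran_top_to_bottom(fresh), ran_top_to_bottom(stale))  # True False
```

Markdown cells carry no execution number, so they are skipped when collecting the counts.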

Exporting Your Notebooks

Jupyter has built-in support for exporting to HTML, Markdown and PDF, as well as several other formats, which you can find in the “File > Download as” menu. This is a very convenient way to share results with others. If sharing exported files doesn’t suit you, there are also some popular ways to share notebooks directly on the web:

  • GitHub: Home to over 2 million notebooks (at the time of writing), GitHub is the most popular place for sharing Jupyter projects with the world. GitHub renders .ipynb files directly, both in repositories and in gists on its website. You can follow the GitHub guides to get started on your own.
  • Nbviewer: NBViewer is one of the most prominent notebook renderers on the web. It renders notebooks hosted on GitHub and other code storage platforms and provides a shareable URL for each. nbviewer.jupyter.org provides this free rendering service as part of Project Jupyter.

Data Analysis in a Jupyter Notebook

Now that we’ve looked at what a Jupyter Notebook is, it’s time to look at how notebooks are used in practice, which should give you a clearer understanding of why they are so popular. As we walk through the sample analysis, you will see how the flow of a notebook makes the task intuitive to work through, and easy for others to follow when we share it with them. Along the way, we'll also pick up some of the more advanced features of Jupyter notebooks. So let's get started, shall we?

Analyzing the Revenue and Profit Trends of Fortune 500 US companies from 1955-2013

So, let’s say you’ve been tasked with finding out how the revenues and profits of the largest companies in the US changed historically over the past 60 years. We shall begin by gathering the data to analyze.

Gathering the Data Set

The data set that we will use to analyze the revenue and profit trends of Fortune 500 companies has been sourced from Fortune 500 Archives and Top Foreign Stocks. For convenience, we have combined the data from both sources into a single CSV file.

Importing the Required Dependencies

Let's start off with a code cell specifically for imports and initial setup, so that if we need to add or change anything at a later point in time, we can simply edit and re-run the cell without having to change the other cells. We can start by importing Pandas to work with our data, Matplotlib to plot the charts and Seaborn to make our charts prettier.

CODE: https://gist.github.com/ORION6194/53dd2f1fa4b3bdedcaeebb792d943afa.js

Set the design styles for the charts

CODE: https://gist.github.com/ORION6194/0b0488b8bf61728208591b0f2e127c9e.js

Load the Input Data to be Analyzed

As we plan on using pandas to aid in our analysis, let’s begin by importing our input data set into pandas' most widely used data structure, the DataFrame.

CODE: https://gist.github.com/ORION6194/68fad11133136ead8481337dacea7ddb.js
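A minimal sketch of this loading step, assuming the CSV has the columns year, rank, company, revenue and profit. The column names and the three rows below are placeholders, not the real Fortune 500 figures; with the actual file you would pass its path to pd.read_csv instead of the in-memory StringIO. Note that pandas does not treat the string "N.A." as missing by default, so a column containing it is loaded with a non-numeric dtype (object in the pandas versions this article uses).

```python
import io

import pandas as pd

# Placeholder rows standing in for the real CSV; "N.A." mimics the missing values.
csv_text = """year,rank,company,revenue,profit
1955,1,Company A,1000.0,100.0
1955,2,Company B,900.0,90.0
1956,1,Company A,N.A.,N.A.
"""

# With the real data set: df = pd.read_csv("path/to/fortune500.csv")  # hypothetical path
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.dtypes)  # revenue and profit are not numeric because of the "N.A." strings
```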

Now that we have loaded our input data set, let's see what it looks like!

CODE: https://gist.github.com/ORION6194/aaa2b846fa4ce05b132ccc3074780102.js

[Image: the first rows of the loaded DataFrame]

Looking good. Each row corresponds to a single company per year and all the columns we need are present.

Exploring the Dataset

Next, let's begin by exploring our data set. We will primarily look into the number of records imported and the data types for each of the different columns that were imported.

Since there are 500 data points per year and the data set has records from 1955 to 2012, the total number of records in the data set looks right.

Now let's move on to the data type of each column.

CODE: https://gist.github.com/ORION6194/16f6156503c023d483f5a60711c7de6d.js

[Image: output of the cell above]

CODE: https://gist.github.com/ORION6194/cb2b7820c2793775d1ad9249bffd30f3.js

[Image: output of the cell above]

As we can see from the output of the above command the data types for the columns revenue and profit are being shown as object whereas the expected data type should be float. This indicates that there may be some non-numeric values in the revenue and profit columns.
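One way to find the offending values is to let pandas attempt the conversion with pd.to_numeric(..., errors="coerce"), which turns anything unparseable into NaN, and then keep the entries that were present but failed to convert. This is a sketch under assumptions: the helper name non_numeric_values and the placeholder rows are illustrative, and the article's own gist may detect the values differently (for example with a regular expression).

```python
import io

import pandas as pd

# Placeholder data: "N.A." marks values that cannot be parsed as numbers.
df = pd.read_csv(io.StringIO(
    "year,company,revenue,profit\n"
    "1955,Company A,1000.0,100.0\n"
    "1955,Company B,900.0,N.A.\n"
    "1956,Company A,N.A.,N.A.\n"
))


def non_numeric_values(series: pd.Series) -> pd.Series:
    """Return the entries of `series` that are present but cannot be converted to numbers."""
    converted = pd.to_numeric(series, errors="coerce")
    # NaN after coercion, but not missing before it: these are the culprits.
    return series[converted.isna() & series.notna()]


bad_revenue = non_numeric_values(df["revenue"])
print("Number of non-numeric revenue values:", len(bad_revenue))
print("List of distinct non-numeric revenue values:", set(bad_revenue))
```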

So let's first look at the details of imported values for revenue.

CODE: https://gist.github.com/ORION6194/4531469933d1cef9234f1da1aed184a3.js

[Image: output of the cell above]

CODE: https://gist.github.com/ORION6194/02d2ed7afbe2fd46eafb3319dd3c3533.js

Number of Non-numeric revenue values: 1

CODE: https://gist.github.com/ORION6194/741b8d283459186d64be450e69d903c4.js

List of distinct Non-numeric revenue values: {'N.A.'}

There is only one non-numeric revenue value, which is negligible compared to the total size of our data set, so the easiest option is to remove that row.
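A sketch of the removal, again on placeholder rows: coerce the column to numbers, drop the rows that became NaN, and the column is left as float64. This is only safe after checking, as above, that "N.A." is the only non-numeric value, because coercion would silently discard any other malformed entries as well.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO(
    "year,company,revenue,profit\n"
    "1955,Company A,1000.0,100.0\n"
    "1955,Company B,N.A.,50.0\n"
    "1956,Company A,1100.0,120.0\n"
))

# Convert revenue to numbers ("N.A." becomes NaN), then drop the rows that failed.
df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df = df.dropna(subset=["revenue"])

print(len(df), df["revenue"].dtype)
```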

CODE: https://gist.github.com/ORION6194/860bae82df2e07aec5e28508a7801cec.js

[Image: output of the cell above]

Now that the data type issue for column revenue is resolved, let's move on to values in column profit.

CODE: https://gist.github.com/ORION6194/5f96f29ca4f2c99ad48d9a35e04d6b1c.js

[Image: output of the cell above]

CODE: https://gist.github.com/ORION6194/2cbef32085daad5e93898635bb404feb.js

Number of Non-numeric profit values: 374

CODE: https://gist.github.com/ORION6194/894cd930db06ad11bd6da0fcad40ae0a.js

List of distinct Non-numeric profit values: {'N.A.'}

Although the number of non-numeric profit values is a small fraction of our data set, it is not completely inconsequential, as it is still more than 1% of the records. If the rows containing N.A. are uniformly distributed over the years, the simplest solution is to remove them. So let’s take a quick look at their distribution.
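One way to get this distribution is to flag each row whose profit does not parse as a number and sum the flags per year. The sketch below assumes a year column and uses placeholder rows; the resulting per-year counts could then be drawn as a bar chart, for example with bad_per_year.plot(kind="bar").

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO(
    "year,company,revenue,profit\n"
    "1990,Company A,1000.0,N.A.\n"
    "1990,Company B,900.0,N.A.\n"
    "1991,Company A,1100.0,N.A.\n"
    "1991,Company B,950.0,40.0\n"
    "1992,Company A,1200.0,130.0\n"
))

# 1 for rows whose profit cannot be parsed as a number, 0 otherwise.
is_bad_profit = pd.to_numeric(df["profit"], errors="coerce").isna().astype(int)

# Number of flagged rows per year; years without bad values get 0.
bad_per_year = is_bad_profit.groupby(df["year"]).sum()
print(bad_per_year)
```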

CODE: https://gist.github.com/ORION6194/35f6f7d28bc2066df177a1ac66a53256.js

[Chart: number of non-numeric profit values per year]

At a glance, we can see that the number of invalid values in a single year is generally fewer than 25; since there are 500 data points per year, removing them would account for less than 5% of that year's data. Also, other than a surge during the 90s, most years have fewer than half the missing values of the peak. Let’s assume that this is acceptable and move ahead with removing these rows.

CODE: https://gist.github.com/ORION6194/820530f478e28da785139ba9af062cde.js

Let's validate that this worked!

CODE: https://gist.github.com/ORION6194/f5a79add9688e83f5ad11f98bc7736dd.js

[Image: output of the cell above]

Hurray! Our dataset has been cleaned up.

Time to Plot the graphs

Let's begin by defining a function to plot a graph, set its title and add labels for the x-axis and y-axis.
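A helper along these lines might look like the sketch below. The name plot and its parameter list are assumptions for illustration, not necessarily the signature used in the article's gist; it draws a line on a Matplotlib Axes and sets the title and axis labels. Passing the Axes in, rather than creating a figure inside the function, lets you place several charts side by side with plt.subplots.

```python
import matplotlib.pyplot as plt


def plot(x, y, ax, title, x_label, y_label):
    """Draw a line chart of y against x on `ax` and label it (hypothetical helper)."""
    ax.plot(x, y)
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.margins(x=0)  # let the line run to the left and right edges of the axes


fig, ax = plt.subplots()
plot([1955, 1956, 1957], [1.0, 2.0, 1.5], ax,
     "Example chart", "Year", "Value (placeholder units)")
```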

CODE: https://gist.github.com/ORION6194/aa899356be152eb068561c3bff011eb3.js

Let's get on to plotting the average profit by year and average revenue by year using Matplotlib.
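The yearly averages themselves come from a pandas groupby. A minimal sketch, assuming the cleaned DataFrame has numeric year, revenue and profit columns (the rows below are placeholders): grouping by year and taking the mean gives one row per year, whose index and columns can then be passed to a line plot.

```python
import pandas as pd

# Placeholder rows; in the article this would be the cleaned DataFrame.
df = pd.DataFrame({
    "year":    [1955, 1955, 1956, 1956],
    "revenue": [1000.0, 800.0, 1100.0, 900.0],
    "profit":  [100.0, 60.0, 120.0, 80.0],
})

# One row per year holding the mean revenue and mean profit across companies.
yearly_means = df.groupby("year")[["revenue", "profit"]].mean()
print(yearly_means)
```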

CODE: https://gist.github.com/ORION6194/9b3a6b7d0003087f81243a6325c3f447.js

[Chart: average profit by year]

CODE: https://gist.github.com/ORION6194/a42cf7e8cd4f462f349d5bb0ede45688.js

[Chart: average revenue by year]

Woah! The profit chart has some huge ups and downs. They seem to correspond to the early 1990s recession, the bursting of the dot-com bubble in the early 2000s and the Great Recession in 2008.

On the other hand, revenues grow steadily and are comparatively stable, which helps explain how average profits recovered so quickly after the staggering drops during the recessions.

Let's also take a look at how the average profits and revenues compare to their standard deviations.
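A common way to show this comparison is to plot the yearly mean and shade a band one standard deviation above and below it with Matplotlib's fill_between. The sketch below uses placeholder profits and pandas' default sample standard deviation; it is one possible presentation, not necessarily the one in the article's gist.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder profits for three companies over two years (not real figures).
df = pd.DataFrame({
    "year":   [1955, 1955, 1955, 1956, 1956, 1956],
    "profit": [100.0, 50.0, -30.0, 120.0, 80.0, -60.0],
})

# Mean and sample standard deviation of profit for each year.
stats = df.groupby("year")["profit"].agg(["mean", "std"])

fig, ax = plt.subplots()
ax.plot(stats.index, stats["mean"], label="mean profit")
# Shade the band one standard deviation above and below the yearly mean.
ax.fill_between(stats.index,
                stats["mean"] - stats["std"],
                stats["mean"] + stats["std"],
                alpha=0.2, label="±1 std")
ax.set_xlabel("Year")
ax.set_ylabel("Profit (placeholder units)")
ax.legend()
```

A band that is wide relative to the mean, as in the real data discussed below, signals that individual companies differ enormously from the average.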

CODE: https://gist.github.com/ORION6194/12b5f03482ac1c94be00dd74d497c6b1.js


That's astonishing: the standard deviations are huge. Some companies are making billions while others are losing as much, and the risk appears to have increased along with rising profits and revenues over the years. Although we could keep playing around with our data set and plot plenty more charts, it is time to draw this article to a close.

[Chart: average profits and revenues with their standard deviations]

Conclusion

In this article we have seen various features of Jupyter notebooks, from basics like installation, creating and running code cells to more advanced uses like plotting graphs. The example above shows how Jupyter Notebooks support a productive, easy-to-use way of working, and I hope you now feel confident to begin using Jupyter Notebooks in your own work and to start exploring their more advanced features. You can read more about data analytics using Pandas here.

If you’d like to explore further and see more examples, Jupyter has put together A Gallery of Interesting Jupyter Notebooks that you may find helpful, and the Nbviewer homepage offers plenty of examples for further reference. Find the entire code here on GitHub.