
The Ultimate Beginner’s Guide to Jupyter Notebooks

Pratik Ambhore

Data Engineering

Jupyter Notebooks offer a great way to write and iterate on your Python code. They are a powerful tool for developing data science projects interactively. A Jupyter Notebook lets you present source code and its corresponding output in a single place, combining narrative text, visualizations, and other rich media. This intuitive workflow promotes rapid, iterative development, making notebooks the first choice for many data scientists. Jupyter Notebooks are also completely free to use, as they are part of Project Jupyter, which is fully open source.

Project Jupyter is the successor to the earlier IPython Notebook project, which was first published as a prototype in 2010. Jupyter Notebook is built on top of IPython, an interactive tool for executing Python code in the terminal using the REPL (Read-Eval-Print Loop) model. The IPython kernel executes the Python code and communicates with the Jupyter Notebook front-end interface. By extending IPython, Jupyter Notebooks add features such as storing your code together with its output and keeping Markdown notes alongside it.

Although Jupyter Notebooks support various programming languages, this article will focus on Python and its applications.

Getting Started with Jupyter Notebooks!

Installation

Prerequisites

As you may have surmised from the introduction above, you need to have Python installed on your machine. Either Python 2.7 or Python 3 will do.

Install Using Anaconda

The simplest way to get started with Jupyter Notebooks is to install them using Anaconda. Anaconda installs both Python 3 and Jupyter, and it also includes many of the packages commonly used in the data science and machine learning community. You can follow the latest installation guidelines from here.

Install Using Pip

If, for some reason, you decide not to use Anaconda, you can install Jupyter manually using Python's pip package manager; just run the command below:

CODE: https://gist.github.com/velotiotech/507835515055617d00eee83706e33bc2.js
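In case the embedded snippet above does not render, the install command is simply the following (assuming pip is available on your PATH; you may prefer to run it inside a virtual environment):

pip install jupyter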

Launching First Notebook

Open your terminal, navigate to the directory where you would like to store your notebooks, and launch Jupyter by typing the command below. The program will start a local server at http://localhost:8888/tree.

CODE: https://gist.github.com/velotiotech/3813d5efffeaa5a3e52f372ecc9aca56.js
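If the embedded snippet does not render, the launch command is just:

jupyter notebook

Run it from the directory that you want to act as the root folder of your notebooks.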

A new tab with the Jupyter Notebook interface will open in your browser. As you may have noticed, Jupyter starts a local Python server to serve the web app in your browser, where you can access the dashboard and work with your notebooks. Because Jupyter Notebooks are platform independent, they are easy to collaborate on and share with others.

Launching Jupyter Notebook

The Files tab lists all of your files, the Running tab shows all running notebooks and terminals, and the third tab, Clusters, comes from IPython parallel, IPython's parallel computing framework, which lets you control multiple compute engines built on the IPython kernel.

Let's start by making a new notebook. We can easily do this by clicking on the New drop-down list in the top-right corner of the dashboard. You will see options to create a Python 3 notebook, as well as a regular text file, a folder, and a terminal. Select the Python 3 notebook option.

Python 3 Notebook

Your Jupyter Notebook will open in a new tab, as shown in the image below.

Jupyter Notebook Launch

Each notebook opens in its own tab, so you can work with multiple notebooks at once. If you go back to the dashboard tab, you will see the new file Untitled.ipynb with a green icon to its left, indicating that your new notebook is running.


Jupyter notebook file

Why a .ipynb file?

.ipynb is the standard file format for storing Jupyter Notebooks, hence the file name Untitled.ipynb. Let's begin by understanding what an .ipynb file is and what it might contain. Each .ipynb file is a text file that describes the contents of your notebook in JSON format. The content of each cell, whether it is text, code, or image attachments converted into strings, is stored in the file along with some additional metadata. You can edit the metadata by selecting “Edit > Edit Notebook Metadata” from the menu options in the notebook.

You can also view the contents of your notebook files by selecting “Edit” from the controls on the dashboard, but there is no reason to do so unless you really want to edit the raw file manually.

Understanding the Notebook Interface

Now that you have created a notebook, let's have a look at the various menu options and functions that are readily available. Take some time to scroll through the list of commands that opens when you click on the keyboard icon (or press Ctrl + Shift + P).

There are two key terms you should learn about: cells and kernels. They are central both to understanding Jupyter and to what makes it more than just a content-writing tool. Fortunately, these concepts are not difficult to understand.

  • A kernel is a program that interprets and executes the user’s code. The Jupyter Notebook App has a built-in kernel for Python code, but kernels are also available for many other programming languages.
  • A cell is a container that holds either executable code or formatted text to be displayed in the notebook.

Cells

Cells form the body of a notebook. If you look at the screenshot above for a new notebook (Untitled.ipynb), the text box with the green border is an empty cell. There are 4 types of cells:

  • Code – This is where you type your code and when executed the kernel will display its output below the cell.
  • Markdown – This is where you type your text formatted using Markdown and the output is displayed in place when it is run.
  • Raw NBConvert – Content in these cells is not evaluated by the notebook; it is passed through unmodified when the notebook is converted to another format (like HTML, PDF, etc.) with the nbconvert tool.
  • Heading – This is where you add Headings to separate sections and make your notebook look tidy and neat. This has now been merged into the Markdown option itself. Adding a ‘#’ at the beginning ensures that whatever you type after that will be taken as a heading.

Let’s test out how the cells work with a basic "hello world" example. Type print('Hello World!') in the cell and press Ctrl + Enter or click on the Run button in the toolbar at the top.

CODE: https://gist.github.com/velotiotech/f6391159e36d2929e27238450125c569.js

Hello World!

When you run the cell, its output is displayed below it and the label to its left changes from In [ ] to In [1]. While the cell is still running, Jupyter shows the label as In [*].

It is also important to note that the output of a code cell comes both from anything printed while the cell runs and from the value of the last line in the cell, whether that is a variable, a function call, or some other expression.

Markdown

Markdown is a lightweight markup language for formatting plain text, and its syntax maps closely onto HTML tags. As this article was written in a Jupyter notebook, all of the narrative text and images you see are written in Markdown. Let’s go through the basics with the following example.

CODE: https://gist.github.com/velotiotech/7bcc7d871b9e2670d5c817c534af81ea.js
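As a quick reference, here is a small sample of standard Markdown syntax of the kind such a cell might contain (this is not the exact content of the gist above):

# A level 1 heading
## A level 2 heading

Plain text with *italics*, **bold**, `inline code`, and [links](https://jupyter.org).

- A bulleted list item
1. A numbered list item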

There are three different ways to attach images:

  • Link the URL of an image hosted on the web.
  • Use the relative path of an image stored locally.
  • Add an attachment to the notebook using the “Edit > Insert Image” option; this converts the image into a string and stores it inside your notebook.

Note that adding an image as an attachment will make the .ipynb file much larger because it is stored inside the notebook in a string format.
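For reference, the three approaches look roughly like this in a Markdown cell (the file names below are placeholders):

![From the web](https://example.com/logo.png)
![From a local path](images/logo.png)
![From an attachment](attachment:logo.png)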

There are a lot more features available in Markdown. To learn more about markdown, you can refer to the official guide from the creator, John Gruber, on his website.

Kernels

Every notebook runs on top of a kernel. Whenever you execute a code cell, its contents are executed within the kernel and any output is returned to the cell for display. The kernel’s state belongs to the document as a whole, not to individual cells, and it persists over time.

For example, if you declare a variable or import some libraries in a cell, they will be accessible in other cells. Now let’s understand this with the help of an example. First we’ll import a Python package and then define a function.

CODE: https://gist.github.com/velotiotech/b4a2bd2ae52a9c740ed45cc058311477.js

Once the cell above is executed, we can reference os, binascii, and sum in any other cell.

CODE: https://gist.github.com/velotiotech/ceb2a237ca865b3d33ff1423d40ab5e8.js

The output should look something like this:

CODE: https://gist.github.com/velotiotech/21a1fe6524cbcff05b22dd94de57b78d.js
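Since the embedded gists may not render, here is a hypothetical reconstruction of the two cells; the body of the original sum function is not shown in the text, so treat this only as a sketch of the idea:

# Cell 1: import packages and define a function (names taken from the text above)
import os
import binascii

def sum(a, b):        # shadows the built-in sum(); kept only to match the article's wording
    return a + b

# Cell 2 (a separate cell): everything defined above is still available from the kernel
print(sum(2, 3))                          # -> 5
print(binascii.hexlify(os.urandom(4)))    # -> something like b'9f3a0c21'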

The execution flow of a notebook generally runs from top to bottom, but it is common to go back and make changes. The execution order shown to the left of each cell, such as In [2], lets you know whether any of your cells have stale output. Additionally, there are several options in the Kernel menu that often come in handy:

  • Restart: restarts the kernel, clearing all of the variables and imports that were defined.
  • Restart & Clear Output: same as above, but also wipes the output displayed below your code cells.
  • Restart & Run All: same as above, but also runs all of your cells in order from top to bottom.
  • Interrupt: if your kernel is ever stuck on a computation and you wish to stop it, choose this option.

Naming Your Notebooks

It is always good practice to give your notebooks meaningful names. You can rename a notebook from the notebook app itself by double-clicking on the existing name at the top left corner, or you can use the dashboard or your file browser to rename the notebook file. We’ll head back to the dashboard to rename the file we created earlier, which still has the default name Untitled.ipynb.

Now that you are back on the dashboard, simply select your notebook and click “Rename” in the dashboard controls.

Jupyter notebook - Rename

Shutting Down your Notebooks

We can shut down a running notebook by selecting “File > Close and Halt” from the notebook menu. We can also shut down just the kernel, either by selecting the notebook in the dashboard and clicking “Shutdown”, or by going to “Kernel > Shutdown” from within the notebook app (see the images below).

Shutdown the kernel from Notebook App:


Shutdown the kernel from Dashboard:



Sharing Your Notebooks

When we talk about sharing a notebook, there are two things we might have in mind. In most cases, we want to share the end result of the work, i.e. a non-interactive, pre-rendered version of the notebook, much like this article. In other cases, we want to share the code itself and collaborate with others on the notebook with the aid of a version control system such as Git, which is also possible.

Before You Start Sharing

The state of the notebook, including the output of any code cells, is preserved when it is exported to a file. Hence, to ensure that the notebook is ready to share, follow these steps before sharing:

  1. Click “Cell > All Output > Clear”
  2. Click “Kernel > Restart & Run All”
  3. After the code cells have finished executing, validate the output. 

This ensures that your notebooks don’t have a stale state or contain intermediary output.

Exporting Your Notebooks

Jupyter has built-in support for exporting to HTML, Markdown, and PDF, as well as several other formats, which you can find in the menu under “File > Download as”. This is a very convenient way to share results with others. But if sharing exported files isn’t suitable for you, there are some other popular ways of sharing notebooks directly on the web:

  • GitHub – Home to over 2 million notebooks, GitHub is the most popular place for sharing Jupyter projects with the world. It renders .ipynb files directly, both in repositories and in gists, so you can simply follow the GitHub guides to get started on your own.
  • Nbviewer – NBViewer is one of the most prominent notebook renderers on the web. It renders notebooks hosted on GitHub and other code storage platforms and provides a shareable URL for them. nbviewer.jupyter.org offers this rendering service for free as part of Project Jupyter.

Data Analysis in a Jupyter Notebook

Now that we’ve looked at what a Jupyter Notebook is, it’s time to look at how notebooks are used in practice, which should give you a clearer understanding of why they are so popular. As we walk through a sample analysis, you will see how the flow of a notebook makes the task intuitive for us to work through, and easy for others to understand when we share it with them. We will also pick up some of the more advanced features of Jupyter Notebooks along the way. So let's get started, shall we?

Analyzing the Revenue and Profit Trends of Fortune 500 US companies from 1955-2013

So, let’s say you’ve been tasked with finding out how the revenues and profits of the largest companies in the US changed historically over the past 60 years. We shall begin by gathering the data to analyze.

Gathering the DataSet

The data set we will use to analyze the revenue and profit trends of Fortune 500 companies has been sourced from the Fortune 500 Archives and Top Foreign Stocks. For your convenience, we have compiled the data from both sources and created a single CSV file for you.

Importing the Required Dependencies

Let's start off with a code cell specifically for imports and initial setup, so that if we need to add or change anything at a later point in time, we can simply edit and re-run the cell without having to change the other cells. We can start by importing Pandas to work with our data, Matplotlib to plot the charts and Seaborn to make our charts prettier.

CODE: https://gist.github.com/velotiotech/93ad58aed1eb0d72b593764eb2db6b80.js
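A typical setup cell for this kind of analysis looks like the sketch below; it is a reasonable guess at the contents of the gist rather than a verbatim copy:

%matplotlib inline                # render Matplotlib charts inline in the notebook
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns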

Set the design styles for the charts

CODE: https://gist.github.com/velotiotech/56b33c31c348cca9f517224a9f69d79d.js
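Seaborn's styling can be applied with a single call, for example (the exact style used in the original cell may differ):

sns.set(style="darkgrid")         # apply Seaborn's theme and grid styling to all charts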

Load the Input Data to be Analyzed

As we plan on using pandas to aid in our analysis, let’s begin by importing our input data set into the most widely used pandas data structure, the DataFrame.

CODE: https://gist.github.com/velotiotech/640d2abf8c947ce992f5dbce24d59f59.js
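Loading the CSV into a DataFrame is a one-liner; the file name and column names below are assumptions based on how the data is described in the rest of the article:

df = pd.read_csv("fortune500.csv")                               # hypothetical file name
df.columns = ["year", "rank", "company", "revenue", "profit"]    # assumed column names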

Now that we have loaded our input dataset, let's see what it looks like!

CODE: https://gist.github.com/velotiotech/3fe96f00601f12fb68ec8f91035e2194.js

Jupyter notebook output data

Looking good. Each row corresponds to a single company per year and all the columns we need are present.

Exploring the Dataset

Next, let's begin by exploring our data set. We will primarily look into the number of records imported and the data types for each of the different columns that were imported.

Since we have 500 data points per year and the data set contains records from 1955 through 2012, the total number of records in the dataset looks right!

Now, let's move on to the individual data types of each column.

CODE: https://gist.github.com/velotiotech/3716915eea35b690e7d94670e5145506.js

Jupyter notebook datasets

CODE: https://gist.github.com/velotiotech/7b30f53168a142ef06b55d81499efe05.js

Jupyter notebook exploring datasets
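The two checks described here typically look like the following (a sketch, not the exact gist contents):

len(df)      # expect 500 records per year across the covered range
df.dtypes    # revenue and profit show up as 'object' instead of float64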

As we can see from the output of the command above, the data types of the revenue and profit columns are shown as object, whereas we would expect them to be float. This indicates that there may be some non-numeric values in those columns.

So let's first look at the details of imported values for revenue.

CODE: https://gist.github.com/velotiotech/6d748a0f699939637ff849c2bc600347.js

Jupyter notebook output data

CODE: https://gist.github.com/velotiotech/ff2d22dc4142fb0e2abc7ed007110241.js

CODE: https://gist.github.com/velotiotech/6553edce479138e1bc8f1912790bd4c0.js

CODE: https://gist.github.com/velotiotech/e43d5b2b3e47f407747640170b954867.js

CODE: https://gist.github.com/velotiotech/198d7a23045bff95afdced5dcaa62d3d.js
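One way to locate the offending values (not necessarily the approach used in the gists above) is to coerce the column to numeric and inspect the rows that fail:

non_numeric_revenue = pd.to_numeric(df.revenue, errors="coerce").isna()   # rows that can't be parsed
print(df.revenue[non_numeric_revenue].unique())                           # e.g. values such as 'N.A.'
print(non_numeric_revenue.sum())                                          # how many rows are affected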

The number of non-numeric revenue values is very small compared to the total size of our data set, so it is easiest to simply remove those rows.

CODE: https://gist.github.com/velotiotech/8ccd7a44ff743c086e02e4a4f2ea2e69.js

Jupyter notebook output
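Dropping those rows and converting the column can then be done along these lines (a sketch under the same assumptions as above):

df = df[~non_numeric_revenue]                    # keep only rows with parseable revenue
df["revenue"] = df["revenue"].astype(float)      # the column can now safely be a float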

Now that the data type issue for the revenue column is resolved, let's move on to the values in the profit column.

CODE: https://gist.github.com/velotiotech/3eb4fe0bbaf1c8fc5a859b8ac022e25c.js

Jupyter notebook output

CODE: https://gist.github.com/velotiotech/d08eae2a27f62616bcc691628e1062f8.js

CODE: https://gist.github.com/velotiotech/191b64c95a1ddd17eb9071ffc5d43a4e.js

CODE: https://gist.github.com/velotiotech/9bb942c2401cb9ba6dcf3d07f1de7b67.js

CODE: https://gist.github.com/velotiotech/91c83b800d81ddbc614bf4202c563176.js

The number of non-numeric profit values is around 1.5% of our data set, which is small but not completely inconsequential. Let’s take a quick look at how these values are distributed over the years; if the rows with N.A. values are spread fairly evenly across the years, it is reasonable to simply remove them.

CODE: https://gist.github.com/velotiotech/5f57e3d6c96aec29a1a29d90c7fb1b4f.js

Jupyter notebook Data Analysis
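A plot like the one above can be produced with a few lines of Matplotlib; this is only an approximation of the original cell:

non_numeric_profit = pd.to_numeric(df.profit, errors="coerce").isna()   # rows with unparseable profit
plt.hist(df.year[non_numeric_profit], bins=range(1955, 2014))           # count them per year
plt.xlabel("year")
plt.ylabel("rows with missing profit")
plt.show()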

As the histogram above shows, most years have fewer than 25 invalid values, so removing them accounts for less than 4% of the data in any year (there are 500 data points per year). Moreover, other than a surge around 1990, most years have fewer than 10 missing values. Let’s assume that this is acceptable and move ahead with removing these rows.

CODE: https://gist.github.com/velotiotech/a5375c1cfbb1309e5ac63e970883ff36.js

We should validate if that worked!

CODE: https://gist.github.com/velotiotech/96c9c79e06733e4c10653a97bf2d9363.js

Jupyter notebook output

Hurray! Our dataset has been cleaned up.

Time to Plot the Graphs

Let's begin by defining a function to plot the graph, set the title, and add labels for the x-axis and y-axis.

CODE: https://gist.github.com/velotiotech/c0adf037bb795daec985270fc3d98a4d.js
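The helper might look something like the sketch below; the exact signature is a guess based on how it is described:

def plot(x, y, ax, title, y_label):
    """Plot y against x on the given axes, with a title and a y-axis label."""
    ax.set_title(title)
    ax.set_ylabel(y_label)
    ax.plot(x, y)
    ax.margins(x=0, y=0)    # remove the default padding around the data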

Let's plot the average profit by year and average revenue by year using Matplotlib.

CODE: https://gist.github.com/velotiotech/1491ed83894d651e6a70954e95b722e3.js

Jupyter notebook data analysis

CODE: https://gist.github.com/velotiotech/cd92aea3f27c9733aef0d3e5764484ed.js

Jupyter notebook data analysis
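Producing the two charts shown above could be as simple as grouping by year and calling the helper; again, this is a sketch rather than the exact gist contents:

group_by_year = df.groupby("year").mean(numeric_only=True)   # average revenue and profit per year
years = group_by_year.index

fig, ax = plt.subplots()
plot(years, group_by_year.profit, ax, "Average Fortune 500 company profits, 1955-2013", "profit")

fig, ax = plt.subplots()
plot(years, group_by_year.revenue, ax, "Average Fortune 500 company revenues, 1955-2013", "revenue")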

Whoa! The profit chart has some huge ups and downs. They seem to correspond to the early 1990s recession, the dot-com bubble in the early 2000s, and the Great Recession in 2008.

Revenues, on the other hand, grow steadily and are comparatively stable. This also helps explain how the average profits could recover so quickly after the staggering drops caused by each recession.

Let's also take a look at how the average profits and revenues compare to their standard deviations.

CODE: https://gist.github.com/velotiotech/3f5709af11d381da6a06178f0aea454e.js
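One common way to visualize the spread (which the cell above may or may not use) is to shade a band of one standard deviation around the mean:

means = df.groupby("year").mean(numeric_only=True)
stds = df.groupby("year").std(numeric_only=True)

fig, ax = plt.subplots()
ax.plot(means.index, means.profit, label="mean profit")
ax.fill_between(means.index, means.profit - stds.profit, means.profit + stds.profit,
                alpha=0.3, label="+/- 1 standard deviation")
ax.set_title("Fortune 500 profits with standard deviation")
ax.legend()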


That's astonishing: the standard deviations are huge. Some companies make billions while others lose just as much, and the risk has certainly increased along with rising profits and revenues over the years. Although we could keep playing with our data set and plot plenty more charts, it is time to bring this article to a close.

Jupyter notebook Data Analysis

Conclusion

In this article, we have covered various features of Jupyter Notebooks, from basics like installation and creating and running code cells, to more advanced features like plotting graphs. The example above shows how Jupyter Notebooks promote a productive working experience and are easy to use, and I hope you now feel confident enough to begin using Jupyter Notebooks in your own work and to start exploring their more advanced features. You can read more about data analytics using Pandas here.

If you’d like to explore further and look at more examples, Jupyter has put together A Gallery of Interesting Jupyter Notebooks that you may find helpful, and the Nbviewer homepage provides plenty of examples for further reference. You can find the entire code for this article here on GitHub.


Did you like the blog? If yes, we're sure you'll also like to work with the people who write them - our best-in-class engineering team.

We're looking for talented developers who are passionate about new emerging technologies. If that's you, get in touch with us.

Explore current openings
