Implementing Async Features in Python - A Step-by-step Guide

Asynchronous programming is a feature of modern programming languages that lets an application start several operations without waiting for each one to finish before moving on. Asynchronicity is one of the big reasons for the popularity of Node.js.

We have discussed Python’s asynchronous features as part of our previous post: an introduction to asynchronous programming in Python. This blog is a natural progression on the same topic. We are going to discuss async features in Python in detail and look at some hands-on examples.

Consider a traditional web scraping application that needs to open thousands of network connections. We could open one connection, fetch the result, and then move on to the next one, iterating through them all. This approach increases the program's total running time: it spends most of its time opening connections and waiting for responses, one after another.

Async, on the other hand, lets you open thousands of connections at once and switch among them as each one finishes and returns its result. Basically, it sends a request on one connection and moves on to the next instead of waiting for the previous response. It continues like this until all the connections have returned their output.

Fig: Synchronous vs. asynchronous execution of four tasks (source: phpmind)

From the above chart, we can see that the four tasks took 45 seconds to complete when run synchronously, while the same four tasks took only 20 seconds when run asynchronously.
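The same effect can be reproduced in miniature. The sketch below is only an illustration, not the code behind the chart: fake_request is a hypothetical stand-in that uses asyncio.sleep to simulate waiting on a network connection, and the 0.1 second delay is an arbitrary assumption.

```python
import asyncio
import time


async def fake_request(connection_id, delay=0.1):
    """Hypothetical stand-in for a network call; asyncio.sleep simulates waiting on I/O."""
    await asyncio.sleep(delay)
    return f"response from connection {connection_id}"


async def run_sequentially(n):
    # Each request is awaited to completion before the next one starts.
    return [await fake_request(i) for i in range(n)]


async def run_concurrently(n):
    # All requests are started together; the event loop switches between
    # them while each one is waiting.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro_func, n):
    """Run coro_func(n) on a fresh event loop and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_func(n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = timed(run_sequentially, 4)
    _, con_time = timed(run_concurrently, 4)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With four simulated requests, the sequential version should take roughly four times the delay, while the concurrent version should take roughly one delay.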

Where Does Asynchronous Programming Fit in the Real-world?

Asynchronous programming is best suited to scenarios where:

1. The program takes too long to execute.

2. The delay comes from waiting on input or output operations, not from computation.

3. Multiple input or output operations need to be in progress at once.

And application-wise, these are the example use cases:

  • Web Scraping
  • Network Services

Difference Between Parallelism, Concurrency, Threading, and Async IO

Because we discussed this comparison in detail in our previous post, we will just quickly go through the concept as it will help us with our hands-on example later.

Parallelism involves performing multiple operations at the same time. Multiprocessing is an example of it. It is well suited for CPU-bound tasks.

Concurrency is slightly broader than parallelism. It involves multiple tasks running in an overlapping manner.

Threading: a thread is a separate flow of execution. One process can contain multiple threads, and each thread runs independently. It is ideal for I/O-bound tasks.

Async IO is a single-threaded, single-process design that uses cooperative multitasking. In simple words, async IO gives a feeling of concurrency despite using a single thread in a single process.
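To make the "single thread, cooperative multitasking" point concrete, here is a small sketch. The worker coroutine and its log list are hypothetical names used only for illustration. Each worker records the ID of the thread it runs on and then yields control with await asyncio.sleep(0).

```python
import asyncio
import threading


async def worker(name, log):
    for step in range(2):
        # Record which thread this coroutine is running on.
        log.append((name, step, threading.get_ident()))
        # await hands control back to the event loop so another coroutine can run.
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(main()):
        print(name, step, thread_id)
```

Every entry should report the same thread ID, even though the two workers take turns: the interleaving comes from each coroutine voluntarily giving up control, not from the operating system switching threads.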


Fig: A comparison of concurrency and parallelism


Components of Async IO Programming

Let’s explore the various components of Async IO in depth. We will also look at example code to help us understand the implementation.

1. Coroutines

Coroutines are a generalized form of subroutines. They are used for cooperative multitasking and behave like Python generators.

An async function is defined with async def, and calling it produces a coroutine. Inside it, the await keyword suspends the coroutine and releases the flow of control back to the event loop.

To run a coroutine, we need to schedule it on the event loop. Once scheduled, a coroutine is wrapped in a Task, which is a kind of Future object.

Example:

In the snippet below, we call async_func from the main function. We have to add the await keyword when calling the async function. As you can see, async_func does nothing unless it is awaited.

CODE: https://gist.github.com/velotiotech/62621dc28aa525bc1217e233fa5a7b40.js

Output

CODE: https://gist.github.com/velotiotech/d868f2238e0d0cfb0d3b8bb7495c647d.js
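As a rough, self-contained illustration of the same idea (a sketch, not a copy of the gist above), the snippet below shows that calling an async function only creates a coroutine object, and that its body runs once it is awaited. The return value "done" and the short sleep are arbitrary choices for the example.

```python
import asyncio


async def async_func():
    await asyncio.sleep(0.01)
    return "done"


async def main():
    # Calling the async function does not run its body; it only builds a coroutine object.
    coro = async_func()
    is_coroutine = asyncio.iscoroutine(coro)
    # Awaiting the coroutine is what actually executes it.
    result = await coro
    return is_coroutine, result


if __name__ == "__main__":
    print(asyncio.run(main()))
```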

2. Tasks

Tasks are used to schedule coroutines concurrently.

When submitting a coroutine to an event loop for processing, you can get a Task object, which provides a way to control the coroutine’s behavior from outside the event loop.

Example:

In the snippet below, we create a task using create_task (a built-in function of the asyncio library) and then run it.

CODE: https://gist.github.com/velotiotech/2883ae8fc08b4a16ad4b2b991b92642f.js

Output

CODE: https://gist.github.com/velotiotech/16e4aafcc5957b6813ae02611fd52807.js
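The point about controlling a coroutine from outside can be sketched as follows. This is an illustrative example rather than the gist's code: quick_job and slow_job are hypothetical coroutines, and the delays are arbitrary. One task is awaited for its result while the other is cancelled through its Task object.

```python
import asyncio


async def slow_job():
    await asyncio.sleep(10)
    return "finished"


async def quick_job():
    await asyncio.sleep(0.01)
    return "quick result"


async def main():
    # create_task schedules both coroutines on the running event loop right away.
    quick = asyncio.create_task(quick_job())
    slow = asyncio.create_task(slow_job())

    result = await quick

    # The Task object lets us cancel the coroutine from the outside.
    slow.cancel()
    try:
        await slow
    except asyncio.CancelledError:
        pass
    return result, slow.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))
```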

3. Event Loops

This mechanism runs coroutines until they complete. You can imagine it as a while(True) loop that monitors coroutines, taking note of which ones are idle and looking around for things that can be executed in the meantime.

It can wake up an idle coroutine when whatever that coroutine is waiting on becomes available.

Only one event loop can run at a time in a given thread.

Example:

In the snippet below, we create three tasks, append them to a list, and execute them all asynchronously using get_event_loop and create_task from the asyncio library together with the await keyword.

CODE: https://gist.github.com/velotiotech/cfa3219da51ef24218c33706971516de.js

Output

CODE: https://gist.github.com/velotiotech/ee345dfd8293b70c56dbdc98fedd90f0.js
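For comparison, here is a minimal sketch of the same pattern written with asyncio.run, which creates the event loop, runs the coroutine to completion, and closes the loop afterwards. It is an assumption-laden variant, not the gist's code: the task names are made up, and each task returns the identity of the loop it runs on purely to show that all three share one event loop.

```python
import asyncio


async def async_func(task_no):
    # get_running_loop returns the event loop currently executing this coroutine.
    loop = asyncio.get_running_loop()
    await asyncio.sleep(0.01)
    return task_no, id(loop)


async def main():
    tasks = [
        asyncio.create_task(async_func(name))
        for name in ("taskA", "taskB", "taskC")
    ]
    # gather waits for all tasks and returns their results in the order given.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    for task_no, loop_id in asyncio.run(main()):
        print(task_no, loop_id)
```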

4. Future

A Future is a special, low-level awaitable object that represents an eventual result of an asynchronous operation.

When a Future object is awaited, the coroutine waits until the Future is resolved somewhere else.

We will look into the sample code for Future objects in the next section.
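Ahead of that, a bare Future can be sketched in a few lines. This is a minimal illustration with hypothetical names (resolve_later and the value 42 are arbitrary): one coroutine awaits the Future while another resolves it by calling set_result.

```python
import asyncio


async def resolve_later(future, value):
    await asyncio.sleep(0.01)
    # Resolving the Future wakes up whoever is awaiting it.
    future.set_result(value)


async def main():
    loop = asyncio.get_running_loop()
    # create_future returns a Future bound to the running event loop.
    future = loop.create_future()
    resolver = asyncio.create_task(resolve_later(future, 42))
    # main is suspended here until resolve_later calls set_result.
    result = await future
    await resolver
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In application code, Futures are rarely created by hand like this; they usually appear indirectly through Tasks and functions such as gather.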

A Comparison Between Multithreading and Async IO

Before we get to Async IO, let’s use multithreading as a baseline and then compare the two to see which is more efficient.

For this benchmark, we will fetch data from a sample URL (the Velotio Career webpage) a varying number of times: once, 10 times, 50 times, 100 times, and 500 times.

We will then compare the time taken by both of these approaches to fetch the required data.

Implementation

Code of Multithreading:

CODE: https://gist.github.com/velotiotech/d33aad834064c12a1a28af5d8d3a2c18.js 

Output

CODE: https://gist.github.com/velotiotech/c087f7a863602bf122097f56e209cf28.js

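ProcessPoolExecutor is a class in Python's concurrent.futures module that implements the Executor interface. Note that it distributes work across worker processes rather than threads; ThreadPoolExecutor is the thread-based counterpart with the same interface. The fetch_url_data function fetches the data from the given URL using the requests Python package, and the get_all_url_data function maps fetch_url_data over the list of URLs.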
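The mapping pattern described here can be sketched without network access. The example below is an assumption-based illustration, not the benchmark code: it swaps in ThreadPoolExecutor so that the work really runs on threads, replaces the requests call with time.sleep to simulate a 0.1 second response, and uses a placeholder URL and a max_workers value chosen arbitrarily.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_url_data(pg_url):
    """Hypothetical stand-in for requests.get(pg_url); sleeps instead of using the network."""
    time.sleep(0.1)
    return f"content of {pg_url}"


def get_all_url_data(url_list, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map runs fetch_url_data across the pool and
        # yields results in the same order as url_list.
        return list(executor.map(fetch_url_data, url_list))


if __name__ == "__main__":
    urls = ["https://example.com/page"] * 10
    start = time.perf_counter()
    pages = get_all_url_data(urls)
    print(f"fetched {len(pages)} pages in {time.perf_counter() - start:.2f}s")
```

Because each worker thread spends its time sleeping (or, in the real case, waiting on the network), ten simulated fetches should finish in roughly the time of one.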

Async IO Programming Example:

CODE: https://gist.github.com/velotiotech/3bd9314613b1079d1dd2a69048676457.js

Output

CODE: https://gist.github.com/velotiotech/7480b34b26525e2d3bba07dc61665397.js

We use the get_event_loop function to obtain the event loop on which the tasks are created and run. To fetch more than one URL, we use the ensure_future and gather functions.

The fetch_async function adds the tasks to the event_loop object, and the fetch_url_data function reads the data from the URL using the session object. The future_result method returns the responses of all the tasks.
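The ensure_future and gather combination can be sketched in isolation. This is a simplified illustration under stated assumptions, not the benchmark code: fetch_url_data here is a hypothetical stand-in that awaits asyncio.sleep instead of making an HTTP request through a session, the URL is a placeholder, and asyncio.run is used as the entry point in place of get_event_loop.

```python
import asyncio
import time


async def fetch_url_data(url):
    """Hypothetical stand-in for an HTTP GET made through an async client session."""
    await asyncio.sleep(0.1)
    return f"content of {url}"


async def fetch_async(url_list):
    # ensure_future wraps each coroutine in a Task so it starts running on the loop.
    tasks = [asyncio.ensure_future(fetch_url_data(url)) for url in url_list]
    # gather collects the results of all tasks, preserving the input order.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    urls = ["https://example.com/page"] * 100
    start = time.perf_counter()
    pages = asyncio.run(fetch_async(urls))
    print(f"fetched {len(pages)} pages in {time.perf_counter() - start:.2f}s")
```

Unlike the thread pool version, no worker limit is needed here: all the simulated requests wait at the same time on a single thread.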

Results:

As you can see from the plot, async programming is much more efficient than multithreading for the program above.

The running time of the multithreading program grows roughly linearly with the number of requests, while the curve for the asyncio program flattens out and resembles a logarithmic curve.


Fig: AsyncIO vs. multithreading program

Conclusion

As we saw in our experiment above, Async IO performed better than multithreading by making efficient use of concurrency.

Async IO can be beneficial in applications that can exploit concurrency. That said, whether it is the pragmatic choice over other implementations depends on the kind of application we are dealing with.

We hope this article helped further your understanding of the async feature in Python and gave you some quick hands-on experience using the code examples shared above.