We live in a world where speed is important. With cutting-edge technology coming into the telecommunications and software industry, we expect to get things done quickly. We want to develop applications that are fast, can process high volumes of data and requests, and keep the end-user happy.
This is great, but of course, it’s easier said than done. That’s why concurrency and parallelism are important in application development. We must process data as fast as possible. Every programming language has its own way of dealing with this, and we will see how Golang does it.
Many of us choose Golang because of its concurrency support, which is built around two features: goroutines and channels.
This blog will cover channels and how they work internally, as well as their key components. To benefit the most from this content, it will help to know a little about goroutines and channels as this blog gets into the internals of channels. If you don’t know anything, then don’t worry, we’ll be starting off with an introduction to channels, and then we’ll see how they operate.
What are channels?
Normally, when we talk about channels, we think of the ones in applications like RabbitMQ, Redis, AWS SQS, and so on. Anyone with little or no Golang knowledge would think like this. But channels in Golang are different from a work queue system. In a work queue system like the ones above, clients reach the channels over TCP connections, but in Go, a channel is an in-memory data structure, which we’ll explain later. So, what exactly are channels in Golang?
Channels are the medium through which goroutines can communicate with each other. In simple terms, a channel is a pipe that a goroutine can either put data into or read data from.
What are goroutines?
So, a channel is a communication medium for goroutines. Now, let’s give a quick overview of what goroutines are. If you know this already, feel free to skip this section.
Technically, a goroutine is a function that executes independently and concurrently with other code. In simple terms, it's a lightweight thread that’s managed by the Go runtime.
You can create a goroutine by putting the go keyword before a function call.
Let’s say there’s a function called PrintHello, like this:
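As a minimal sketch, assuming PrintHello simply prints a greeting (the helper helloMessage is a hypothetical addition that only makes the text easy to check), it could look like this:

```go
package main

import "fmt"

// helloMessage returns the greeting text (hypothetical helper for this sketch).
func helloMessage() string {
	return "Hello from PrintHello"
}

// PrintHello is an ordinary function; nothing about it is goroutine-specific.
func PrintHello() {
	fmt.Println(helloMessage())
}

func main() {
	PrintHello() // a normal, synchronous call
}
```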
You can run this function as a goroutine simply by putting the go keyword in front of the call, as below:
Now, let’s head back to channels, as that’s the important topic of this blog.
How to define a channel?
Let’s see the syntax that declares a channel. We use the chan keyword provided by Go, followed by the element type, for example var c chan int.
You must specify the element type because a channel can only carry values of that one type.
Very simple! But this is not useful yet, since it only creates a nil channel. Let’s print it and see.
As you can see, we have just declared the channel, but we can’t transport data through it. So, to create a useful channel, we must use the make function.
As you may notice here, the value of c is a memory address. Keep in mind that a channel value is essentially a pointer to a structure managed by the runtime. That’s why we can pass channels to goroutines, and every goroutine that puts or reads data works on the same underlying channel. Now, let’s quickly see how to read and write the data to a channel.
Read and write operations on a channel:
Go provides an easy way to read and write data to a channel by using the arrow operator, <-.
To put a value into the channel we created, we write the arrow to the right of the channel, as in c <- 10. The same arrow position appears in the type of a send-only channel, chan<- int.
And to read data from the channel, we write the arrow to the left of the channel, as in v := <-c.
The same arrow position appears in the type of a receive-only channel, <-chan int.
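To see both operations and both directional types side by side, here is a small sketch. The function names produce and consume are made up for this example.

```go
package main

import "fmt"

// produce can only send on c: the parameter type chan<- int is send-only.
func produce(c chan<- int, v int) {
	c <- v // send v into the channel
}

// consume can only receive from c: the parameter type <-chan int is receive-only.
func consume(c <-chan int) int {
	return <-c // receive a value from the channel
}

func main() {
	c := make(chan int)

	// A bidirectional channel converts implicitly to either directional type.
	go produce(c, 10)
	fmt.Println(consume(c)) // prints 10
}
```

Restricting the direction in a function signature lets the compiler reject, for example, a receive inside produce.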
Let’s see a simple program to use the channels.
This simple function just prints whatever data is in the channel. Now, let’s see the main function that will push the data into the channel.
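Putting the two pieces together, a sketch of the whole program could look like the following. The done channel is an assumption added here: it makes main wait until the value has actually been printed, so the order of the output does not depend on scheduling.

```go
package main

import "fmt"

// printChannelData reads one value from c and prints it.
func printChannelData(c chan int) {
	fmt.Println(<-c) // blocks until a value is sent on c
}

func main() {
	fmt.Println("main started...")

	c := make(chan int)
	done := make(chan struct{}) // hypothetical addition, see the note above

	go func() {
		printChannelData(c)
		close(done) // signal that the value has been printed
	}()

	c <- 10 // blocks until printChannelData receives the value
	<-done  // wait for the print to finish

	fmt.Println("main ended...")
}
```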
This yields three lines of output: “main started...”, then 10, and finally “main ended...”.
Let’s talk about the execution of the program.
1. We declared a printChannelData function, which accepts a channel c of data type integer. In this function, we just read data from channel c and print it.
2. The main function first prints “main started...” to the console.
3. Then, we create the channel c of data type integer using the make function.
4. We now pass the channel to the function printChannelData and, using the go keyword as we saw earlier, run it as a goroutine.
5. At this point, there are two goroutines. One is the main goroutine, and the other is the one we just started.
6. Now, we put 10 into the channel, and at this point, our main goroutine is blocked, waiting for some other goroutine to read the data. The reader, in this case, is the printChannelData goroutine, which was itself blocked because there was no data in the channel. Now that we’ve pushed the data onto the channel, the Go scheduler (more on this later in the blog) schedules the printChannelData goroutine, which reads and prints the value from the channel.
7. After that, the main goroutine becomes active again, prints “main ended...”, and the program stops.
So, what’s happening here? Basically, the Go scheduler blocks and unblocks goroutines. A receive blocks until there is data in the channel, which is why our printChannelData goroutine was blocked in the first place. Likewise, a send on this channel blocks until the data has been read, which is what happened to our main goroutine.
With this, let’s see how channels operate internally.
Internals of channels:
Until now, we have seen how to define a goroutine, how to declare a channel, and how to read and write data through a channel with a very simple example. Now, let’s look at how Go handles this blocking and unblocking nature internally. But before that, let’s quickly see the types of channels.
Types of channels:
There are two basic types of channels: buffered channels and unbuffered channels. The above example illustrates the behaviour of unbuffered channels. Let’s quickly see the definition of these:
- Unbuffered channel: This is what we have seen above. An unbuffered channel has no storage of its own, so each value must be handed directly from a sender to a receiver. A send blocks until another goroutine receives the value. That’s why our main goroutine got blocked when we put data into the channel.
- Buffered channel: In a buffered channel, we specify the capacity of the channel. The syntax is very simple: c := make(chan int, 10). The second argument to the make function is the capacity of the channel, so we can put up to ten elements into it without blocking. When the buffer is full, the next send blocks until a receiver goroutine consumes an element and frees a slot.
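The difference between the two types can be observed with a small sketch. The helper trySend is made up for this example. It uses select with a default case to attempt a send without blocking, so we can see when a plain send would have blocked.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
func trySend(c chan int, v int) bool {
	select {
	case c <- v:
		return true
	default:
		return false // a plain send would have blocked here
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 3)

	fmt.Println(cap(unbuffered))         // prints 0
	fmt.Println(trySend(unbuffered, 1)) // prints false: no receiver is waiting

	for i := 1; i <= 3; i++ {
		fmt.Println(trySend(buffered, i)) // prints true three times
	}
	fmt.Println(len(buffered), cap(buffered)) // prints 3 3
	fmt.Println(trySend(buffered, 4))         // prints false: the buffer is full
}
```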
Properties of a channel:
A channel does a lot of things internally, and it has the following properties:
- Channels are goroutine-safe.
- Channels can store and pass values between goroutines.
- Channels provide FIFO semantics.
- Channels cause goroutines to block and unblock, which we just learned about.
As we see the internals of a channel, you’ll learn about the first three properties.
As we learned in the definition, a channel is a data structure. Now, looking at the properties above, we want a mechanism that handles goroutines in a synchronized manner and with FIFO semantics. This can be solved using a queue with a lock, and the channel internally behaves in that fashion. It has a circular queue, a lock, and some other fields.
When we write c := make(chan int, 10), Go creates a channel using the hchan struct, which has the following fields:
(Above info taken from Golang.org)
This is what a channel is internally. Let’s see one-by-one what these fields are.
qcount holds the count of items/data in the queue.
dataqsiz is the size of the circular queue. This is used in case of buffered channels and is the second parameter passed to the make function.
elemsize is the size, in bytes, of a single element of the channel.
buf is the actual circular queue where the data is stored when we use buffered channels.
closed indicates whether the channel is closed. The syntax to close the channel is close(<channel_name>). The default value of this field is 0, which is set when the channel gets created, and it’s set to 1 when the channel is closed.
sendx and recvx indicate the current send and receive positions in the buffer or circular queue. As we add data to the buffered channel, sendx advances, and as we receive, recvx advances. Both wrap around to 0 when they reach the end of the buffer.
recvq and sendq are the waiting queues for the blocked goroutines that are trying to either read data from or write data to the channel.
lock is basically a mutex that locks the channel for each read or write operation, as we don’t want two goroutines to modify the channel’s state at the same time.
These are the important fields of the hchan struct, which comes into the picture when we create a channel. The hchan struct resides on the heap, and the make function gives us a pointer to that location. There’s another struct known as sudog, which also comes into the picture, but we’ll learn more about that later. Now, let’s see what happens when we write and read the data.
Read and write operations on a channel:
We are considering buffered channels here. When one goroutine, let’s say G1, wants to write data onto a channel, it does the following:
- Acquire the lock: As we saw before, if we want to modify the channel, or hchan struct, we must acquire a lock. So, G1 in this case, will acquire a lock before writing the data.
- Perform enqueue operation: We now know that buf is actually a circular queue that holds the data. But before enqueuing the data, goroutine does a memory copy operation on the data and puts the copy into the buffer slot. We will see an example of this.
- Release the lock: After performing an enqueue operation, it just releases the lock and goes on performing further executions.
When another goroutine, let’s say G2, reads the above data, it performs the same steps, except that instead of an enqueue it performs a dequeue, again with a memory copy. This shows that with channels the data itself is never shared: the goroutines only share the hchan struct, which is protected by a mutex. Everything else is a copy.
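The lock, copy, unlock sequence can be modelled with a toy type. This is a teaching sketch, not how the runtime is written: toyChan and its methods are made up for this example, it only stores int values, and it reports failure instead of blocking when the buffer is full or empty.

```go
package main

import (
	"fmt"
	"sync"
)

// toyChan is a fixed-size circular queue guarded by a mutex.
type toyChan struct {
	lock   sync.Mutex
	buf    []int
	qcount int // elements currently stored
	sendx  int // next slot to write
	recvx  int // next slot to read
}

func newToyChan(size int) *toyChan { return &toyChan{buf: make([]int, size)} }

// send copies v into the buffer; it reports false instead of blocking when full.
func (c *toyChan) send(v int) bool {
	c.lock.Lock()
	defer c.lock.Unlock()
	if c.qcount == len(c.buf) {
		return false
	}
	c.buf[c.sendx] = v // the value is copied into the slot
	c.sendx = (c.sendx + 1) % len(c.buf)
	c.qcount++
	return true
}

// recv copies the oldest value out; ok is false when the buffer is empty.
func (c *toyChan) recv() (v int, ok bool) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if c.qcount == 0 {
		return 0, false
	}
	v = c.buf[c.recvx]
	c.recvx = (c.recvx + 1) % len(c.buf)
	c.qcount--
	return v, true
}

func main() {
	c := newToyChan(2)
	c.send(1)
	c.send(2)
	fmt.Println(c.send(3)) // prints false: the buffer is full
	v, _ := c.recv()
	fmt.Println(v)         // prints 1: first in, first out
	fmt.Println(c.send(3)) // prints true: the freed slot is reused
}
```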
This matches the famous Golang proverb: “Do not communicate by sharing memory; instead, share memory by communicating.”
Now, let’s look at a small example of this memory copy operation.
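A minimal sketch of such an example, assuming an int variable a and a buffered channel with room for one element:

```go
package main

import "fmt"

func main() {
	a := 10
	c := make(chan int, 1)

	c <- a // the current value of a (10) is copied into the channel's buffer
	a = 20 // changing a afterwards does not touch the copy

	fmt.Println(<-c) // prints 10
	fmt.Println(a)   // prints 20
}
```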
The output shows the value that was sent, 10, and not the modified one.
So, as you can see, we have added the value of variable a into the channel, and we modify that variable before the value is read from the channel. However, the value in the channel stays the same, i.e., 10. This is because the main goroutine performed a memory copy operation when it put the value onto the channel. So, even if you change the variable later, the value in the channel does not change.
Write in case of buffer overflow:
We’ve seen that a goroutine can add data up to the buffer capacity, but what happens when the buffer capacity is reached? When the buffer has no more space and a goroutine, let’s say G1, wants to write data, the Go scheduler blocks/pauses G1, which will wait until a receive happens from another goroutine, say G2. Since we are talking about buffered channels, as soon as G2 receives an element and frees a slot, the Go scheduler makes G1 runnable again so that it can complete its send. Remember this scenario, as we’ll use G1 and G2 frequently from here onwards.
We know that goroutines work in a pause-and-resume fashion, but who controls it? As you might have guessed, the Go scheduler does the magic here. There are a few things the Go scheduler does that are very important for goroutines and channels.
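Here is a small sketch of that scenario with a real channel. The function name resumeBlockedSender is made up for this example. Whether the sending goroutine actually parks depends on timing, but the result is the same either way: its send can only complete once a receive has freed a slot.

```go
package main

import "fmt"

// resumeBlockedSender fills a buffer of two, starts a third send in another
// goroutine, and returns the first value received and the final buffer length.
func resumeBlockedSender() (first, buffered int) {
	c := make(chan int, 2)
	c <- 1
	c <- 2 // the buffer is now full

	done := make(chan struct{})
	go func() { // plays the role of G1
		c <- 3 // parks here if the buffer is still full
		close(done)
	}()

	first = <-c // the caller plays G2: receiving frees a slot
	<-done      // G1's send has completed
	return first, len(c)
}

func main() {
	fmt.Println(resumeBlockedSender()) // prints 1 2
}
```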
Go Runtime Scheduler
You may already know this, but goroutines are user-space threads. The OS could schedule and manage threads for us, but OS threads are comparatively heavyweight, so creating and switching between large numbers of them is a significant overhead.
That’s why the Go scheduler handles the goroutines, and it basically multiplexes the goroutines on the OS threads. Let's see how.
There are several scheduling models, like 1:1 and N:1, but the Go scheduler uses the M:N scheduling model.
Basically, this means that there are a number of goroutines and a number of OS threads, and the scheduler schedules the M goroutines on N OS threads. For example:
[Figure: six goroutines multiplexed onto two OS threads, OS Thread 1 and OS Thread 2]
As you can see, there are two OS threads, and the scheduler is running six goroutines by swapping them as needed. The Go scheduler has three structures as below:
- M: M represents an OS thread, which is entirely managed by the OS and is similar to a POSIX thread. M stands for machine.
- G: G represents a goroutine. It holds the goroutine’s resizable stack along with information about scheduling, any channel it’s blocked on, etc.
- P: P is a context for scheduling. An OS thread needs a P in order to run Go code, and it is P that lets the scheduler multiplex M goroutines onto N OS threads. This is an important part, and that’s why P stands for processor.
Diagrammatically, we can represent the scheduler as:
(This diagram is referenced from The Go scheduler)
Each P holds a queue of runnable goroutines, known simply as its run queue.
So, whenever a goroutine (G) is to run on an OS thread (M), that OS thread must first get hold of a P, i.e., the context. The scheduler comes into play whenever a goroutine needs to be paused so that other goroutines can run. One such case is a buffered channel: when the buffer is full, the sender goroutine is paused and the receiver goroutine gets to run.
Imagine the above scenario: G1 is a sender that tries to send on a full buffered channel, and G2 is a receiver goroutine. When G1 tries to send on the full channel, it calls into the Go runtime scheduler through a function called gopark. The scheduler then changes the state of G1 from running to waiting, and the OS thread (M) picks another goroutine from the run queue, say G2, to run.
This transition diagram might help you better understand:
As you can see, after the gopark call, G1 is in a waiting state and G2 is running. We haven’t paused the OS thread (M); instead, we’ve blocked the goroutine and scheduled another one, so the OS thread stays busy doing useful work. The context switching between goroutines is handled by the Go scheduler rather than by the OS, which is what adds complexity to the scheduler.
This is great. But how do we resume G1, since it still wants to put its data/task on the channel? Before G1 calls gopark, it records its own state on the hchan struct, i.e., our channel, in the sendq field. Remember the sendq and recvq fields? They hold the waiting senders and receivers.
Now, G1 stores the state of itself as a sudog struct. A sudog is simply a goroutine that is waiting on an element. The sudog struct has these elements:
g is the waiting goroutine, next and prev are pointers to the next and previous sudog in the wait queue, if there are any, and elem is the actual element it’s waiting on.
So, considering our example, G1 is basically waiting to write the data so it will create a state of itself, which we’ll call sudog as below:
Cool. Now we know, before going into the waiting state, what operations G1 performs. Currently, G2 is in a running state, and it will start consuming the channel data.
As soon as G2 receives the first data/task, it checks the sendq field of the hchan struct for waiting goroutines and finds that G1 is waiting to push data or a task. Now, here is the interesting thing: G2 itself copies G1’s data/task from the sudog into the buffer slot that has just been freed. It then calls the scheduler, which moves G1 from the waiting state to runnable, adds G1 to the run queue, and returns to G2. This call from G2 on behalf of G1 is known as goready. Impressive, right? Golang behaves like this so that G1, when it runs again, doesn’t have to acquire the lock and push the data/task itself. That extra overhead is handled by G2. That’s why the sudog holds both the data/task and the details of the waiting goroutine. So, the state of G1 is like this:
As you can see, G1 is placed on a run queue. Now we know what the goroutine and the Go scheduler do in the case of buffered channels. In this example, the sender goroutine came first, but what if the receiver goroutine comes first? What if there’s no data in the channel and the receiver goroutine is executed first? The receiver goroutine (G2) will create a sudog in recvq on the hchan struct and park itself. Things get a little twisted when the G1 goroutine runs. It checks whether any goroutines are waiting in recvq, and if there are, it copies the task directly to the waiting goroutine’s (G2) memory location, i.e., the elem field of the sudog.
This is incredible! Instead of writing to the buffer, G1 writes the task/data straight into the waiting goroutine’s space, simply to save G2 the extra work when it becomes active. We know that each goroutine has its own resizable stack, and goroutines never touch each other’s stacks except in the case of channels. So far, we have seen how send and receive happen in a buffered channel.
This may have been confusing, so let me give you the summary of the send operation.
Summary of a send operation for buffered channels:
- Acquire lock on the entire channel or the hchan struct.
- Check if there’s any sudog or a waiting goroutine in the recvq. If so, then put the element directly into its stack. We saw this just now with G1 writing to G2’s stack.
- If recvq is empty, then check whether the buffer has space. If yes, then copy the data into the next free slot of the buffer.
- If the buffer is full, then create a sudog under sendq of the hchan struct, which will have details, like a currently executing goroutine and the data to put on the channel.
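The decision order in this list can be sketched as a single-threaded model. Everything here is made up for illustration: toyChan holds int values only, the lock is left out because nothing runs concurrently, waiting receivers are represented by pointers to their destination variables, and parking is represented by recording the value in sendq.

```go
package main

import "fmt"

// toyChan is a simplified, single-threaded model of a channel's send path.
type toyChan struct {
	buf   []int  // buffered values, oldest first
	size  int    // buffer capacity
	recvq []*int // destinations of receivers waiting for a value
	sendq []int  // values of senders waiting for space
}

// send follows the steps above and reports which path was taken.
func (c *toyChan) send(v int) string {
	if len(c.recvq) > 0 { // a receiver is already waiting
		dst := c.recvq[0]
		c.recvq = c.recvq[1:]
		*dst = v // copy straight into the receiver's variable
		return "direct send"
	}
	if len(c.buf) < c.size { // there is room in the buffer
		c.buf = append(c.buf, v)
		return "buffered"
	}
	c.sendq = append(c.sendq, v) // buffer full: record the waiting sender
	return "parked"
}

func main() {
	var received int
	c := &toyChan{size: 1, recvq: []*int{&received}}

	path := c.send(1)
	fmt.Println(path, received) // prints direct send 1
	fmt.Println(c.send(2))      // prints buffered
	fmt.Println(c.send(3))      // prints parked
}
```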
We have seen all the above steps in detail, but concentrate on the last point.
That last step is similar to what happens with an unbuffered channel. We know that for unbuffered channels, every send must be paired with a receive and vice versa.
So, keep in mind that an unbuffered channel always works like a direct send. A summary of the read and write operations on an unbuffered channel could be:
- Sender first: At this point, there’s no receiver, so the sender will create a sudog of itself in sendq, and the receiver will later receive the value from that sudog.
- Receiver first: The receiver will create a sudog in recvq, and the sender will directly put the data in the receiver’s stack.
With this, we have covered the basics of channels. We’ve learned how reads and writes operate on buffered and unbuffered channels, and we talked about the Go runtime scheduler.
Channels are a very interesting Golang topic. They can seem difficult to understand, but once you learn the mechanism, they’re very powerful and help you achieve concurrency in your applications. Hopefully, this blog helps your understanding of the fundamental concepts and operations of channels.