Supervisor.start_child blocks

Knee-deep in OTP, I came across the following scenario: I had a Supervisor that dynamically spawns workers, and each worker initializes its own state with an expensive function. To speed the whole process up, I thought I would just wrap each call in Task.start_link to start the workers in parallel. It turns out, and this makes sense if you think about it, that a Supervisor can only add one worker at a time: it is a single process and handles start requests one after another.

Consider the following code:

defmodule MyApp do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      worker(MyApp.Worker, [])
    ]

    opts = [strategy: :simple_one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  def run do
    Enum.map(1..10, fn i ->
      Task.start_link(fn ->
        Supervisor.start_child(MyApp.Supervisor, [i])
      end)
    end)
  end
end

with the following worker:

defmodule MyApp.Worker do

  def start_link(i) do
    IO.inspect i
    Agent.start_link(fn -> expensive(i) end)
  end

  def expensive(i) do
    :timer.sleep(1000)
    i
  end

end

If you run this function in IEx, you get something like this:

iex(1)> MyApp.run
1
[ok: #PID<0.90.0>, ok: #PID<0.91.0>, ok: #PID<0.92.0>, ok: #PID<0.93.0>,
 ok: #PID<0.94.0>, ok: #PID<0.95.0>, ok: #PID<0.96.0>, ok: #PID<0.97.0>,
 ok: #PID<0.98.0>, ok: #PID<0.99.0>]
iex(2)> 2
3
4
5
...

The integers show up one second apart. (Note: the returned PIDs belong to the Tasks, not to the workers.) What is happening is that Supervisor.start_child(MyApp.Supervisor, [i]) blocks until the worker's start_link returns {:ok, pid}, and only then can the Supervisor register another child. Agent.start_link itself does not return until its initialization function has finished, so every child costs a full second.
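To see the serialization in isolation, here is a minimal, self-contained sketch. It is not the code above: it assumes a newer Elixir (1.6 or later) and swaps the :simple_one_for_one Supervisor for DynamicSupervisor, and SlowWorker with its 200 ms sleep is a hypothetical stand-in for the expensive worker. If start_child handles one request at a time, five concurrent requests should take roughly five times 200 ms rather than 200 ms.

```elixir
defmodule SlowWorker do
  # Hypothetical worker: its start function blocks for ~200 ms,
  # mimicking an expensive initialization.
  def start_link(i) do
    Agent.start_link(fn ->
      Process.sleep(200)
      i
    end)
  end

  # Child spec so the worker can be started as {SlowWorker, i}.
  def child_spec(i) do
    %{id: __MODULE__, start: {__MODULE__, :start_link, [i]}}
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Fire five start_child requests from five concurrent tasks and time them.
{elapsed_us, results} =
  :timer.tc(fn ->
    1..5
    |> Enum.map(fn i ->
      Task.async(fn -> DynamicSupervisor.start_child(sup, {SlowWorker, i}) end)
    end)
    |> Enum.map(&Task.await(&1, 10_000))
  end)

elapsed_ms = div(elapsed_us, 1000)
IO.puts("started #{length(results)} children in about #{elapsed_ms} ms")
```

Even though the requests are issued concurrently, the supervisor starts each child inside its own process, so the waits add up.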

The solution to this issue is to use a GenServer and set the state asynchronously: init/1 sends the process a message and returns immediately, and handle_info/2 does the expensive work when that message arrives. This is the changed code:

defmodule MyApp.Worker do
  use GenServer

  def start_link(i) do
    IO.inspect i
    GenServer.start_link(__MODULE__, i)
  end

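  def init(args) do
    # Queue a message to ourselves and return right away, so start_link
    # (and therefore the Supervisor) is not held up by the expensive work.
    send(self(), :set_init_state)
    {:ok, args}
  end

  def handle_info(:set_init_state, i) do
    # The expensive initialization now runs inside the worker process.
    :timer.sleep(3000)
    {:noreply, i * i}
  end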
end

and running IEx again:

iex(1)> MyApp.run
1
2
3
4
5
6
7
8
9
10
[ok: #PID<0.90.0>, ok: #PID<0.91.0>, ok: #PID<0.92.0>, ok: #PID<0.93.0>,
 ok: #PID<0.94.0>, ok: #PID<0.95.0>, ok: #PID<0.96.0>, ok: #PID<0.97.0>,
 ok: #PID<0.98.0>, ok: #PID<0.99.0>]

We can see that the workers all started immediately. But did they in fact change the state? Sure did!

[Image: PID with asynchronously set state]
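One way to check the state yourself is :sys.get_state/1 from the Erlang standard library, a debugging helper that returns a process's current state. The sketch below is a self-contained variant of the worker (DeferredWorker is a hypothetical name, and the sleep is shortened to 100 ms). Because the :set_init_state message is already in the mailbox when start_link returns, the state request is handled after it.

```elixir
defmodule DeferredWorker do
  use GenServer

  def start_link(i), do: GenServer.start_link(__MODULE__, i)

  @impl true
  def init(i) do
    # Defer the expensive part: queue a message and return immediately.
    send(self(), :set_init_state)
    {:ok, i}
  end

  @impl true
  def handle_info(:set_init_state, i) do
    # Stand-in for the expensive initialization.
    Process.sleep(100)
    {:noreply, i * i}
  end
end

{:ok, pid} = DeferredWorker.start_link(7)

# Processed after :set_init_state, so this sees the squared value (7 * 7).
state = :sys.get_state(pid)
IO.inspect(state)
```

:sys.get_state/1 is intended for debugging; in real code a handle_call/3 clause that returns the state is probably the better fit.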

What is this useful for? Let's say you want to open a connection to an external service, or start more workers as part of that worker in the supervision tree. Either way, you don’t want the top supervisor to wait until all the workers are initialized in sequence, especially when you can do it in parallel.
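As a side note, newer releases (Elixir 1.7 and later on OTP 21 and later, which is an assumption about your setup) let you express the same deferred initialization with handle_continue/2 instead of sending a message to self(). The sketch below is a variation on the approach above, not the original code; ContinueWorker and its get/1 helper are hypothetical names.

```elixir
defmodule ContinueWorker do
  use GenServer

  def start_link(i), do: GenServer.start_link(__MODULE__, i)

  # Hypothetical client helper that reads the current state.
  def get(pid), do: GenServer.call(pid, :get)

  @impl true
  def init(i) do
    # Return immediately; the :continue tuple asks GenServer to invoke
    # handle_continue/2 before any other message is processed.
    {:ok, i, {:continue, :set_init_state}}
  end

  @impl true
  def handle_continue(:set_init_state, i) do
    # Stand-in for the expensive initialization.
    Process.sleep(100)
    {:noreply, i * i}
  end

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state, state}
end

{:ok, pid} = ContinueWorker.start_link(4)
squared = ContinueWorker.get(pid)
IO.inspect(squared)
```

The practical difference is ordering: the continue step is guaranteed to run before anything else arrives, whereas a message sent to self() can in principle be overtaken if the process is registered under a name and other processes message it early.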