[Eng] Distributed Scheduler, The Series: Episode 0 — From Zero to Orchestrator


Why I Started This Series

Purely for learning: I was curious how it's done, and since I'm still learning, why not share it with others who might not know it yet? (But the truth is, I just wanted to build something in Go 😛)

It Started with a Cron Job

I’ve built many hobby projects. One of them is something many of you have probably built before: a web scraper, or fetcher, or whatever you want to call it. All I wanted was one simple task: fetch a URL every day at 3 A.M.

You might say:

  • Use a cloud solution! — Then pricing becomes the main concern once you run millions of jobs in the cloud.
  • That’s easy — just run a cron with 0 3 * * *, done! — Yes, the easiest way to solve this is to run a cron job. It works like a charm, smooth as silk.

But what if you need other jobs that run at different times?

Easy! Just add more crons!

Then, at some point, things start going wrong:

  • Emails aren’t sent to customers.
  • Partner data stops syncing.
  • Reports aren’t generated.
  • Worst of all — I didn’t even notice.

The Limitations of Cron Jobs

As the system grows, cron’s limitations start to show:

  • No monitoring.
  • No retries on failure.
  • No logs — unless you add them manually.
  • No visibility.
  • Hard to keep track of many jobs.
  • Painful to maintain at scale.

Why Go Distributed?

So I thought—can I just build a better scheduler?

But once your system scales, new problems appear:

  • One machine isn’t enough.
  • You need fault tolerance — jobs still need to run if one server goes down.
  • Different jobs require different resources — CPU-heavy vs memory-heavy.
  • Some jobs need to run closer to the data (geographic requirements).

A single-server scheduler works… until suddenly, it doesn’t. Then you’re back to having a single point of failure.

What’s Next?

I didn’t plan to build a distributed system. But here we are.

In the next episode, we’ll walk through what happens when you try to scale across multiple nodes — and why simple fixes like DB locks fall apart in real-world conditions.

This is the beginning of going from “just a cron job” to building a real orchestrator.

Let’s go and figure out how many episodes we’ll need to build this thing 😂