Topological Sorting Explained: Sorting Dependency Chains

Sep 27, 2024· 9 min read
Topological Sorting Explained: Sorting Dependency Chains
type
status
date
summary
tags
category
icon
password
featured
freq
difficulty
Topological Sort was introduced to solve problems that involve ordering tasks or events with dependencies. In many real-world and computational systems — like build processes, course prerequisites, or data pipelines — certain tasks must happen before others. Topological sorting provides a systematic way to produce this dependency-respecting linear order in a directed acyclic graph (DAG).
Formally, it finds a linear ordering of vertices in a DAG such that for every directed edge u → v, node u comes before node v in the ordering. The core requirement is that the graph must be acyclic; if a cycle exists (like A → B → C → A), a valid topological sort is impossible.
Topological sort is most commonly used for problems involving task dependencies — such as scheduling build processes, determining course prerequisites, or resolving software package dependencies. There are two canonical approaches to compute it: Kahn’s Algorithm (BFS-based) and a DFS-based method.

Kahn’s Algorithm (BFS-Based)

Kahn’s Algorithm uses a queue to process nodes with zero in-degree (no prerequisites). It works on graphs structured as { prerequisite: [dependents] }. It works by repeatedly remove nodes with in-degree = 0 (no prerequisites), and decrement their children’s in-degrees.
Steps
  1. Initialization
      • Compute in-degree for every node.
      • Push all nodes with in-degree == 0 into a queue.
  1. Process queue
      • Pop a node → append to result.
      • For each child, decrement its in-degree; if it hits zero, enqueue it.
  1. Output:
      • Return topo_order if it includes all nodes.
      • Otherwise, detect a cycle.
Time Complexity:
  • Initialization: Calculating num_parents (in-degrees) takes O(V+E), where V is the number of nodes, and E is the number of edges.
  • Processing Nodes: Each node is processed once, and each edge is visited once, contributing O(V+E).
  • Overall Time Complexity: O(V+E).
Space Complexity:
  • num_parents and ready_queue: Require O(V) space.
  • Graph Representation: Requires O(V+E) space for adjacency list.
  • Overall Space Complexity: O(V+E).
 

DFS-Based Topological Sort

The DFS-based topological sort works by performing a post-order traversal. It works with either graphs structured with forward edges as graph ={dependent: prerequisites} or graph = { prerequisite: [dependents] }. It ensures a node is added to the resulting list only after all its dependent nodes (when traversing forward) or prerequisite nodes (when traversing backward) have been fully visited and processed. By collecting nodes upon the return of all recursive calls, this method guarantees that every node appears after all its dependencies have been listed, yielding the topological order.
 
notion image

prereq -> dependents (forward edges, like in Kahn's algorithm).

1. When dfs(A) is called, it immediately calls dfs(B), dfs(C), etc., where A is the prerequisite, B and C are its dependents.
2. The node A is appended to the topo_order list last, after all its dependent nodes (B, C, etc.) have finished processing.
3. This means the node that has the most prerequisites satisfied earliest is at the end of the list. That is, the list is built in reverse topological order, and we must call topo_order.reverse() at the end.
Time Complexity:
  • DFS Traversal: O(V+E), where V is the number of vertices and E is the number of edges.
  • Each edge is processed once, and each node is visited once.
Space Complexity:
  • Graph storage: O(V+E) for the adjacency list.
  • Recursion stack: O(V) in the worst case.
  • visited and visiting dictionaries: O(V).
    •  

dependent -> prerequisites

A node can only be added to the final ordering after all of its prerequisites nodes have been visited and processed (i.e., added to the list). By inserting a node into the result list only after the recursive calls to all of its prerequisites/neighbors return, we ensure all prerequisites come before the node itself.
Steps:
  1. Traversal: Use a recursive Depth-First Search (DFS) to explore the graph.
  1. Cycle Detection: Employ two sets (visiting and visited) to detect cycles during traversal:
      • visiting: Nodes currently in the recursion stack (a back-edge to a visiting node means a cycle).
      • visited: Nodes completely processed (all children visited).
  1. Ordering: A node is appended to the topo_order list only after the DFS returns from all of its prerequisites/neighbors.

Comparison

Both methods are equally efficient in terms of theoretical complexity, offering a linear-time solution proportional to the size of the graph.
Kahn’s Algorithm (BFS)
DFS-Based Approach
Technique
Iterative: In-degree tracking + queue
Recursive: Postorder DFS + reverse (with visiting cycle check)
Graph orientation
prereq → dependent
Either prereq → dependent or dependent → prereq
Cycle detection
If not all nodes removed
Hitting a gray (visiting) node
Time / Space
O(V+E) / O(V+E)
O(V+E) / O(V+E)

Example: Course Schedule II

The problem gives an array prerequisites where prerequisites[i] = [ai, bi] indicates that you must take course bi first if you want to take course ai. And it asks for the ordering of courses you should take to finish all courses.

DFS Solution

Each course is a node; prerequisites form edges and build the graph prereq -> dependents.
Detect cycles and build the order using post-order DFS.

Kahn’s Algorithm (BFS)

Track in-degrees and process the courses with no any prerequisition first.
 

Endnote

Topological sorting elegantly transforms complex dependency graphs into linear execution sequences. Whether using Kahn’s queue-driven approach or DFS’s recursive depth, both methods reveal the underlying structure of directed acyclic graphs and empower solutions in scheduling, compilation, and course planning.
上一篇
Solving Problems with the Two-Pointers Technique
下一篇
Implementing Efficient Prefix Search with Tries