Project Scheduling: An Exploration of PERT and CPM
The Statement of Work (SOW) is a formal document that defines the scope of the work to be performed in a project. It outlines the project objectives, deliverables, milestones, timeline, resources, and responsibilities of all parties involved. It serves as a contract between the client and the service provider, ensuring that both parties have a clear understanding of the project requirements and expectations.
When the SOW is decomposed from high-level objectives into actionable tasks, PERT (Program Evaluation and Review Technique) and CPM (Critical Path Method), which are project management techniques, are used to schedule, organize, and manage tasks within a project. The SOW defines the scope of work and project requirements; PERT and CPM are used in planning and optimizing project schedules, identifying critical tasks, and managing project timelines. The SOW provides the foundation for PERT and CPM by defining the tasks and activities to be included in the project schedule.
From PERT and/or CPM analysis, project managers acquire valuable insights including task durations, dependencies, and the critical path. These findings serve as inputs for creating Gantt charts. Task durations and dependencies identified through PERT and CPM methodologies are transformed into visual representations within Gantt charts, often generated using Microsoft Project. Within this software, project managers can efficiently generate, customize, and manage Gantt charts to depict task sequences, durations, dependencies, milestones, and resource allocations.
This subject matter of this article is on PERT and CPM. The article delves into their origins, development, comparison, implementation in Python, and integration with Excel. The comparison between PERT and CPM encompasses their purposes, focuses, calculation methods, use of probabilities, and complexity.
The subsequent sections illustrate how to create project networks, execute forward and backward pass calculations, identify critical paths, and visualize project networks using Python code. Additionally, the article outlines the integration of Python-based PERT/CPM calculations with Excel. It includes an example demonstrating the generation of sample data and the population of an Excel file with task durations.
Evolution and Origins
PERT and CPM are both project management techniques used to schedule, organize, and manage tasks within a project. They both originated in the late 1950s and early 1960s in the United States. They emerged independently from different organizations and industries, but they share common principles and objectives. Both methodologies aimed to address the challenges of managing complex projects by providing tools and techniques for scheduling, organizing, and controlling project activities. Over time, PERT and CPM have been widely adopted across various industries and have become fundamental tools in project management practice.
PERT was developed by the United States Navy Special Projects Office in collaboration with the consulting firm Booz Allen Hamilton and the management consulting division of Lockheed Corporation. It originated in response to the need to manage the Polaris missile submarine program, which was a complex and highly uncertain project. PERT was first implemented in 1958 to support the development of the Polaris missile system. The primary challenge was to manage the scheduling and coordination of numerous tasks involved in the project, considering the uncertainty in task durations due to technological complexity and developmental risks. The key contributors to PERT included researchers from the Navy, Booz Allen Hamilton, and Lockheed Corporation. Notable individuals involved in its development include Admiral William A. Raborn Jr., Fred T. Wilkes, and Morgan R. Walker.
CPM was developed independently by DuPont Corporation and Remington Rand (later acquired by IBM). It emerged from the need to manage large-scale construction projects more effectively. CPM originated in the late 1950s, with its first application in the construction of chemical plants by DuPont. The primary objective was to optimize the scheduling of tasks to minimize project duration and costs. The key developers of CPM were James E. Kelley Jr. and Morgan R. Walker at DuPont, who formulated the method in 1957, and later independently by Morgan R. Walker at Remington Rand. Later, Kelley and Walker collaborated on refining and disseminating the CPM methodology.
Comparison
PERT and CPM share similarities and are often used together, they have distinct differences in regards to purpose, focus, calculation method, use of probabilities, and complexity.
PERT is primarily used in projects with uncertain or variable time estimates for each activity. It is useful for projects with high uncertainty or complexity. CPM, on the other hand, is used when time estimates for each activity are known with certainty. CPM is more suitable for projects with well-defined tasks and durations.
PERT focuses on estimating the time required to complete each task within a project, taking into account uncertainty and variability. It utilizes three time estimates for each activity: optimistic (O), pessimistic (P), and most likely (M). CPM focuses on identifying the critical path in a project, which is the longest sequence of dependent tasks that determines the overall duration of the project. CPM does not account for uncertainty in task durations but rather focuses on the logical sequence of tasks.
In PERT, the expected duration of each activity is calculated using the formula: (O + 4M + P) / 6, where O is the optimistic time estimate, M is the most likely time estimate, and P is the pessimistic time estimate. In CPM, the focus is on determining the earliest start time (ES), earliest finish time (EF), latest start time (LS), latest finish time (LF), and float (slack) for each activity. This is accomplished through forward and backward pass calculations within the network diagram.
PERT uses probability distributions to model the uncertainty in task durations and calculates the probability of completing the project within a certain time frame, while CPM does not directly incorporate probabilities into its calculations. CPM assumes deterministic task durations. PERT tends to be more complex due to its consideration of uncertainty and utilization of probability distributions, whereas CPM is simpler and more straightforward, focusing on the logical sequence of tasks and their durations.
In practice, PERT and CPM are often used together, especially in larger projects. PERT can be employed initially to estimate durations and evaluate project risks, while CPM can be utilized to formulate a comprehensive schedule based on those estimates and pinpoint the critical path.
Python Examples
PERT and CPM methodologies can be implemented using common tools such as Excel or programming languages like Python as shown on the approach below, which utilizes: (1) Python Libraries, (2) Network Representation, (3) Calculations, (4) Visualization, and (5) Integration with Excel.
Python Libraries
Python offers various libraries that can be used for project management tasks. For example, NetworkX can be used to create and analyze project networks, while numpy and pandas can handle calculations and data manipulation. Below are the instructions for utilizing the mentioned Python libraries for project management tasks:
NetworkX
NumPy and Pandas
The following is Python code combining these libraries for project management:
import networkx as nx
import numpy as np
import pandas as pd
# Create a project network using NetworkX
G = nx.DiGraph()
G.add_node("Task1")
G.add_node("Task2")
G.add_node("Task3")
G.add_edge("Task1", "Task2")
G.add_edge("Task1", "Task3")
# Define task durations using NumPy arrays
task_durations = {"Task1": 5, "Task2": 3, "Task3": 4} # Using a dictionary for simplicity
df_task_durations = pd.DataFrame({"Task": list(task_durations.keys()), "Duration": list(task_durations.values())})
# Calculate project schedule
project_schedule = df_task_durations["Duration"].sum()
# Print project schedule
print("Project Schedule:", project_schedule)
Network Representation
The project can be represented as a network or graph in Python using NetworkX or similar libraries. Nodes represent tasks, and edges represent dependencies between tasks. The following instructions incorporate the use of Matplotlib for visualization.
Visualization
The following example includes Matplotlib for visualization:
import networkx as nx
import matplotlib.pyplot as plt
# Create a directed graph to represent the project network
G = nx.DiGraph()
# Add nodes representing tasks
G.add_node("Task1")
G.add_node("Task2")
G.add_node("Task3")
# Add edges representing dependencies between tasks
G.add_edge("Task1", "Task2")
G.add_edge("Task1", "Task3")
# Visualize the project network using NetworkX and Matplotlib
nx.draw(G, with_labels=True)
plt.show()
The image below is the project network visualization generated by using NetworkX, use Matplotlib.
Calculations
Once the project has been represented as a network, calculations can be performed to determine critical paths, task durations, and other metrics. It will be necessary to implement algorithms for forward pass, backward pass, and identifying critical paths.
Forward Pass
Implement the forward pass algorithm to calculate the earliest start time (ES) and earliest finish time (EF) for each task in the project. Start by setting the ES of the initial task(s) to zero. For each subsequent task, calculate its ES as the maximum EF of its preceding tasks. Calculate the EF for each task as the sum of its ES and duration. Repeat this process until all tasks have been processed.
Backward Pass
Implement the backward pass algorithm to calculate the latest start time (LS) and latest finish time (LF) for each task in the project. Start by setting the LF of the final task(s) to their EF calculated in the forward pass. For each preceding task, calculate its LF as the minimum LS of its succeeding tasks. Calculate the LS for each task as the difference between its LF and duration. Repeat this process until all tasks have been processed.
Critical Path Identification
Once you have calculated the ES, EF, LS, and LF for each task, identify the critical path(s) in the project. The critical path is the longest path(s) through the project network, i.e., the path(s) with zero slack. Tasks with zero slack have their ES equal to their LS and their EF equal to their LF. Any delay in tasks on the critical path will directly impact the project's overall duration.
Task Durations and Metrics
With the ES, EF, LS, and LF calculated for each task, you can determine task durations, total project duration, and other metrics. Task duration is simply the difference between EF and ES for each task. Total project duration is the maximum EF of all tasks (or the maximum LF if using LS and LF). You can also calculate slack for each task (slack = LS - ES or slack = LF - EF), which indicates the flexibility in task scheduling without impacting project duration.
Implementation
Implement these calculations in your Python script by traversing the project network and updating task attributes accordingly. You can use NetworkX functions to iterate over nodes and edges in the graph and access task attributes stored as node attributes. Verify your calculations by comparing results with manual calculations or established project management methodologies.
By implementing these algorithms, you can effectively analyze your project network, identify critical paths, determine task durations, and assess project metrics to manage and optimize project schedules.
The following code implements the forward pass, backward pass, and critical path identification algorithms.
import networkx as nx
# Create a directed graph to represent the project network
G = nx.DiGraph()
# Add nodes representing tasks with their durations
tasks = {
"Task1": {"duration": 5},
"Task2": {"duration": 3},
"Task3": {"duration": 4},
"Task4": {"duration": 2},
"Task5": {"duration": 6},
"Task6": {"duration": 3}
}
# Add nodes with durations to the graph
for task, attr in tasks.items():
G.add_node(task, duration=attr["duration"])
# Add edges representing dependencies between tasks
G.add_edge("Task1", "Task2")
G.add_edge("Task1", "Task3")
G.add_edge("Task2", "Task4")
G.add_edge("Task3", "Task4")
G.add_edge("Task3", "Task5")
G.add_edge("Task4", "Task6")
G.add_edge("Task5", "Task6")
# Perform forward pass
for node in nx.topological_sort(G):
es = max([0] + [G.nodes[predecessor]['EF'] for predecessor in G.predecessors(node)], default=0)
G.nodes[node]['ES'] = es
G.nodes[node]['EF'] = es + G.nodes[node]['duration']
# Perform backward pass
for node in reversed(list(nx.topological_sort(G))):
lf = min([G.nodes[successor]['LS'] for successor in G.successors(node)], default=G.nodes[node]['EF'])
G.nodes[node]['LF'] = lf
G.nodes[node]['LS'] = lf - G.nodes[node]['duration']
# Identify critical path
critical_path = [node for node in G.nodes if G.nodes[node]['ES'] == G.nodes[node]['LS']]
# Print task durations, ES, EF, LS, LF, and critical path
print("Task\tDuration\tES\tEF\tLS\tLF")
for node in G.nodes:
print(f"{node}\t{G.nodes[node]['duration']}\t\t{G.nodes[node]['ES']}\t{G.nodes[node]['EF']}\t{G.nodes[node]['LS']}\t{G.nodes[node]['LF']}")
print("Critical Path:", critical_path)
The forward pass is performed in the first loop after the graph creation. It traverses the graph in topological order using nx.topological_sort(G). For each node in the topological order, it calculates the earliest start time (ES) based on the maximum EF of its predecessors. The calculated ES is stored in the node attribute 'ES', and the earliest finish time (EF) is calculated as the sum of ES and duration.
The backward pass is performed in the second loop, which iterates over the topological order in reverse. For each node, it calculates the latest finish time (LF) based on the minimum LS of its successors. The latest start time (LS) is then calculated, and the values are stored in the respective node attributes 'LS' and 'LF'.
After performing both passes, the critical path is identified by checking nodes where ES equals LS, indicating zero slack. These nodes are appended to the critical_path list.
The code then prints task durations, ES, EF, LS, LF, and the critical path.
Task Duration ES EF LS LF
Task1 5 0 5 0 5
Task2 3 5 8 10 13
Task3 4 5 9 5 9
Task4 2 9 11 13 15
Task5 6 9 15 9 15
Task6 3 15 18 15 18
Critical Path: ['Task1', 'Task3', 'Task5', 'Task6']
Visualization
The project network and critical paths can be visualized using libraries like matplotlib or plotly. The following code uses Matplotlib to visualize the project network and highlight the critical path.
import networkx as nx
import matplotlib.pyplot as plt
# Create a directed graph to represent the project network
G = nx.DiGraph()
# Add nodes representing tasks with their durations
tasks = {
"Task1": {"duration": 5},
"Task2": {"duration": 3},
"Task3": {"duration": 4},
"Task4": {"duration": 2},
"Task5": {"duration": 6},
"Task6": {"duration": 3}
}
# Add nodes with durations to the graph
for task, attr in tasks.items():
G.add_node(task, duration=attr["duration"])
# Add edges representing dependencies between tasks
G.add_edge("Task1", "Task2")
G.add_edge("Task1", "Task3")
G.add_edge("Task2", "Task4")
G.add_edge("Task3", "Task4")
G.add_edge("Task3", "Task5")
G.add_edge("Task4", "Task6")
G.add_edge("Task5", "Task6")
# Perform forward pass
for node in nx.topological_sort(G):
es = max([0] + [G.nodes[predecessor]['EF'] for predecessor in G.predecessors(node)], default=0)
G.nodes[node]['ES'] = es
G.nodes[node]['EF'] = es + G.nodes[node]['duration']
# Perform backward pass
for node in reversed(list(nx.topological_sort(G))):
lf = min([G.nodes[successor]['LS'] for successor in G.successors(node)], default=G.nodes[node]['EF'])
G.nodes[node]['LF'] = lf
G.nodes[node]['LS'] = lf - G.nodes[node]['duration']
# Identify critical path
critical_path = [node for node in G.nodes if G.nodes[node]['ES'] == G.nodes[node]['LS']]
# Visualize the project network
pos = nx.spring_layout(G) # Define layout for the nodes
nx.draw(G, pos, with_labels=True, node_size=800, node_color='skyblue', font_size=10, font_weight='bold') # Draw nodes
nx.draw_networkx_edges(G, pos, width=1.0, alpha=0.5) # Draw edges
nx.draw_networkx_nodes(G, pos, nodelist=critical_path, node_color='r', node_size=800) # Highlight critical path nodes in red
plt.title('Project Network') # Set title for the plot
plt.show() # Show the plot
In this code nx.spring_layout() is used to define the layout for the nodes. nx.draw() is used to draw the nodes with labels, and nx.draw_networkx_edges() is used to draw the edges. The critical path nodes are highlighted in red using nx.draw_networkx_nodes(). As shown on the below image, this code displays a visualization of the project network with critical paths highlighted in red.
Integration with Excel
Python libraries like openpyxl or xlrd can be utilized to read from and write to Excel files, allowing the integration of the Python-based PERT/CPM calculations with Excel for visualization or further analysis.
The following is a guide that outlines a step-by-step process for integrating Python-based PERT/CPM calculations with Excel using libraries such as openpyxl or xlrd.
By following this guide, you can effectively combine the capabilities of Python for complex calculations with Excel's powerful visualization and analysis features for efficient project management and analysis.
Excel Examples
Below is a Python code example to generate sample data and populate an Excel file (input.xlsx) with tasks and their durations. You can then use the data in this Excel file as input for PERT/CPM calculations.
import openpyxl
import random
# Function to generate sample data
def generate_sample_data(num_tasks):
data = [["Task", "Duration"]]
for i in range(1, num_tasks + 1):
task_name = f"Task{i}"
duration = random.randint(1, 10) # Random duration between 1 and 10 days
data.append([task_name, duration])
return data
# Function to write data to Excel file
def write_data_to_excel(file_path, data):
workbook = openpyxl.Workbook()
worksheet = workbook.active
for row_idx, row in enumerate(data, start=1):
for col_idx, value in enumerate(row, start=1):
worksheet.cell(row=row_idx, column=col_idx, value=value)
workbook.save(file_path)
# Example usage
if __name__ == "__main__":
num_tasks = 10 # Number of tasks
data = generate_sample_data(num_tasks)
write_data_to_excel("input.xlsx", data)
print(f"Sample data for {num_tasks} tasks has been written to input.xlsx.")
In this code:
Adjust the num_tasks variable to specify the desired number of tasks. After running this code, you'll have an Excel file (input.xlsx) populated with sample task data.
The below code demonstrates how to read data from an Excel file, perform PERT/CPM calculations, and write the results back to another Excel file.
import openpyxl
# Function to perform PERT/CPM calculations
def perform_pert_cpm(tasks, predecessors, successors):
es = {}
ef = {}
ls = {}
lf = {}
# Calculate ES and EF for each task
for task, duration in tasks.items():
es[task] = max([ef.get(pred, 0) for pred in predecessors.get(task, [])], default=0)
ef[task] = es[task] + duration
# Calculate LF and LS for each task in reverse topological order
for task in reversed(topological_order):
lf[task] = min([ls.get(succ, ef[task]) for succ in successors.get(task, [])], default=ef[task])
ls[task] = lf[task] - tasks[task]
return es, ef, ls, lf
# Load data from Excel file
def load_data_from_excel(file_path):
workbook = openpyxl.load_workbook(file_path)
worksheet = workbook.active
tasks = {}
predecessors = {}
successors = {}
topological_order = []
# Skip header row
iter_rows = iter(worksheet.iter_rows(values_only=True))
next(iter_rows)
# Read task data from Excel
for row in iter_rows:
task = row[0]
duration = int(row[1]) # Convert duration to integer
dependencies = row[2:]
tasks[task] = duration
if dependencies:
predecessors[task] = dependencies
for dep in dependencies:
if dep not in successors:
successors[dep] = []
successors[dep].append(task)
# Generate topological order
for task in tasks:
if task not in predecessors:
topological_order.append(task)
return tasks, predecessors, successors, topological_order
# Write results to Excel file
def write_results_to_excel(file_path, es, ef, ls, lf):
workbook = openpyxl.Workbook()
worksheet = workbook.active
worksheet.append(["Task", "Duration", "ES", "EF", "LS", "LF"])
for task in es:
worksheet.append([task, tasks[task], es[task], ef[task], ls[task], lf[task]])
workbook.save(file_path)
# Example usage
if __name__ == "__main__":
input_file = 'input.xlsx'
tasks, predecessors, successors, topological_order = load_data_from_excel(input_file)
es, ef, ls, lf = perform_pert_cpm(tasks, predecessors, successors)
write_results_to_excel('output.xlsx', es, ef, ls, lf)
print("PERT/CPM calculations completed and results written to output.xlsx.")
Conclusion
This article provides an overview of PERT and CPM. It delineates the SOW's role in defining project scope, objectives, and responsibilities, while detailing PERT and CPM as project scheduling and management tools. PERT is typically employed when dealing with uncertain time estimates, while CPM is more commonly used in projects with well-defined timeframes. The article delves into the origins, comparison, implementation, and integration of both methodologies with Python and Excel. Through Python libraries like NetworkX, NumPy, and Pandas, project management tasks are executed, visualized, and analyzed, with further exploration into integrating these processes with Excel for enhanced visualization and analysis. Python code snippets demonstrate tasks ranging from generating sample data to performing PERT/CPM calculations and reading and writing results to Excel, ultimately streamlining project management and decision-making processes.