Waterfilling¶

Waterfilling is an energy-guided search algorithm that explores the CNF graph in order of increasing energy. Unlike A*, which seeks a specific goal, waterfilling expands outward from a starting point like water filling a basin, always flowing to the lowest available point. When it reaches a goal, the highest energy encountered along the way gives an upper bound on the minimum energy barrier between the two structures.

This guide covers two implementations: a simple in-memory version for small searches, and a distributed database-backed version for large-scale exploration.

The Waterfilling Concept¶

Imagine the energy landscape of crystal configuration space as a terrain with valleys and ridges. If you pour water at a starting structure (a low point in this terrain), it will fill up the local basin and eventually overflow into neighboring regions through the lowest available saddle point.

Waterfilling simulates this process algorithmically. We maintain a frontier of discovered but unexplored crystal structures, sorted by energy. At each step, we expand the lowest-energy frontier point by finding its neighbors, computing their energies, and adding them to the frontier. When the goal structure appears in the frontier, we know the "water" has reached it, and the maximum energy we explored along the way represents the barrier height.

The algorithm terminates either when the goal is found or when a maximum number of points have been explored. In either case, the highest energy encountered gives an upper bound on the activation barrier.

Local Waterfilling¶

The local implementation is a single-threaded, in-memory algorithm suitable for small to medium searches. It uses a priority queue (min-heap) keyed by energy to efficiently select the lowest-energy frontier point at each step.

In [ ]:

Copied!





from pymatgen.core import Structure, Lattice
from cnf import CNFConstructor
from cnf.calculation.grace import GraceCalculator

# Create two simple Zr structures
a_hcp, c_hcp = 3.23, 5.15
lattice_hcp = Lattice.hexagonal(a_hcp, c_hcp)
zr_hcp = Structure(
    lattice_hcp,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

# A slightly strained version as the "goal"
lattice_strained = Lattice.hexagonal(a_hcp * 1.03, c_hcp)
zr_strained = Structure(
    lattice_strained,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

# Convert to CNF
xi = 2.0
delta = 50
constructor = CNFConstructor(xi=xi, delta=delta)
start_cnf = constructor.from_pymatgen_structure(zr_hcp).cnf
goal_cnf = constructor.from_pymatgen_structure(zr_strained).cnf

print(f"Start CNF: {start_cnf.coords}")
print(f"Goal CNF:  {goal_cnf.coords}")
from pymatgen.core import Structure, Lattice
from cnf import CNFConstructor
from cnf.calculation.grace import GraceCalculator

# Create two simple Zr structures
a_hcp, c_hcp = 3.23, 5.15
lattice_hcp = Lattice.hexagonal(a_hcp, c_hcp)
zr_hcp = Structure(
    lattice_hcp,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

# A slightly strained version as the "goal"
lattice_strained = Lattice.hexagonal(a_hcp * 1.03, c_hcp)
zr_strained = Structure(
    lattice_strained,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

# Convert to CNF
xi = 2.0
delta = 50
constructor = CNFConstructor(xi=xi, delta=delta)
start_cnf = constructor.from_pymatgen_structure(zr_hcp).cnf
goal_cnf = constructor.from_pymatgen_structure(zr_strained).cnf

print(f"Start CNF: {start_cnf.coords}")
print(f"Goal CNF:  {goal_cnf.coords}")

Local Waterfilling¶

The local implementation is a single-threaded, in-memory algorithm suitable for small to medium searches. It uses a priority queue (min-heap) keyed by energy to efficiently select the lowest-energy frontier point at each step.

In [ ]:

Copied!





from cnf.navigation.waterfill import waterfill

calc = GraceCalculator()

barrier, explored_energies, goal_found = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=500,
)

print(f"\nGoal found: {goal_found is not None}")
print(f"Points explored: {len(explored_energies)}")
print(f"Barrier height: {barrier:.4f} eV")
from cnf.navigation.waterfill import waterfill

calc = GraceCalculator()

barrier, explored_energies, goal_found = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=500,
)

print(f"\nGoal found: {goal_found is not None}")
print(f"Points explored: {len(explored_energies)}")
print(f"Barrier height: {barrier:.4f} eV")

How the Algorithm Works¶

The local waterfill maintains several data structures:

A frontier (min-heap) of (energy, counter, cnf) tuples. The counter is a tiebreaker to ensure stable ordering when energies are equal.
A seen set of all CNFs that have been discovered (either explored or in the frontier). This prevents re-adding points.
A list of explored energies, used to compute the final barrier height.

At each iteration:

Pop the lowest-energy point from the frontier
Find its neighbors in the CNF graph
For each new neighbor not in the seen set, compute its energy and add it to the frontier
If any neighbor is a goal, stop

The barrier height is simply the maximum energy among all explored points.

Optional Features¶

The local waterfill supports several optional features:

Dropout randomly excludes some neighbors from consideration, introducing stochasticity that can help explore different regions of the energy landscape.

In [ ]:

Copied!





# Waterfill with dropout
barrier2, energies2, found2 = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=500,
    dropout=0.1,  # Drop 10% of neighbors randomly
)

print(f"With 10% dropout: {len(energies2)} points explored")
# Waterfill with dropout
barrier2, energies2, found2 = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=500,
    dropout=0.1,  # Drop 10% of neighbors randomly
)

print(f"With 10% dropout: {len(energies2)} points explored")

Filters can restrict which neighbors are considered. For example, a MinDistanceFilter rejects structures where atoms are too close together.

In [ ]:

Copied!





from cnf.navigation.search_filters import FilterSet, MinDistanceFilter

min_dist_filter = MinDistanceFilter(min_dist=2.0)
filter_set = FilterSet([min_dist_filter])

barrier3, energies3, found3 = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=500,
    filter_set=filter_set,
)

print(f"With min_distance filter: {len(energies3)} points explored")
from cnf.navigation.search_filters import FilterSet, MinDistanceFilter

min_dist_filter = MinDistanceFilter(min_dist=2.0)
filter_set = FilterSet([min_dist_filter])

barrier3, energies3, found3 = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=500,
    filter_set=filter_set,
)

print(f"With min_distance filter: {len(energies3)} points explored")

Checkpointing periodically saves the search state to disk, allowing long searches to be resumed after interruption. The implementation uses a fork-based approach: the parent process continues immediately while a child process serializes the state in the background, avoiding blocking on I/O.

In [ ]:

Copied!





import tempfile
import os

# Checkpointing example (writes to temporary directory)
with tempfile.TemporaryDirectory() as tmpdir:
    checkpoint_path = os.path.join(tmpdir, "waterfill_checkpoint.pkl")
    
    barrier4, energies4, found4 = waterfill(
        start_cnfs=[start_cnf],
        goal_cnfs=[goal_cnf],
        energy_calc=calc,
        max_iters=200,
        checkpoint_path=checkpoint_path,
        checkpoint_interval=50,  # Checkpoint every 50 points
    )
    
    print(f"Checkpoint exists: {os.path.exists(checkpoint_path)}")
import tempfile
import os

# Checkpointing example (writes to temporary directory)
with tempfile.TemporaryDirectory() as tmpdir:
    checkpoint_path = os.path.join(tmpdir, "waterfill_checkpoint.pkl")
    
    barrier4, energies4, found4 = waterfill(
        start_cnfs=[start_cnf],
        goal_cnfs=[goal_cnf],
        energy_calc=calc,
        max_iters=200,
        checkpoint_path=checkpoint_path,
        checkpoint_interval=50,  # Checkpoint every 50 points
    )
    
    print(f"Checkpoint exists: {os.path.exists(checkpoint_path)}")

Graph tracking records the full graph structure (node energies and edges) during exploration, useful for visualization or analysis.

In [ ]:

Copied!





barrier5, energies5, found5, node_energies, edge_set = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=100,
    track_graph=True,
)

print(f"Nodes recorded: {len(node_energies)}")
print(f"Edges recorded: {len(edge_set)}")
barrier5, energies5, found5, node_energies, edge_set = waterfill(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    energy_calc=calc,
    max_iters=100,
    track_graph=True,
)

print(f"Nodes recorded: {len(node_energies)}")
print(f"Edges recorded: {len(edge_set)}")

Database-Backed Waterfilling¶

For large-scale searches, the local implementation runs into two limitations: memory (all discovered points must fit in RAM) and speed (single-threaded). The database-backed implementation addresses both by storing state in a partitioned SQLite database and supporting parallel workers.

The key insight is that the CNF graph can be partitioned by hashing CNF coordinates. Each partition stores a subset of the points and can be processed independently, with coordination happening through a shared control plane.

Architecture¶

The distributed system consists of several components:

Partitions: Each partition is a SQLite database file containing a subset of CNF points and edges. A point belongs to partition hash(cnf) % num_partitions. This ensures that the same CNF always maps to the same partition.

CrystalMapStore: Within each partition, this store manages points (CNF coordinates, energies, explored status) and edges (neighbor relationships).

SearchProcessStore: Manages search-specific state within each partition: the frontier (points to explore), searched set (points already explored), and an "inbox" for receiving points from other partitions.

MetaStore: A separate control database that tracks global state: the current water level (minimum frontier energy across all partitions), search completion status, and aggregate statistics.

PartitionedDB: A Python class that provides a unified interface to all partitions and the meta store, routing operations to the appropriate partition based on CNF hash.

The Water Level¶

The central concept in distributed waterfilling is the "water level": the minimum energy among all frontier points across all partitions. Workers only explore points whose energy is at or below the current water level (plus a small tolerance called frontier_width).

This constraint ensures that the search proceeds in energy order globally, even though different workers are processing different partitions independently. As low-energy regions are exhausted, the water level naturally rises, and higher-energy frontier points become eligible for exploration.

Mailboxes¶

When a worker explores a point, some of its neighbors may belong to other partitions. Rather than writing directly to foreign partitions (which could cause lock contention), the worker drops these neighbors into the target partition's "inbox."

At the start of each step, workers process their inbox: they read incoming CNFs, add them to their local partition, compute their energies, and add them to the local frontier. This decoupled communication pattern allows workers to operate mostly independently.

Setting Up a Distributed Search¶

Setting up a distributed waterfill search requires creating the partitioned database with start and end points. The CLI provides commands for this:

# Create a partitioned database
cnf waterfill setup start.cif end.cif ./search_db \
    --num-partitions 16 \
    --xi 1.5 \
    --delta 100

This creates a directory search_db/ containing:

0.partition.db, 1.partition.db, ..., 15.partition.db — the partition files
meta.db — the control plane database
search_metadata.json — configuration metadata

Running Workers¶

Once the database is set up, workers can be launched to perform the search:

# Run with 4 parallel workers
cnf waterfill run ./search_db --num-workers 4

# Run with iteration limit
cnf waterfill run ./search_db --num-workers 4 --max-iters 10000

Each worker is assigned a subset of partitions. The run command spawns multiple processes, each calling continue_search_waterfill() with its assigned partition range.

Worker Algorithm¶

Each worker repeatedly executes waterfill_step(), which does the following:

Reload configuration: Read the current frontier_width from the metadata file (allows dynamic tuning without restarting workers).
Get water level: Query the meta store for the global minimum frontier energy.
Choose a partition: Select a random partition from the worker's assigned range.
Process inbox: Read incoming CNFs from the partition's inbox, add them to the local map store, compute their energies, and add them to the search frontier.
Process frontier: Select up to batch_size frontier points with energy ≤ water_level + frontier_width. For each:
- Find neighbors using the neighbor finder
- Partition neighbors by target partition
- Add local neighbors to local map store and frontier
- Send remote neighbors to target partition inboxes
- Mark the point as searched
Sync stats: Update the meta store with partition statistics.
Update water level: Report the partition's minimum frontier energy to the control plane.
Check completion: If a goal point has been found, mark the search as complete.

Monitoring Progress¶

The status command provides a dashboard showing search progress:

# One-time status check
cnf waterfill status ./search_db

# Live dashboard (updates every second)
cnf waterfill status ./search_db --watch

# Show per-partition breakdown
cnf waterfill status ./search_db --partitions

The dashboard shows:

Total points discovered and explored
Current water level and high water mark (maximum searched energy)
Frontier size and inbox sizes
Per-partition statistics if requested

Programmatic Usage¶

The distributed waterfill can also be used programmatically. Here's a sketch of the setup and execution flow:

In [ ]:

Copied!





# This cell demonstrates the API but doesn't run a full search
# (would require actual CIF files and energy calculator)

from cnf.db.setup_partitions import setup_search_dir
from cnf.db.partitioned_db import PartitionedDB
from cnf.navigation.waterfill import continue_search_waterfill

# The setup flow would be:
# 1. Load structures and convert to CNFs
# 2. Set up the partitioned database
#    search_id = setup_search_dir(
#        location="./search_db",
#        description="Example search",
#        num_partitions=16,
#        start_cnfs=start_cnfs,
#        end_cnfs=end_cnfs,
#        calculator=energy_calc
#    )
#
# 3. Run the search (typically in separate processes)
#    continue_search_waterfill(
#        search_id=search_id,
#        partitions_dir="./search_db",
#        calculator=energy_calc,
#        max_iters=10000,
#        batch_size=100,
#        partition_range=[0, 1, 2, 3]  # This worker handles partitions 0-3
#    )

print("See CLI examples above for actual usage")
# This cell demonstrates the API but doesn't run a full search
# (would require actual CIF files and energy calculator)

from cnf.db.setup_partitions import setup_search_dir
from cnf.db.partitioned_db import PartitionedDB
from cnf.navigation.waterfill import continue_search_waterfill

# The setup flow would be:
# 1. Load structures and convert to CNFs
# 2. Set up the partitioned database
#    search_id = setup_search_dir(
#        location="./search_db",
#        description="Example search",
#        num_partitions=16,
#        start_cnfs=start_cnfs,
#        end_cnfs=end_cnfs,
#        calculator=energy_calc
#    )
#
# 3. Run the search (typically in separate processes)
#    continue_search_waterfill(
#        search_id=search_id,
#        partitions_dir="./search_db",
#        calculator=energy_calc,
#        max_iters=10000,
#        batch_size=100,
#        partition_range=[0, 1, 2, 3]  # This worker handles partitions 0-3
#    )

print("See CLI examples above for actual usage")

Choosing Between Implementations¶

The local implementation is simpler and appropriate when:

The search space is small enough to fit in memory
You want quick exploration without setup overhead
Checkpointing to a single file is sufficient for recovery

The distributed implementation is better when:

The search space is large (millions of points)
You want to parallelize across multiple CPU cores or machines
You need persistent state that survives process crashes
You want to monitor progress in real-time

For most barrier-finding workflows, the A*-based iterative algorithms (search, sample, sweep, ratchet) are more efficient than waterfilling because they are goal-directed. Waterfilling is most useful for open-ended exploration where you want to map out the energy landscape systematically rather than find a specific path.

Summary¶

Waterfilling explores the CNF graph in energy order, expanding from low-energy regions to high-energy regions like water filling a basin. The local implementation provides a simple in-memory algorithm with optional checkpointing, dropout, and graph tracking. The distributed implementation uses a partitioned SQLite database to scale to large search spaces, with multiple workers processing different partitions and coordinating through a shared control plane.

The key concepts in distributed waterfilling are:

Partitioning by CNF hash ensures each point belongs to exactly one partition
Water level coordinates workers to explore in global energy order
Mailboxes decouple inter-partition communication to reduce lock contention
Meta store provides the control plane for water level and completion status