A* Search¶

A* is a pathfinding algorithm for navigating the CNF graph. Given a starting crystal and a goal crystal, A* finds a path through the neighbor graph connecting them (recall that each step corresponds to a small geometric perturbation). This guide explains how the algorithm works and how to use it.

The CNF Graph¶

Before diving into A*, it helps to recall the structure of the search space. Every crystal has a unique CNF representation—a tuple of integers describing lattice geometry and atomic positions. Two crystals are neighbors if their CNF tuples differ by small amounts: a ±1 change in one of the vonorm coordinates (lattice neighbors) or a ±1 change in atomic fractional coordinates (motif neighbors).

This defines an implicit graph where nodes are crystals and edges connect neighbors. A* searches this graph for a path between start and goal.

How A* Works¶

A* maintains two key data structures: an open set of nodes to explore, ordered by priority, and a closed set of nodes already visited. Each node tracks its g-score (cost from start) and f-score (g-score plus heuristic estimate to goal).

The algorithm proceeds as follows:

Initialize the open set with the start node(s), setting g = 0 and f = h(start, goal).
Pop the node with lowest f-score from the open set.
If this node is a goal, reconstruct and return the path.
Otherwise, generate all neighbors, compute their tentative g-scores, and add improved nodes to the open set.
Repeat until a goal is found or the open set is exhausted.

The heuristic function h(node, goals) estimates the remaining cost to reach any goal. When h is admissible (never overestimates true cost), A* finds an optimal path.

A Simple Example¶

Let's find a path between two similar structures. We'll use zirconium metal in two slightly different lattice configurations. At coarse resolution, these should be just a few steps apart.

In [1]:

Copied!





from pymatgen.core import Structure, Lattice
from cnf import CNFConstructor

# HCP zirconium (2 atoms per cell)
a, c = 3.23, 5.15
lattice_start = Lattice.hexagonal(a, c)
zr_start = Structure(
    lattice_start,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

# Slightly strained version (5% larger a parameter)
lattice_goal = Lattice.hexagonal(a * 1.05, c)
zr_goal = Structure(
    lattice_goal,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

print(f"Start volume: {zr_start.volume:.2f} Å³")
print(f"Goal volume:  {zr_goal.volume:.2f} Å³")
from pymatgen.core import Structure, Lattice
from cnf import CNFConstructor

# HCP zirconium (2 atoms per cell)
a, c = 3.23, 5.15
lattice_start = Lattice.hexagonal(a, c)
zr_start = Structure(
    lattice_start,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

# Slightly strained version (5% larger a parameter)
lattice_goal = Lattice.hexagonal(a * 1.05, c)
zr_goal = Structure(
    lattice_goal,
    ["Zr", "Zr"],
    [[0, 0, 0], [1/3, 2/3, 0.5]]
)

print(f"Start volume: {zr_start.volume:.2f} Å³")
print(f"Goal volume:  {zr_goal.volume:.2f} Å³")

Start volume: 46.53 Å³
Goal volume:  51.30 Å³

In [2]:

Copied!





# Convert to CNF at coarse resolution
xi = 2.0
delta = 50

constructor = CNFConstructor(xi=xi, delta=delta)
start_cnf = constructor.from_pymatgen_structure(zr_start).cnf
goal_cnf = constructor.from_pymatgen_structure(zr_goal).cnf

print(f"Start CNF: {start_cnf.coords}")
print(f"Goal CNF:  {goal_cnf.coords}")
# Convert to CNF at coarse resolution
xi = 2.0
delta = 50

constructor = CNFConstructor(xi=xi, delta=delta)
start_cnf = constructor.from_pymatgen_structure(zr_start).cnf
goal_cnf = constructor.from_pymatgen_structure(zr_goal).cnf

print(f"Start CNF: {start_cnf.coords}")
print(f"Goal CNF:  {goal_cnf.coords}")

Start CNF: (5, 5, 13, 18, 5, 18, 18, 16, 33, 25)
Goal CNF:  (6, 6, 13, 19, 6, 19, 19, 16, 33, 25)

Running A* Search¶

The astar_pathfind function takes lists of start and goal CNFs. Multiple starts or goals are supported for cases where different supercell orientations are equivalent (more on this later).

In [3]:

Copied!





from cnf.navigation.astar.core import astar_pathfind
from cnf.navigation.astar.heuristics import manhattan_distance

search_state = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    verbose=True,
    speak_freq=50,
)

print(f"\nPath found: {search_state.found_goal}")
print(f"Path length: {len(search_state.path) if search_state.path else 'N/A'} steps")
print(f"Iterations: {search_state.iterations}")
from cnf.navigation.astar.core import astar_pathfind
from cnf.navigation.astar.heuristics import manhattan_distance

search_state = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    verbose=True,
    speak_freq=50,
)

print(f"\nPath found: {search_state.found_goal}")
print(f"Path length: {len(search_state.path) if search_state.path else 'N/A'} steps")
print(f"Iterations: {search_state.iterations}")

[tensorpotential] Info: Environment variable TF_USE_LEGACY_KERAS is automatically set to '1'.

      Starting A* search with 1 start states and 1 goal states

      Found goal after 4 iterations!

Path found: True
Path length: 4 steps
Iterations: 4

The search state contains the complete path as a list of CNF tuples. Each tuple represents an intermediate crystal structure along the transformation pathway.

In [4]:

Copied!





if search_state.path:
    print("Path (CNF tuples):")
    for i, coords in enumerate(search_state.path):
        print(f"  {i}: {coords}")
if search_state.path:
    print("Path (CNF tuples):")
    for i, coords in enumerate(search_state.path):
        print(f"  {i}: {coords}")

Path (CNF tuples):
  0: (5, 5, 13, 18, 5, 18, 18, 16, 33, 25)
  1: (5, 5, 13, 19, 6, 18, 18, 16, 33, 25)
  2: (5, 6, 13, 19, 6, 18, 19, 16, 33, 25)
  3: (6, 6, 13, 19, 6, 19, 19, 16, 33, 25)

Heuristics¶

The heuristic function guides the search by estimating how far each node is from the goal. A good heuristic can dramatically reduce the number of nodes explored.

The default heuristic is Manhattan distance: the sum of absolute differences between coordinates. For CNF, this tends to overestimate the true graph distance because the canonical ordering of coordinates can change when crossing certain lattice boundaries. Still, it's fast and usually effective.

More sophisticated heuristics account for these coordinate reorderings by precomputing all equivalent representations of the goal. The UnimodularManhattanHeuristic does this, giving tighter estimates at the cost of preprocessing time.

In [5]:

Copied!





from cnf.navigation.astar.heuristics import squared_euclidean_heuristic

# Try with Euclidean heuristic
search_state2 = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=squared_euclidean_heuristic,
    max_iterations=10000,
    verbose=False,
)

print(f"Squared Euclidean heuristic:")
print(f"  Path found: {search_state2.found_goal}")
print(f"  Path length: {len(search_state2.path) if search_state2.path else 'N/A'} steps")
print(f"  Iterations: {search_state2.iterations}")
from cnf.navigation.astar.heuristics import squared_euclidean_heuristic

# Try with Euclidean heuristic
search_state2 = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=squared_euclidean_heuristic,
    max_iterations=10000,
    verbose=False,
)

print(f"Squared Euclidean heuristic:")
print(f"  Path found: {search_state2.found_goal}")
print(f"  Path length: {len(search_state2.path) if search_state2.path else 'N/A'} steps")
print(f"  Iterations: {search_state2.iterations}")

Squared Euclidean heuristic:
  Path found: True
  Path length: 4 steps
  Iterations: 4

Beam Search¶

For large search spaces, standard A* can exhaust memory by maintaining too many nodes in the open set. Beam search addresses this by limiting the open set to the best beam_width nodes at each step.

This sacrifices optimality (the shortest path might be pruned) but makes search tractable for complex problems. In practice, beam widths of 500–2000 work well for most crystal structures.

In [6]:

Copied!





# Beam search with limited open set
search_state_beam = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    beam_width=100,  # Keep only top 100 nodes
    verbose=False,
)

print(f"Beam search (width=100):")
print(f"  Path found: {search_state_beam.found_goal}")
print(f"  Path length: {len(search_state_beam.path) if search_state_beam.path else 'N/A'} steps")
print(f"  Iterations: {search_state_beam.iterations}")
# Beam search with limited open set
search_state_beam = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    beam_width=100,  # Keep only top 100 nodes
    verbose=False,
)

print(f"Beam search (width=100):")
print(f"  Path found: {search_state_beam.found_goal}")
print(f"  Path length: {len(search_state_beam.path) if search_state_beam.path else 'N/A'} steps")
print(f"  Iterations: {search_state_beam.iterations}")

Beam search (width=100):
  Path found: True
  Path length: 4 steps
  Iterations: 4

Dropout¶

Another technique for managing search complexity is neighbor dropout. At each step, some fraction of valid neighbors are randomly excluded from consideration. This introduces stochasticity, making the algorithm non-deterministic, but can help escape local regions of the search space where many similar nodes compete for attention.

Dropout is useful when you want diverse paths rather than a single optimal one. The dropped neighbors are permanently excluded—if we later reach the same node from a different direction, we still won't explore neighbors that were dropped earlier.

In [7]:

Copied!





# Search with dropout
search_state_dropout = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    dropout=0.1,  # Drop 10% of neighbors randomly
    verbose=False,
)

print(f"Search with 10% dropout:")
print(f"  Path found: {search_state_dropout.found_goal}")
print(f"  Path length: {len(search_state_dropout.path) if search_state_dropout.path else 'N/A'} steps")
print(f"  Iterations: {search_state_dropout.iterations}")
# Search with dropout
search_state_dropout = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    dropout=0.1,  # Drop 10% of neighbors randomly
    verbose=False,
)

print(f"Search with 10% dropout:")
print(f"  Path found: {search_state_dropout.found_goal}")
print(f"  Path length: {len(search_state_dropout.path) if search_state_dropout.path else 'N/A'} steps")
print(f"  Iterations: {search_state_dropout.iterations}")

Search with 10% dropout:
  Path found: True
  Path length: 4 steps
  Iterations: 4

Filters¶

Filters restrict which neighbors are considered during search. This is essential for avoiding unphysical crystal configurations—structures where atoms overlap or energies are unreasonably high.

The most common filter is MinDistanceFilter, which rejects any structure where two atoms are closer than a specified threshold. This prevents the search from exploring collapsed configurations.

In [8]:

Copied!





from cnf.navigation.search_filters import FilterSet, MinDistanceFilter

# Create a filter that rejects structures with atoms closer than 2 Å
min_dist_filter = MinDistanceFilter(min_dist=2.0)
filter_set = FilterSet([min_dist_filter])

search_state_filtered = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    filter_set=filter_set,
    max_iterations=10000,
    verbose=False,
)

print(f"Search with min_distance filter (2.0 Å):")
print(f"  Path found: {search_state_filtered.found_goal}")
print(f"  Path length: {len(search_state_filtered.path) if search_state_filtered.path else 'N/A'} steps")
print(f"  Iterations: {search_state_filtered.iterations}")
from cnf.navigation.search_filters import FilterSet, MinDistanceFilter

# Create a filter that rejects structures with atoms closer than 2 Å
min_dist_filter = MinDistanceFilter(min_dist=2.0)
filter_set = FilterSet([min_dist_filter])

search_state_filtered = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    filter_set=filter_set,
    max_iterations=10000,
    verbose=False,
)

print(f"Search with min_distance filter (2.0 Å):")
print(f"  Path found: {search_state_filtered.found_goal}")
print(f"  Path length: {len(search_state_filtered.path) if search_state_filtered.path else 'N/A'} steps")
print(f"  Iterations: {search_state_filtered.iterations}")

Search with min_distance filter (2.0 Å):
  Path found: True
  Path length: 4 steps
  Iterations: 4

The Rust Implementation¶

For performance-critical applications, CNF provides a Rust implementation of A* that runs significantly faster than the Python version. The Rust implementation supports the most common use case: A* with beam search, dropout, and a minimum distance filter.

The astar_rust function provides a simple interface. It returns a tuple of (path_tuples, iterations), where path_tuples is None if no path was found.

In [9]:

Copied!





from cnf.navigation.astar import astar_rust

path_tuples, iterations = astar_rust(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    min_distance=2.0,
    max_iterations=10000,
    beam_width=1000,
    dropout=0.0,
    verbose=False,
)

print(f"Rust A* search:")
print(f"  Path found: {path_tuples is not None}")
print(f"  Path length: {len(path_tuples) if path_tuples else 'N/A'} steps")
print(f"  Iterations: {iterations}")
from cnf.navigation.astar import astar_rust

path_tuples, iterations = astar_rust(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    min_distance=2.0,
    max_iterations=10000,
    beam_width=1000,
    dropout=0.0,
    verbose=False,
)

print(f"Rust A* search:")
print(f"  Path found: {path_tuples is not None}")
print(f"  Path length: {len(path_tuples) if path_tuples else 'N/A'} steps")
print(f"  Iterations: {iterations}")

Rust A* search:

  Path found: True
  Path length: 4 steps
  Iterations: 4

Greedy Search¶

Standard A* uses f = g + h, balancing path cost and heuristic estimate. Greedy search uses f = h only, always moving toward the seemingly closest goal. This is faster but may find longer paths or get stuck in local minima.

Greedy search is useful for quick exploration or when you're more interested in finding any path than the shortest one.

In [10]:

Copied!





# Greedy search
search_state_greedy = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    greedy=True,
    verbose=False,
)

print(f"Greedy search:")
print(f"  Path found: {search_state_greedy.found_goal}")
print(f"  Path length: {len(search_state_greedy.path) if search_state_greedy.path else 'N/A'} steps")
print(f"  Iterations: {search_state_greedy.iterations}")
# Greedy search
search_state_greedy = astar_pathfind(
    start_cnfs=[start_cnf],
    goal_cnfs=[goal_cnf],
    heuristic=manhattan_distance,
    max_iterations=10000,
    greedy=True,
    verbose=False,
)

print(f"Greedy search:")
print(f"  Path found: {search_state_greedy.found_goal}")
print(f"  Path length: {len(search_state_greedy.path) if search_state_greedy.path else 'N/A'} steps")
print(f"  Iterations: {search_state_greedy.iterations}")

Greedy search:
  Path found: True
  Path length: 4 steps
  Iterations: 4

Understanding the Search State¶

The AStarSearchState object returned by astar_pathfind contains rich information about the search process. Beyond the path itself, it includes the open and closed sets, the came_from map for path reconstruction, and various g-scores. This can be useful for debugging or for warm-starting subsequent searches.

In [11]:

Copied!





print(f"Search state attributes:")
print(f"  found_goal: {search_state.found_goal}")
print(f"  max_iterations_reached: {search_state.max_iterations_reached}")
print(f"  iterations: {search_state.iterations}")
print(f"  open_set size: {len(search_state.open_set)}")
print(f"  closed_set size: {len(search_state.closed_set)}")
print(f"  came_from entries: {len(search_state.came_from)}")
print(f"Search state attributes:")
print(f"  found_goal: {search_state.found_goal}")
print(f"  max_iterations_reached: {search_state.max_iterations_reached}")
print(f"  iterations: {search_state.iterations}")
print(f"  open_set size: {len(search_state.open_set)}")
print(f"  closed_set size: {len(search_state.closed_set)}")
print(f"  came_from entries: {len(search_state.came_from)}")

Search state attributes:
  found_goal: True
  max_iterations_reached: False
  iterations: 4
  open_set size: 80
  closed_set size: 3
  came_from entries: 83

Summary¶

A* search navigates the implicit graph of crystal structures to find transformation pathways between phases. Key parameters include the heuristic function (guiding search priority), beam width (limiting memory usage), dropout (adding stochasticity), and filters (excluding unphysical configurations).

The other pathfinding guides in this documentation build on A* with additional logic for finding optimal discretization parameters and minimizing energy barriers.