Fast, scalable, and interactive software for Landau-de Gennes numerical modeling of nematic topological defects

Numerical modeling of nematic liquid crystals using the tensorial Landau-de Gennes (LdG) theory provides detailed insights into the structure and energetics of the enormous variety of possible topological defect configurations that may arise when the liquid crystal is in contact with colloidal inclusions or structured boundaries. However, these methods can be computationally expensive, making it challenging to predict (meta)stable configurations involving several colloidal particles, and they are often restricted to system sizes well below the experimental scale. Here we present an open-source software package that exploits the embarrassingly parallel structure of the lattice discretization of the LdG approach. Our implementation, combining CUDA/C++ and OpenMPI, allows users to accelerate simulations using both CPU and GPU resources in either single- or multiple-core configurations. We make use of an efficient minimization algorithm, the Fast Inertial Relaxation Engine (FIRE) method, that is well-suited to large-scale parallelization, requiring little additional memory or computational cost while offering performance competitive with other commonly used methods. In multi-core operation we are able to scale simulations up to supra-micron length scales of experimental relevance, and in single-core operation the simulation package includes a user-friendly GUI environment for rapid prototyping of interfacial features and the multifarious defect states they can promote. To demonstrate this software package, we examine in detail the competition between curvilinear disclinations and point-like hedgehog defects as size scale, material properties, and geometric features are varied. We also study the effects of an interface patterned with an array of topological point-defects.


INTRODUCTION
Nematic liquid crystals' combination of fluidity and orientational order both underlies nematics' widespread technological applications and endows them with topological defects, localized breakdowns in the orientational order stabilized by the medium's broken symmetries. The topological defects of nematics have been integral to the study of liquid crystals since the field's infancy [1].
Besides their role as tabletop physical realizations of profound topological ideas, nematic topological defects-including disclination lines, point-like hedgehogs, and surface-bound boojums-are of great interest for their importance in nematic colloidal suspensions [2]. These composite materials, formed by suspensions of colloidal particles or nanoparticles in nematics, promise new routes to directed self-assembly and selforganization. Nanoparticles in nematics are pushed by elastic forces to assemble in pre-existing defect lines, meaning that sculpted disclinations provide a path to controlled nanoparticle assembly. Applications include plasmonic properties for metamaterials [3,4], molecular self-assembly [5], and quantumdot assembly in microshells [6,7]. Even greater complexity arises in the cases of colloidal particles in the size range of tens of nanometers to several microns, which often have companion topological defects and which interact through forces mediated by nematic elasticity. Self-assembled structures of colloidal particles with companion defects include bound pairs, chains [2,8] and triclinic 3D crystals [9]; with the aid of laser tweezers, other configurations such as 3D crystals with tetragonal symmetry [10] and sophisticated disclination knots [11][12][13][14] can be stabilized. Tailored self-assembled colloidal structures hold promise as optical metamaterials for photonics applications, such as photonic bandgap crystals and microlasers [15][16][17][18].
Nematic defect configurations can also be controlled by nontrivial boundary surfaces [19]. Substrate patterning strategies include topographic variations such as "lock-and-key" docking sites for colloidal particles [20][21][22] and chemical patterning where the boundary condition shifts abruptly [23,24]. Complex director fields, including disclinations, can be prescribed on a substrate by photoalignment [25] or by scribing with an atomic force microscope [26]. Confinement in geometries such as capillaries [27], droplets [28], shells [29], and thin films [30] produces a wealth of point-and line-defect behaviors stabilized by topology and energetics.
The rapidly expanding variety of experimentally created nematic defect configurations has benefited greatly from the understanding provided by robust modeling approaches. One set of approaches is based on the Frank-Oseen elastic free energy, which penalizes deformations of the nematic directorn(x), and which in its simplest form reads The superscript (1) refers to the approximation of a single elastic constant K in this expression. However, then = −n symmetry of nematics presents challenges for this model in the presence of disclinations with half-integer winding number, especially if their locations are not known beforehand. This difficulty is resolved by the Landau-de Gennes (LdG) model, the theoretical approach which is the focus of this work and which we review in section 2. The LdG framework takes as its order parameter the second-rank traceless nematic order tensor Q ij (x), and is well-suited to modeling arbitrary disclination configurations, as well as biaxial nematics and the blue phases [15,31]. While little is known analytically about free energy minimizers in LdG theory in any but the simplest geometries [32,33], numerical minimization of the LdG free energy has been fruitfully applied over a wide range of systems [10-12, 15, 21, 34-48]. Additionally, flow dynamics of nematics, including active nematic systems, can be modeled by supplementing the LdG free energy with hydrodynamical equations as formulated by Beris and Edwards [49] or by Qian and Sheng [50] and solved by methods such as lattice Boltzmann [51][52][53][54][55], multiparticle collision dynamics and related off-lattice methods [56][57][58], or finite difference and finite element approaches [59,60]. Some methods incorporate a fast relaxation of the momentum compared to the director, to account for the separation in time scales for these relaxations in typical molecular liquid crystals [52,61].
The broad usefulness of the LdG theory goes hand in hand with a significant limitation of scale: Resolving defects at a priori unknown locations requires the simulation lattice spacing to be comparable to or smaller than the size of the defect core, the region in which nematic order breaks down, which in thermotropic nematics is typically a few nanometers. This is often thousands of times smaller than the individual micron-scale colloidal particles of interest. Therefore, a faithful rescaling of the experimental system in numerics would require of order at least 10 9 lattice sites even for configurations involving only a small number of such colloids.
Accessing such experimentally relevant lattice sizes presents computational challenges not often seen in the simulations of glassy and polymeric soft matter systems. The demands on system memory quickly become prohibitive: simply maintaining the five independent degrees of freedom at each lattice site and storing the necessary change in those variables from one minimization step to the next at 10 9 lattice sites requires 80 GB-more than on most current commodity desktops and larger than the memory capacity of any CUDA-capable GPU 1 . Additionally, there is a large direct computational cost of even simple manipulations acting on so many degrees of freedom; this contributes to the significant wall-time required for most numerical energy minimizations and presents challenges for efficient exploration of parameter spaces and colloidal particle positions.
Consequently, LdG numerical modeling is typically applied to systems significantly scaled down, with respect to a fixed defect core size, as compared with the experiments that they are intended to model. While important qualitative insights about defects and director fields can often be obtained by scaling down the experimental dimensions, the change in size ratios makes quantitative prediction challenging. There can also be major qualitative differences. The most well-known of these is the form of the companion defect to a particle with homeotropic (normal) anchoring: Micron-scale particles typically have hyperbolic hedgehog companions (in the absence of confinement or external fields) [2], whereas particles in the few hundred nm or smaller size range have disclination loops in the "Saturn ring" configuration [62,63]. This constitutes a major challenge in modeling systems with multiple colloidhedgehog pairs. Experimental work on high-aspect ratio colloidal particle shapes observes both hedgehogs and disclination loops, but numerical modeling has been limited to the line defect case [35, 38-43, 64, 65]. Adaptive mesh refinement in finite-element simulations can help to avoid computational and memory expense in regions not near defects [66] but typically does not remove the need to scale down.
In this work we present an open-source finite differencebased implementation of LdG free energy minimization with non-trivial boundary conditions, "openQmin" [67], using a combination of approaches designed to address the challenges described above. It is written for heterogeneous CPU and GPU operation to target two complementary research goals. First, it offers a user-friendly GUI environment for rapid prototyping of topological defect configurations as a function of liquid crystal parameters, boundary geometry, and the presence of colloidal inclusions. Simultaneously, it targets large-scale systems using OpenMPI [68] to support parallelization across both CPU and GPU resources to scale up to the supra-micron length scales of experimental relevance. We employ efficient minimization algorithms, such as the Fast Inertial Relaxation Engine (FIRE) method, to maintain reasonable convergence times even for large-scale parallelized calculations.
The remainder of the paper is structured as follows. We begin with a review of the LdG theory in section 2. Section 3 lays out our numerical approach, first discretizing the LdG theory for a finite-difference method, and then outlining our use of minimization algorithms and OpenMPI parallelization. In section 4 we present two sample studies demonstrating the effectiveness of this approach. We first perform a classic analysis of the companion defects to homeotropic spherical particles at varying system sizes, and then examine the effects of a boundary patterned with surface disclinations in a supra-micron-scale system. Section 5 briefly describes the GUI version of openQmin with an example of the rapid prototyping workflow it enables. Finally, in section 6, we discuss both the range of use we foresee for openQmin and some future directions for additional physics that could be studied in this framework.

LANDAU DE-GENNES THEORY FOR NEMATIC LIQUID CRYSTALS
Here we give a brief overview of those aspects of the LdG theory used in our numerical approach. The theory is of course well-established [69] and its use in a finite difference numerical free energy minimization scheme is described in several sources; the reader is directed to Ravnik and Žumer [47] for a thorough explanation.
Uniaxial nematic liquid crystals are characterized by orientational ordering of nematogens (molecules or suspended anisotropic particles) along a director,n, which is a unit vector with the identificationn = -n. To respect that symmetry consistently, which is important at disclinations of half-integer winding number, we take as order parameter not a director but a second-rank tensor. This is the traceless, symmetric tensor field Q(x), whose lattice discretization is the fundamental object of the LdG modeling approach. Q is related ton by [70] Here, S is the degree of uniaxial nematic order, and S B is the degree of biaxial order distinguishing a preferred directionm ≡ −m, perpendicular ton, froml ≡n ×m. The nematic director can be recovered as the eigenvector corresponding to the largest eigenvalue of Q, which equals S. Most nematics are unaxial, so the equality S B = 0 is true in the absence of distortions and represents a good approximation away from defects. In this uniaxial limit, Equation (2) reduces to

Phenomenological Free Energy Density
The LdG theory constructs a phenomenological free energy F as a functional of Q(x). We can write this functional schematically as [47,70]: (4) The first integral, over the volume of the nematic, has three free energy density terms incorporating respectively the energetic costs arising from deviations away from the thermodynamically preferred degree of nematic order S = S 0 , from elastic distortions, and from external fields. The second integral, summing over all boundary surfaces S α in contact with the nematic, incorporates the anchoring energy associated with each interface, including the surfaces of colloidal particles; its form may be different for different surfaces. We address each component in turn.

Bulk Free Energy
The first free energy density term in Equation (4) gives a Landau free energy for the isotropic-nematic phase transition, written in terms of rotational invariants of Q in a Taylor expansion about the isotropic, Q = 0 state [71]: The parameter A ∝ (T − T * NI ), where T * NI is the temperature at which the isotropic phase is destabilized. In the uniaxial limit f bulk becomes a polynomial in the degree of order, which is minimized either by S = 0 or by In the nematic phase, the absolute value of provides a free energy penalty per unit volume to the melted cores of defects, where S → 0.

Distortion Free Energy
The distortion free energy density models the elasticity of the nematic phase, and represents the LdG counterpart to the Frank-Oseen free energy density. The latter, in full generality up to second derivatives ofn, is The parameters in this expression are the splay (K 1 ), twist (K 2 ), bend (K 3 ), and saddle-splay (K 24 ) elastic constants, and the spontaneous chiral wavenumber q 0 which is non-zero in the cholesteric and blue phases. Other common conventions for the saddle-splay energy density replace K 24 in Equation (9) by either 2K 24 or 2(K 2 + K 24 ). Equation (9) reduces to Equation (1) under the "one-constant approximation" K 1 = K 2 = K 3 = K 24 ≡ K and q 0 = 0. The one-constant approximation is a reasonable simplification for many molecular liquid crystals, where K 1 , K 2 , and K 3 typically differ by less than a factor of 5 [72,73]. The most general form of f distortion that we employ, following Poniewierski and Sluckin [74], Mori et al. [48], and Mottram and Newton [70], includes all gradient terms of quadratic order in Q allowed by symmetry, plus one term at cubic order: where Einstein summation over repeated indices is implied, and ǫ is the Levi-Civita tensor. Equation (10) corresponds in the uniaxial limit to Equation (9) with the identifications [48] The one-constant approximation in the absence of spontaneous chiral ordering sets L 2 = L 3 = L 4 = L 6 = 0, leaving the much simpler and more computationally efficient form which corresponds in the uniaxial limit to Equation (1) with Taking this simpler form of the distortion energy density, we estimate the defect core size by considering a distorted uniaxial nematic configuration at S = S 0 withn varying with typical gradient 1/ℓ. Roughly speaking, the energy well depth f 0 (Equation 8) gives the threshold value for f distortion at which distortions become so energetically costly that a local melting of nematic order occurs instead. This length ℓ = ξ N , the nematic correlation length (or coherence length), sets the size of the defect core:

External Fields Free Energy
The response of the nematic to an external magnetic field H or an external electric field E is modeled by the free energy density term where χ and ε are the anisotropic parts (difference in principal values corresponding ton and its perpendicular directions) of the magnetic susceptibility tensor and dielectric tensor, respectively [34], and µ 0 and ε 0 are respectively the magnetic permeability and electric permittivity of free space. (We omit here the terms for the isotropic parts of these tensors, as they do not couple to Q). In the uniaxial limit, the right-hand side becomes − 1 2 Sµ 0 χ(H ·n) 2 − 1 2 Sε 0 ε(E ·n) 2 (again dropping isotropic terms with no coupling ton). Positive χ or ε will favor alignment ofn with H or E.

Boundary Free Energy
Boundary surfaces, including the surfaces of embedded colloidal particles, generally impose an anchoring surface energy density f boundary representing the surface tension's dependence on the director at the surface. In terms of the director, a common modeling choice for the anchoring energy is the Rapini-Papoular form − 1 2 W α RP (ν α · n) 2 whereν α is the surface normal vector and |W α | is the anchoring strength of surface α [75]. Homeotropic (normal) anchoring follows from W RP > 0, whereas W RP < 0 creates degenerate planar anchoring, which equally favors every direction perpendicular toν α . The same anchoring functional can favor a different anchoring direction, for example an inplane direction in the case of oriented planar anchoring, using W RP > 0 with the replacement ofν α by the favored direction.
In LdG theory, for homeotropic or other oriented anchoring, the Rapini-Papoular form is generalized as the Nobili-Durand surface anchoring form [76], where W α ND > 0 is the anchoring strength of surface α and the surface-preferred Q-tensor, Q α , is usually taken to be Q α ij = 3 2 S 0 (n α i n α j − 1 3 δ ij ), withn α =ν α or some other surfacepreferred director.
For degenerate planar anchoring, the Nobili-Durand form is not suitable, and we use instead the following free energy due to Fournier and Galatola [77]:

Overview
The primary contribution of this work is the presentation of an open-source numerical implementation that exploits the embarrassingly parallel structure of the lattice discretization of the above phenomenological theory. Our implementation, combining CUDA/C++ [78] and OpenMPI [68], was written with extreme flexibility in mind to allow users to accelerate simulations large and small using combinations of available CPU and GPU resources in either single-or multiplecore configurations. The foundation of the software package, "dDimensionalSimulation, " is a set of generic classes meant to execute simulations of N interacting units, each consisting of d scalar degrees of freedom, using data structures appropriate for efficient execution on either CPU or GPU resources. These generic classes serve as the template for models which instantiate the dN total degrees of freedom, forces which compute interactions between degrees of freedom, updaters which can change the degrees of freedom (e.g., by implementing equations of motion), and simulations which tie objects of these various types together. The present work focuses on implementing the details of these classes to carry out lattice-based LdG modeling to find energy-minimized configurations of equilibrium nematics in the presence of various boundary conditions. The general structure we have employed was chosen to allow flexibility in future development, for example to derive new model classes which include not only the Q-tensor but also density and velocity degrees of freedom, as would be appropriate for modeling active nematic systems [53][54][55]60].
In addition to writing efficient code to carry out the required lattice-based minimizations of the Q-tensor field in a domain, we also advocate the use of the graphical user interface (GUI) we developed to rapidly prototype and explore the effects of particular boundaries, colloidal inclusions, and external fields that may be of experimental interest. The GUI allows a wide variety of user operations-adding boundary objects at any stage of the simulation, starting and stopping minimization, adding or removing external fields at will-all while visualizing the resulting defect structure and recording configurational details. A snapshot of the GUI is shown in Figure 1, and more details of the available features are given in section 5. We envision that this capability will allow for rapid prototyping of experimental geometries in the search for particular controllable defect states; running on a single GPU allows real-time visualization of lattices in the low-millions of total sites. We have also included several example files that use the code in a non-GUI mode; these can then use OpenMPI to parallelize across either CPU or GPU resources to scale up to lattices that represent supra-micron-scale liquid crystal systems.

Lattice Discretization and Energy Minimization
The finite difference lattice calculations employed in this work use a regular cubic lattice discretization of space, with a Qtensor defined at each site x = {x, y, z}. The lattice spacing x can be related to physical quantities through a natural nondimensionalization of the free energy density,f ≡ f /|A|, which implies a non-dimensionalization of the elastic constantsL i ≡ L i /(|A| x 2 ). In the one-constant approximation, we thus have In this work we setL 1 = 2.32. To model 5CB, following Ravnik and Žumer [47] we take A = −0.172 × 10 6 J/m 3 , B = −2.12 × 10 6 J/m 3 , C = 1.73 × 10 6 J/m 3 , and These give a lattice spacing of x ≈ 4.5 nm, which is at the few-nm scale of the defect core in 5CB. Note that the non-dimensionalization of all constants by an energy scale |A| and a length x is implicitly made for all values in openQmin, including in the GUI.
The symmetry and tracelessness of Q leaves five independent degrees of freedom, which we take to be q ≡ {Q xx , Q xy , Q xz , Q yy , Q yz } at each of the N lattice sites in the simulation domain. We write the local free energy density f ( x) in terms of these five independent variables, so that the symmetry and tracelessness of Q are automatic (rather than being maintained by projection operations following update steps [47]). We also label each site with an integer "type, " indicating whether it is a bulk site, a boundary site, or a site inside an object (for instance, the interior of a colloidal inclusion, or part of a bounding surface), depending on the geometry of the problem. Only bulk and boundary sites are "simulated sites, " meaning Q is defined there.
We discretize the total free energy, F = N i=1 f ( x i ), using a finite-difference approach over the 5N independent variables. For the distortion terms we allow the user to select either the more general expression, Equation (10), or the more computationally efficient expression of Equation (11). For the terms in f distortion which contain spatial first derivatives of Q, we consider firstorder forward and backward finite difference approximations, Herex k is the unit vector in the x k direction, and x is the site where the calculation is taking place. The choice of a regular cubic lattice makes these derivative approximations straightforward to calculate. The forward and backward finite differences are each compatible with the simulation domain only if ( x +x k ), ( x −x k ), respectively, are simulated (bulk or boundary) sites. We then take, as the discretized expression of f , Equation (4) averaged over all forward and backward finite difference expressions for each of k = 1, 2, 3 that are allowed by the geometry of the simulation domain. A bulk site, therefore, has a local free energy averaged over 2 3 such combinations, while a boundary site has fewer. We use these averages over different expressions for the finite differences, rather than using a single centered finite difference , because using the latter form in Equation (11) produces no terms coupling Q ij ( x) to its nearest neighbors, of the form Q ij ( x)Q ij ( x ±x k ). This use of the centered first derivative expression would therefore create an artificial (and undesirable) lattice doubling effect in our approach, with even sites and odd sites evolving independently. For curved boundaries such as on spherical colloidal particles, well-known inaccuracies are introduced in the finite difference calculations by the discretization of boundaries as sites in the cubic lattice [79]. Specifically, errors of order O( x) in Q ij ( x) are introduced, leading to truncation errors of O(1) (which do not diverge as the lattice spacing is refined) in the first derivative approximations of Equation (16).
Finally, we minimize F as a cost function over the 5N independent variables q i ( x j ), i = 1, . . . , 5, j = 1, . . . , N. The gradient of F in this 5N-dimensional space is calculated by explicitly differentiating the expression for F with respect to each q i ( x j ) degree of freedom. This explicit differentiation of a cost function is an alternative to the approach of analytically deriving local forces (molecular fields) from the Euler-Lagrange equations, projecting to recover symmetry and tracelessness, and then discretizing those expressions. While the Euler-Lagrange equations have separate forms for the bulk and the boundaries, in the approach used here forces are derived from the cost function in formally the same way for bulk and boundary sites.
We emphasize that by discretizing space, we can directly map the problem of solving the LdG partial differential equations to finding the minima of a complex energy landscape (where the Q-tensors on each lattice site are the degrees of freedom). For instance, many PDE solvers implement steepest descent relaxation, which can be directly interpreted as overdamped molecular dynamics at zero temperature. This allows us to turn to the wealth of existing algorithmic approaches from the field of non-linear optimization, including minimization techniques such as quasi-Newton methods (conjugate), gradient descent, and momentum-based techniques such as Nesterov's accelerated gradient [80]. Since our aim is to be able to scale up to large systems, we ignore minimizers which require secondorder derivatives of the cost function, and we find that even limited-memory quasi-Newton methods such as L-BFGS impose too-strong a memory requirement for many of our purposes. Additionally, while conjugate gradient is appealing in having only marginal extra memory requirements and being much faster than simple gradient descent, it involves frequent line searches that require expensive repeated evaluations of the free energy density and imposes additional parallelization costs.
Thus, although we have implemented many of the abovenamed minimizers in openQmin, we focus our attention on the use of the Fast Inertial Relaxation Engine (FIRE) method of energy minimization [81]. FIRE falls into the class of "gradient plus momentum"-style minimization algorithms, and it additionally rescales the "velocity" (fictitious additional variables introduced to make the analogy with molecular dynamics even more complete and corresponding to the velocities at which the Q-tensor components change) of the degrees of freedom and adaptively changes the size of the time step itself based on the behavior of the force and velocity during the most recent update. For convex optimization problems the addition of inertia can be proven to enhance convergence [82], although for more complex energy landscapes Algorithm 1: Pseudocode for FIRE minimization [81].
Initialize Q-tensors at each lattice site, set velocities v i to zero; while Minimization criteria not satisfied do Update q i ( x j ), force = −∇F, and v i using a velocity Verlet step; Calculate power, P, as the dot product of the force and velocity vectors; Rescale velocity by a parameter α which sets the inertia of the degrees of freedom; if P > 0 then if P has been positive for more steps than a threshold, N min then Increase the time step size and increase α. end else Decrease the time step size, reset velocities to zero, reset α to initial value; end end in general little can be proven. Thus, while it is a heuristic approach, FIRE has been shown to be competitive with (or even faster than) conjugate gradient minimization [81,83,84], all while maintaining an extremely light additional memory footprint and being highly amenable to parallelization across multiple cores or multiple GPU units. Note that FIRE was originally developed with atomistic simulations in mind, but it is increasingly being used more generally, including in the solution of PDEs [85] and in machine learning applications [86]. By the straightforward mapping mentioned above we are able to directly apply the pseudocode presented in Algorithm 1.
We first demonstrate this efficient minimization in Figure 2, where we compare the system energy and average norm of the force on the degrees of freedom during the minimization of a lattice of N = 250 3 sites in a cubic geometry with periodic boundary conditions. To make a fair comparison, we have performed both a FIRE and a gradient descent (GD) minimization on the same system using separately tuned minimization parameters for each algorithm. We use the same hardware for each simulation, and report the minimization progress in terms of the wall-clock time taken. Although it is sometimes common to report efficiency in such comparisons in units of function calls, for algorithms with very different numbers of arithmetic operations (each FIRE iteration requires more than twice the number of arithmetic operations compared to GD) such comparisons are often misleading.
As Figure 2 makes clear, even in the trivial case of finding the uniform nematic ground state for a system with no boundary terms from a system initialized with random Qtensors at each lattice site, FIRE provides orders of magnitude improvement in the time taken to find minima. This performance of our default minimizer is not restricted to simple, bulk states of the liquid crystals. As we demonstrate in   N = 250 3 ), starting from a randomly initialized configuration, as a function of wall-clock time. Solid lines are minimizations using FIRE and dashed lines are those using gradient descent. As described in the text, we have tuned the minimization parameters (step size, etc.) for each algorithm separately and use identical hardware to make a one-to-one comparison.
for a handful of simple (and well-studied) arrangements of colloidal inclusions and boundaries, FIRE is very rapidly able to find these more complex minima, too. As with any nonconvex optimization solver, though, no guarantees are made by FIRE about avoiding particular local minima in favor of a true global minimum. Where this is a concern, we adopt the standard approach of minimizing from multiple different random initializations. Particularly when coupled with a GPU, the substantial acceleration of FIRE-based minimizations enables the usefulness of the GUI, as the evolution of defect structures in response to user-instigated changes can be seen in real time.
Although numerical simulations of this size have been commonly used to make contact with experiments, in singlecore operation it is impractical to simulate lattices much larger than N ∼ 300 3 , with the limiting factor being the wallclock time required for CPUs and memory constraints for GPUs. Given a simulation with N degrees of freedom and spreading the work across P processing units (either GPUs or CPUs), achieving ideal N /P scaling requires both low-latency communication between processors and algorithms that are themselves linear in N /P. Fortunately, lattice-based models with only nearest-and next-nearest-neighbor interactions are trivial to parallelize using a pattern common to, e.g., spin glasses [87]. We use a standard spatial decomposition of the total number of lattice sites into rectilinear sub-regions (typically cubes, although other spatial partitions are easily implemented and may be preferable for some simulation geometries). Each processing unit is assigned to one of these subregions, and is responsible for controlling and updating the lattice sites in that subregion. It also maintains information about the state of the "halo" of lattice sites that are neighbors, nearestneighbors, and next-nearest neighbors of lattice sites at the boundary of the subregion it controls. Standard OpenMPI protocols [68] are used during each simulation step to communicate information about the state of these halo sites showing the minimized configurations, the three sets of lines correspond to (Blue) two spherical colloids between parallel plates, all with homeotropic anchoring, (Red) the interior of a spherical droplet with homeotropic anchoring, and (Purple) a spherocylinder with homeotropic anchoring between parallel plates with degenerate planar anchoring. These images were created using the "multirankImages.nb" Mathematica file included in the repository for making simple visualizations.
to and from each processing unit in optimized sequences of uni-directional transfers.
We now assess how our method's efficiency scales as the problem size is increased. Although strong scaling (Amdahl's law)-in which the total problem size is kept fixed and P is increased-is often important, it is well-established that the structure of the near-neighbor lattice interactions we simulate is embarrassingly parallel. Our real aim is to scale up the problem size itself and use many processors to simulate lattices that approach experimental scales. As such, weak scaling (Gustafson's law)-in which the amount of work per processing unit is kept constant-is the relevant test.
One challenge to mention here is that when targeting energy minima-as opposed to simply advancing a molecular dynamics simulation for a fixed number of time stepsthe number of minimization steps itself grows with the total system size. In general the convergence properties of different minimizers in non-convex settings are highly nontrivial. For simple geometries we are able to numerically probe this scaling-for instance, we find that in the absence of any boundary the number of minimization steps to achieve a target small force tolerance scales with the linear size of the system, whereas in the presence of a spherical colloid it scales roughly with L 3/2 . In general, though, the approximate scaling may be hard to ascertain (and may depend on the target threshold for declaring a configuration to be in a minimum).
Turning instead, then, to the per-minimization-step timings, we present the weak scaling performance of openQmin in Figure 4, where we compute the total number of lattice-site updates (i.e., N times the number of simulated time steps) during a minimization in which we fix N p , the number of lattice sites per processing unit, at several values and vary P. Consistent with a globally cubic simulation, we parallelized across P = FIGURE 4 | Weak scaling performance of openQmin on Comet, in total number of lattice site updates [i.e., (time steps)×(ranks)×(N p )] per second vs. the number of CPU processes, P, for a constant number of lattice sites per process. The points from dark red to light blue correspond to N p = 75 3 , 100 3 , 125 3 , 150 3 , 250 3 lattice sites per rank. The dashed gray line corresponds to ideal ∝ P scaling. 1 3 , 2 3 , 3 3 , 4 3 , 5 3 , 6 3 , 7 3 , 8 3 , 9 3 , 10 3 processors on the Comet XSEDE cluster, and studied computational performance for N p = 75 3 , 100 3 , 125 3 , 150 3 , 250 3 . As expected, there are systematic drops due to increased communication costs as one goes from 1 core to multiple cores to multiple nodes, but openQmin recovers ideal linear scaling of lattice updates with P as P grows very large. Additionally, there is a systematic degradation of performance for small N p , since in that case there is a more unfavorable ratio of halo sites to controlled sites for each processor.
Note that when we set the characteristic lattice spacing to correspond to 4.5 nm, the largest system simulated in this study, N p × P = (250 3 ) × 10 3 , corresponds to a simulation domain of volume 1,424 µm 3 .

Companion Defects to Homeotropic Spherical Colloids
In this section we apply openQmin to the question of whether a hyperbolic hedgehog or a Saturn ring disclination loop provides the minimum-energy form of the topological companion defect to a homeotropic spherical colloid. As mentioned above, a larger colloid radius a favors the dipolar configuration with a hedgehog, whereas smaller a favors the quadrupolar configuration with a Saturn ring. As a result, the common rescaling of experimental dimensions to smaller a/ξ N in numerical modeling risks obtaining qualitatively different topological defect configurations. Besides increasing the simulation box size, altering the modeled material constants can restore qualitative agreement between experiment and simulation. Here we explore the issue in detail, using openQmin to systematically investigate the stability of hedgehogs relative to Saturn rings over a range of sizes and material parameters.
The dipolar configuration with a hyperbolic hedgehog is the ground state for homeotropic colloidal particles near or above the micron scale [2]. Terentjev's prediction of the alternative quadrupolar director field configuration with a Saturn ring disclination loop [62] can be stabilized for large particles by confinement or external fields [88,89]. Stark [63] demonstrated numerically using the Frank-Oseen free energy that the Saturn ring becomes metastable relative to the dipole for a 720 nm, with a defect core size r c = 10 nm. For a 270 nm, the Saturn ring becomes the global ground state.
While the elastic energies of the two configurations are complicated to express, the Saturn ring is additionally penalized by a simple core energy per unit length, or line tension, γ = πK/8 [63,69]. Because the Saturn ring maintains a radius r d just slightly larger than that of the colloidal particle, r d ≈ 1.1a [63], the total defect core energy penalty E c = 2πr d γ ∝ Kr d of the Saturn ring grows linearly with the colloid radius. In contrast, the hyperbolic hedgehog has no defect core dimension growing in size with the colloidal particle, helping to stabilize the dipole over the Saturn ring at larger colloid sizes.
In order to numerically model multi-particle configurations in the dipolar size regime-if we cannot exploit crystal symmetries to obtain a small unit cell [10,45]-we must either scale up the simulation volume to larger lattices, or stabilize the dipole at smaller particle sizes. We can achieve the latter by altering the materials-constant ratiosB ≡ B/A,C ≡ C/A in Equation (5). Together, these two ratios determine S 0 via Equation (7), as well as the non-dimensionalized free energy density of the nematic ground statef 0 ≡ f 0 /A with the energy well depth f 0 defined as in Equation (8).
By varyingB andC such that S 0 remains fixed, we alter the energetic cost per unit volume of melted nematic order in defect cores, |f 0 |. The defect core size r c varies with the nematic correlation length ξ N , which, from Equation (12), scales as ∼ L 1 /|f 0 |. Thus, an increase in |f 0 | implies a decrease in the defect core size, which means effectively that the ratio a/r c of the particle size to the defect core size is increased without changing the size of the simulation lattice. The dipolar configuration is therefore expected to remain stable at smaller particle sizes. This technique was used in Luo et al. [37] to model a dynamical transition from Saturn ring to dipole as a colloidal particle approaches an undulated boundary, at simulation box sizes up to 50 times smaller than the experimental dimensions.
The results of this study are shown in Figure 5, which we parameterize by varyingB at fixed S 0 = 0.53 [i.e., setting C = (−2 −BS 0 )/(3S 2 0 )], along with the size of the spherical colloid and the lattice size. We test the stability of hyperbolic hedgehogs by initializing the surrounding lattice sites in the dipolar defect configuration suggested by Lubensky et al. [90], performing an energy minimization, and testing whether the resulting configuration has remained in the hedgehog state or transitioned to a Saturn ring configuration (thus, testing the meta-stability of the dipolar defect state as a function of system parameterization). At the valuesB ≈ 12,C ≈ −10 commonly used in modeling of 5CB [47], we find that the lower limit of hedgehog meta-stability is a ≈ 74 lattice spacings, or about 330 nm. In this sample study we have imposed a large but finite anchoring strength at the colloid's surface. Weaker anchoring strength will affect the results, with a "surface ring" configuration replacing the dipole at low anchoring strength [63].
We have also tested the meta-stability of the quadrupolar defect configuration by initializing the system in a Saturn ring configuration and minimizing, but we have not observed the spontaneous appearance of hedgehog defects from such simulations, indicating at least the meta-stability (if not absolute stability) of Saturn rings over the entire parameter range studied here. In addition to the effect of defect core size mentioned above, slight deviations in hedgehog meta-stability as a function of lattice size at fixedB and a seen in Figure 5 indicate the importance of far-field distortion terms on the (meta-) stability of defect configurations.

Patterned Boundary Conditions
To demonstrate the modeling of patterned boundaries in openQmin, we examine a square array of alternating ±1 disclinations imprinted as a spatially varying anchoring direction on a planar substrate. Such an array was created experimentally by the authors of Murray et al. [26], by scribing lines into a polyimide surface with an atomic force microscope. As in that experiment, we give the opposing surface degenerate planar anchoring. In openQmin, these boundary conditions are specified at each boundary lattice site through a user-prepared text file (see section 5 below). We employ periodic boundary conditions in the horizontal directions, and the anchoring strength W at both surfaces is set to make the extrapolation length K/W roughly equal to the lattice spacing. Figure 6A shows the result of minimizing a cell of thickness h = 224 lattice spacings, corresponding to ≈ 1 µm for 5CB, and a spacing d between defects equal to h. We create an 8 × 8 array of defects, so the total volume modeled is 64 µm 3 , larger than the maximum size achievable with single core minimizations on a typical CPU (≈ 10 − 20 µm 3 ). Simulating several unit cells of the substrate patterning in this way allows us to observe a labyrinthine configuration of half-integer disclination lines near the plane of the substrate, connecting neighboring surfacedefects. Meanwhile, some disclination lines are vertical, traveling between the two surfaces and imprinting a + 1 2 or − 1 2 defect profile on the top surface. The stopping condition for the minimization here was a somewhat modest force tolerance, allowing these large-system-size studies to be completed in less than 24 hours. While clearly not completely equilibrated, the horizontal disclination labyrinth is similar to a domain wall texture observed experimentally in Murray et al. [26], which may also be kinetically trapped. Absent from the texture in Figure 6A is the ±1 non-singular escaped configuration, which did appear in the experiments.
The energetic cost per unit length of disclination lines implies that the vertical configuration is favored by smaller cell thickness h. Indeed, as shown in Figure 6B, when we decrease h/d from 1 to 1 6 , only vertical disclinations appear, in pairs of + 1 2 or − 1 2 disclinations from the "splitting" of the ±1 surface defects. This defect splitting was sometimes observed in Murray et al. [26] in place of the escaped configuration. Conversely, as shown in Figure 6C, only horizontal disclinations appear when h/d is increased to 2. Extensions to even larger defect arrays, to curved boundaries, and to spatially non-uniform anchoring types can be explored in the same manner in openQmin. Figure 1 shows a screenshot of the graphical user interface in action; the Supplemental Video and accompanying narrative transcript of the video in Supplementary Data Sheet 1 shows a representative demonstration of its use. Here we discuss some of its current functionality. Initialization dialog boxes allow the user to set the simulation size, the computational resource to use (CPU or GPU, auto-detecting whether CUDAcapable resources are available for use), and parameters for the bulk and distortion free energy density. This generates a random bulk configuration of Q-tensor lattice sites with periodic boundary conditions. For the visualization pane the user can specify the density and magnitude of directors to draw (taken to be the direction of the largest eigenvector of Q at each site), and can freely rotate and zoom in on the configuration, as well as highlight in blue defects defined locally by regions where the largest eigenvalue of Q falls below some threshold.

RAPID PROTOTYPING WITH GUI INTERFACE
In the top left are buttons allowing the user to specify parameters from one of two energy minimization techniques to use (FIRE and Nesterov's Accelerated Gradient Descent); the resulting dialog boxes are populated with values that we typically find to be efficient for default parameter choices in the bulk and distortion energies, although some amount of tuning may be quite beneficial (particularly when changing the distortion terms L 2 through L 6 ). The "Minimize" button performs the requested energy minimization (either until a target force tolerance is attained or the maximum number of iterations is reached), with the option to visualize the results only at the end or to watch the minimization proceed. The "File" dialog box allows the currently visualized state of the system to be saved for separate analysis or processing.
Note that menu items allow any of the terms in the energy functional governing the simulation, Equation (4), to be changed on the fly. This allows, for example, the user to first minimize a system with some values of the distortion constants and then perform repeated minimizations as those values are changed, observing the stability or meta-stability of defect structures as this is done. Two buttons allow the user to introduce boundaries and colloidal inclusions into the system. "Simple" objects are spheres and flat walls with either normal homeotropic or degenerate planar anchoring conditions. Arbitrarily complex boundary conditions (taking any shape, with degenerate planar and homeotropic anchoring conditions not restricted by the direction of the surface normal) can be added by preparing a simple text file that the program can read in -an example script that generates the custom boundary file used in Figure 1 is included in the "/tools" directory of the project's GitHub repository [67].
With boundaries and colloids ("objects") in place, some manipulations of these objects are accessible via drop-down menus. The positions of these objects within the simulation can be directly modified, so the user could place an ellipsoidal particle, perform a minimization, change the position, reminimize the system, and record the different energy minima attained. We include an option to automate this type of operation (which can be used to build up the potential of mean force from the liquid crystal and colloid interactions) for convenience. A near-term addition will be allowing objects to move according to the integrated stresses at their surface (or according to the energetic results of various trial moves); the user will then be able to separately "Minimize" just the liquid crystal sites or "Evolve [the] system" by allowing both liquid crystalline and colloidal degrees of freedom to change simultaneously.
Finally, to facilitate moving from GUI prototyping to largerscale MPI studies, we have included the ability to record system initialization and sequences of commands entered in the graphical user interface, and then save this sequence of commands as a new file that can be separately compiled and executed in non-GUI operation. This file has its own set of command line options, primarily so that it can be made to work as an MPI executable and so that the system size of the simulation it represents can be rescaled to a larger value. We highlight this GUI-prototyping approach as a visual alternative to the scripting-language approaches of molecular-dynamics packages like LAMMPS [91] or HOOMD-blue [92] for specifying complex sequences of system initialization, energy minimizations, and the introduction of objects, fields, and boundary conditions. We believe that this seamless visual-prototyping-to-MPI-scalable pipeline will be beneficial to researchers interested in accessing experimental-scale simulations.

DISCUSSION
As demonstrated in our sample study, openQmin utilizes MPI to enable LdG modeling at typical size scales of experimental relevance, at the ∼ 10 µm range, with fast convergence enabled by the FIRE algorithm. Besides the colloidal defect configurations and patterned boundaries discussed here, another immediate use is for the study of cholesterics, where typically fewer than ten pitches can fit inside a simulation box using a single processor, but using openQmin tens of pitches can be modeled. While it may not be realistic at present to frequently conduct simulations with 10 3 processors, using openQmin on computer clusters will facilitate demonstration of how numerical results scale with system size, allowing reasonable extrapolations to experimental scales.
For modeling at the ∼ 1 µm range or smaller, openQmin's combination of FIRE with GPU computing offers a substantial speedup, enabling users to manipulate the simulated conditions in a GUI environment and observe the change in energy-minimized configurations. The GUI is useful for running "real-time" tests of proposed configurations which can then be modeled at larger scales with MPI.
Likewise, the GUI will also be useful to experimentalists in quickly identifying more optimal properties of nematics, colloidal particles, boundaries, etc. in order to achieve targeted topological or self-assembled configurations. In general, numerical modeling can aid experimental studies not only in developing theoretical understanding of nematic structures and energy landscapes, but also in performing high-throughput searches through these design spaces. For example, geometric compatibility conditions favoring lock-and-key assembly of particles and patterned walls [21,37], or particle design promoting assembly into photonic crystals, can be optimized more efficiently in numerics, to help guide the increasingly sophisticated uses of fabrication techniques such as photolithography and twophoton polymerization [93]. An ambitious but important direction for future development is therefore to efficiently explore design parameter spaces in numerical modeling, possibly employing genetic algorithms and techniques from machine learning.
There are some near-term directions for future development of openQmin that we anticipate will increase the usefulness of this open-source software to the liquid crystals research community. An expanded library of Q-initialization options will facilitate investigations of chiral liquid crystals, topologically entangled or knotted defect configurations [11][12][13][14], and periodic defect arrays [26,94], for example. A major advance would be adding a flow field coupled to Q by Beris-Edwards nematodynamics, for investigations of microfluidic geometries and active nematics.
Incorporating motion of colloidal particles into the modeling is another area for useful developments. In the experimental system, energy is minimized not only over Q but also over the positions and (if applicable) orientations of colloidal particles. At present, openQmin takes these latter degrees of freedom as input parameters, and a free energy landscape can be mapped either informally using the GUI or more systematically on a computer cluster. Thus one desired future improvement is to allow overdamped translation and rotation of colloidal particles within the program, downhill in the energy landscape, based on trial moves or on estimated nematic elastic stresses felt by the particle [46]. The trial move approach, requiring several re-minimizations of Q at each time step, is made less cumbersome by improved convergence speed of the FIRE algorithm.
Finally, we hope that openQmin's GUI interface will be useful in physics education. Interacting with a fast and "hands-on" version of the numerical modeling, students at the undergraduate or beginning graduate level can quickly gain experience and intuition for liquid crystals. This will help to capitalize on the position of liquid crystals as one of the most accessible, and visualizable, physical realizations of abstract topological ideas relevant to many areas of physics.

DATA AVAILABILITY STATEMENT
The open source code described in this work can be found at Reference [67] and used to reproduce all data in the manuscript. Documentation for the software is maintained at Reference [95], and can also be generated with doxygen from the source code.

AUTHOR CONTRIBUTIONS
DS and DB designed the study, developed the open-source software, and wrote the manuscript.