ImageMech: From image to particle spring network for mechanical characterization

The emerging demand for advanced structural and biological materials calls for novel modeling tools that can rapidly yield high-fidelity estimation on materials properties in design cycles. Lattice spring model (LSM), a coarse-grained particle spring network, has gained attention in recent years for predicting the mechanical properties and giving insights into the fracture mechanism with high reproducibility and generalizability. However, to simulate the materials in sufficient detail for guaranteed numerical stability and convergence, most of the time a large number of particles are needed, greatly diminishing the potential for high-throughput computation and therewith data generation for machine learning frameworks. Here, we implement CuLSM, a GPU-accelerated CUDA C++ code realizing parallelism over the spring list instead of the commonly used spatial decomposition, which requires intermittent updates on the particle neighbor list. Along with the image-to-particle conversion tool Img2Particle, our toolkit offers a fast and flexible platform to characterize the elastic and fracture behaviors of materials, expediting the design process between additive manufacturing and computer-aided design. With the growing demand for new lightweight, adaptable, and multi-functional materials and structures, such tailored and optimized modeling platform has profound impacts, enabling faster exploration in design spaces, better quality control for 3D printing by digital twin techniques, and larger data generation pipelines for image-based generative machine learning models.


Introduction
Materials with complex geometry and multiple constituents can be difficult to predict the mechanical properties, such as elasticity, plasticity, hysteresis, and fracture. The properties are usually coupled with the structure and topology of materials, and in many cases change under different boundary conditions. Classical solid mechanics are highly accurate if the assumptions of homogeneity and small deformation are practical. Problems arise when materials become nonhomogeneous and undergo large deformation. Multiple assumptions and parameter fittings are often required, engendering intensive computational cost and prolonged calibration. In particular, many biomimetic and bioinspired motifs involve complex structures and composite materials by design (Wegst et al., 2015). For example, birds have hollow and pneumatized bones for avian purpose. Inside the dense and thin exterior, there are hollows with internal reinforcing structures, including ridges, struts, and foams (Sullivan et al., 2017) (see Figure 1). These complex structures pose nearly insurmountable challenges to continuum approaches (such as finite element methods, FEMs) since the number of elements required by sufficiently detailed characteristics increases dramatically. To resolve the computational intractability of ultra-high mesh models, homogenization techniques are often adopted to replace the materials at smaller scales by the equivalent larger ones (Roters et al., 2010). The degrees of freedom therefore decrease correspondingly in favor of the computing capability and desired scalability. However, the process of homogenization inevitably losses information content and geometry details, leading to inaccurate evaluation on the mechanical properties. In this regard, a different perspective is necessary for describing the materials in an efficient approach but without much loss of details.
Lattice spring model (LSM) has been proved to be an effective tool to predict the elasticity, plasticity, and fracture behaviors of metals (Buxton et al., 2001;Chen et al., 2014), osteon-inspired cellular composites (Libonati et al., 2017), and geometrically toughened structural composites (Chiang et al., 2020;Tsai et al., 2021). The underlying physics of LSM is simply the truncated potential of springs between particles (or beads), which are the representative volume element (RVE) of the discrete material bodies. The fracture occurs when the length of elongated spring exceeds the critical length, leading to the spring breakage and the release of stored elastic strain energy. This straightforward criteria has provided predictive insights in the fracture behaviors of many brittle materials.
One of the most popular codes for large-scale particle dynamics simulations is LAMMPS (http://lammps.sandia.gov). The current acceleration of LSM calculation in LAMMPS package relies on spatial decomposition rather than spring list (bond list) parallelization. Parallelization on large spring list is crucial for LSM acceleration since the spring force calculation is the major bottleneck of time integration. Each particle in the middle of triangular packing lattice, for example, has six springs connected with its first nearest neighbors. To enhance the performance of LSM simulation, we develop a CUDA-enhanced lattice spring model code (CuLSM), which implements GPU parallelization on particle and spring lists. CuLSM provides a great speedup for large particlespring networks with tens of thousands of particles and springs. This work as well as associated codes is important for future large-scale LSM simulations where a large number of particles are necessary to provide enough resolution for biological/biomimetic geometries and complex physical phenomena such as stress concentration, shielding, and plastic zone.
Here we present a handy platform to evaluate the mechanical properties of biological or biomimetic materials design based on 2D image geometry and prescribed materials constants. The images can be obtained from microscopy, computed tomography scan (Bibb et al., 2011;Liang et al., 2009), or other imaging methods and artificial design (such as generative adversarial networks) and can be converted into different types of particles based on the gray-scale pixel values. We report an image-particle conversion tool-Img2Particle, which takes the image and number of particle types as input, and outputs the triangular packing particle model with boundary and notch for mechanical characterization. CuLSM subsequently performs displacement-control mechanical test to determine the mechanical properties. System energies, particle trajectories and other derived attributes are computed using various parallelism scheme. The platform provides reliable pipeline from image to mechanical properties and meanwhile achieves high-performance speedup compared with CPU-centered programs.

Force calculation
Consider two particles i, j connected by a harmonic spring of stiffness k. The potential energy (elastic strain energy) stored in the spring can be expressed in a function of two particle coordinates r i and r j . For a system with N particles and M springs, the total potential energy is where r ij = r i − r j and r 0 ij are the instantaneous length and equilibrium length of the spring between particles i, j. The force exerted on the individual particle i can be obtained through the gradient of potential , where f ji is the force applied by the spring (i, j) on the particle i.

Velocity Verlet integration
Velocity Verlet integration is used to solve the second-order ODE of Newton's equation of motion F = mẍ. One Verlet integration iteration contains three subroutines. First, given positions x, velocities v as well as accelerations a of all particles at time t, the positions at the next timestep t + ∆t are calculated as . Second, the accelerations at the next timestep are obtained from the forces using the configuration at the next timestep x(t + ∆t).
. Third, the velocities at the next timestep are then updated as . In code implementation, we use the half-step velocity scheme to further reduce the memory usage of acceleration vectors. Velocity verlet integration has been proved to be numerically stable and possess important properties for physics such as time reversibility.

GPU parallelization
Instead of using spatial decomposition which requires prior knowledge of particle coordinates and multiple CPU threads to divide entire domain into several computing subdomains, this work applies GPU parallelization to the force calculations of spring list. By doing so, the algorithm focuses on the pair relations between particles connected by springs regardless of their separating distance.
The method has a merit that the examination of particle coordinates is unnecessary and therefore accelerates the computing speed.
Simulations are implemented by the in-house CUDA C++ code CuLSM on a desktop with Intel i5-8400 and Nvidia GeForce GTX 1060. First, vectors of positions, velocities, and accelerations of all particles are copied from host to device memory. All of the subsequent boundary displacement and velocity Verlet integration are executed on the device, with periodic callback copying from device to host when the output of particle states are needed. Five GPU kernel functions for boundary displacement, updating position, calculating force, updating acceleration, and updating velocity are implemented at each timestep controlled sequentially by CPU.
Positions, velocities, and accelerations vectors of all particles are flattened into 1D array and assigned continuously in both host and device memory. 1D block in 1D grid is used, and the block size is fixed as 256 for both particle and spring list. The grid size is dynamically allocated according to the model size of LSM. Figure 2 shows the computing flowchart in CuLSM. In the preprocessing stage, the initial particle-spring network is constructed from the desirable geometry. Particle masses and spring parameters are then assigned according to their specific types. After the model is constructed, the boundary conditions and simulation configurations are set. At this stage, CuLSM has read model input, boundary conditions, and simulation configurations and has stored the data in host memory.
Before simulation starts, particle and bond vectors are copied from host memory to device memory.
At each timestep, boundary displacements are first applied using a GPU kernel function. calculation, the race condition emerges when multiple spring forces try to access and add particle forces at the same time, leading to memory conflicts and unexpected results. Thus, the atomic operation is used to serializing the requests (access and addition) from threads across the entire grid. The particle forces are first set as zeros and then summed over spring forces using atomicAdd function, as shown in the following code. Once the condition for simulation output is satisfied, particle position and velocity vectors are copied back from device to host memory. The spring stiffness vector is also copied for calculating potential energy. The system potential energy and kinetic energy are calculated on CPU. The green blocks are implemented by GPU kernels, which parallelize particle and spring vectors. At each iteration, timestep is checked if satisfying the conditions for callback or termination.

CuLSM demonstrates strong validity against analytical and numerical results
We first compare the trajectory of a simple harmonic oscillator solved numerically by CuLSM with the analytical solution. For a system consisting of two particles with mass m = 1 kg connected by a harmonic spring with spring constant k = 1 × 10 −4 N/m and equilibrium distance r 0 = 10m, the equation of motion is a second-order ordinary differential equation: . We fix one particle at the origin x = 0m and place another one still at x = 3r 0 /2m when time t = 0s, as shown in Figure 3A. The time integral interval δt for CuLSM is set as 1s. The simulation was run for 1000s and the output interval is 10s. As depicted by Figure 3B, our model provides an accurate numerical solution for a simple harmonic oscillator without error accumulation over time. We also test our code against LAMMPS (3 Mar 2020, stable release) and compare the performance in the next subsection. As illustrated by Figure 4A, we construct a series of 2D composite materials with the soft inclusions arranged in a Poisson distribution (Chiang et al., 2020). Three kinds of linear fracture springs, including stiff-stiff, soft-soft, and stiff-soft springs, are used to model stiff-stiff 2.00 × 10 −3 10 10.1 soft-soft 2.00 × 10 −5 10 11.0 stiff-soft 1.25 × 10 −4 10 10.4 stiff, soft and interfacial materials ( Table 1). The stiff, soft, and boundary particles are marked as dark blue, light blue, and red, respectively. To model the mode-I fracture behaviors, boundary particles were displaced apart along x axis at the strain rate of 10 −6 . The size of composites increases from 1000 × 1000 to 2000 × 2000 squared unit length, with area ratio ρ A linearly increasing from 1.0 to 4.0 ( Table 2). The uniaxial tensile tests are performed to validate the results by CuLSM against those by LAMMPS. As shown in Figure 4B, the potential and kinetic energies computed by CuLSM perfectly coincide with those computed by LAMMPS before the peaks of potential energies. We also note that the potential and kinetic energies increase as the size of Poisson composite become large. Small energy discrepancies at large strain are observed, but the tendencies are similar. We further compare the fracture patterns obtained from CuLSM and LAMMMPS ( Figure 5). Regardless of the size of the composites, CuLSM and LAMMPS yield akin fracture patterns. The cracks nucleate, propagate, and bifurcate at strikingly similar locations in CuLSM and LAMMPS series, proving strong fidelity of CuLSM. CuLSM reads input of particle geometry from LAMMPS Data file formatted in bond atom style. During simulation output, CuLSM outputs particle coordinates in LAMMPS Dump file. The outputted files are readily readable and operable by visualization tools such as OVITO (Stukowski, 2009).
In Figure 6, we present virial stress σ V (Subramaniyan and Sun, 2008;Thompson et al., 2009) and Lagrangian strain L (Shimizu et al., 2007) fields of Poisson composite at bulk engineering strain = 0.02: , where Ω is the finite domain volume considered, x, v are particle position and velocity, f is the force between particle pairs; J is the locally affine transformation matrix considering the relative displacement of particle with its first nearest neighbors, and δ is the Kronecker delta. The result indicates that the discrepancies of stress and strain fields calculated by CuLSM and LAMMPS are negligible.

Benchmarks of CuLSM acceleration
To benchmark the performance of CuLSM, we record the computing time of mode-I fracture simulations on Poisson composites of different sizes, as listed in Table 2. In Figure 7, we compare the total wall time of simulations by CuLSM (1 CPU + 1 GPU) and LAMMPS with 1 CPU, 2 CPUs,  slower than CuLSM. With these settings, LAMMPS is unfavorably slow and spatial decomposition scheme is incapable of accelerating the LSM simulation efficiently. Note that LAMMPS does not currently support GPU acceleration on bond potentials. Therefore, LAMMPS 1 CPU + 1 GPU shows no speedup compared to LAMMPS 1 CPU. With communication cutoff (r comm = 4r 0 ) and turning off the neighbor list update (T n = ∞), the total wall time of LAMMPS scales in the same order with CuLSM with respect to particle number. CuLSM can be up to 4.4 times faster than LAMMPS with 1 CPU and around 1.5 speedup compared to LAMMPS with 4 CPUs. CuLSM-CPU with 1 CPU has comparable speed with LAMMPS with 2 CPUs. Note that the optimal neighbor setting depends on the simulation cases for the spatial decomposition scheme. The GPU speedup of CuLSM, i.e. the speedup of CuLSM 1 CPU + 1 GPU against CuLSM-CPU 1 CPU, is also presented in the bottom panel of Figure 7. On the machine with Intel i5-8400 and Nvidia GeForce GTX 1060, the GPU speedup of CuLSM is about 2.5. CuLSM reduces total wall time (including performance results from the parallelization on particle and spring lists. The input files for all the benchmarks and more information can be found online at the link in Data Availability Statement.  In this work, we present a CUDA C++ code CuLSM for large-scale LSM simulations. By realizing the parallelism on particle and spring lists, CuLSM has been optimized for LSM simulations and secures a remarkable boost in computing speed in comparison with general-purpose LAMMPS package. Since all of the interactions in LSM are harmonic pair potentials, the speedup of spatial decomposition used in LAMMPS is limited. Without updating neighbor list during simulations, CuLSM remarkably accelerates the time integration on GPU and only copy data from device to host when needed.
Currently, the broken springs are not deleted from the spring list but are irreversibly assigned zero stiffness to emulate the free deformation. We deliberately retain these broken springs since in future studies the stiffness may need to recover when the materials is subject to compression, bending, and cyclic loading. Indeed, different spring properties and mechanical elements, e.g., nonlinear elasticity, viscocity, and plasticity, are worth being added into the particle-spring networks.
Multi-body potentials such as angle potential and volume-compensated particle method (Chen et al., 2014) are also of interest in future studies. More in-depth theoretical formulations are required for investigating high-level phenomena such as dislocation, Bauschinger effect, and yield surface evolution.
CuLSM is readily extensible to multi-GPUs and can be further incorporated with multithread environment for the larger and three-dimensional models. To further reduce the memory copying time between host and device, the unified memory can be used to allocate memory address space accessible from any CPUs and GPUs in the system. Moreover, the parallel reduction scheme can be adopted to speed up the calculation of the global attributes, such as potential and kinetic energies.
The future opportunities this work brings include: • The toolkit we developed provides a faster and more flexible structure-properties platform, which expedites the particle-based simulation and materials design procedures.
• CuLSM opens the venue for high-throughput and high-fidelity data generation to meet the increasing need for machine learning aided materials design protocols (Kim et al., 2021;Sui et al., 2021).
• The work largely reduces the computational cost for predicting elasticity and fracture behaviors of complex materials systems, further accelerating the design phase through offering predictive insights for additive manufacturing and mechanical experimentation.
• The LSM simulations provide rich and detailed geometric and topological data where the relationship with high-level mechanical properties underlies. One future research direction would be finding out the physics rules from local to global hierarchy that govern the macroscopic behavior of structural materials.
In conclusion, we provide a powerful and efficient framework to characterize and predict the elasticity and fracture mechanism of materials.

Author contributions
YC conceived the idea and developed the research. YC coded the program and performed simulations with advice from TWC. YC wrote the manuscript with input and advice from TWC and SWC.