Improvements to the Shaw-Type Absolute Palaeointensity Method

Palaeointensity information enables us to define the strength of Earth’s magnetic field over geological time, providing a window into Earth’s deep interior. The difficulties in acquiring reliable measurements are substantial, particularly from older rocks. Two of the most significant causes of experimental failure are laboratory induced alteration of the magnetic remanence carriers and effects relating to multidomain magnetic carriers. One method that has been claimed to overcome both of these problems is the Shaw method. Here we detail and evaluate the method, comparing various selection criteria in a controlled experiment performed on a large, non-ideal dataset of mainly Precambrian rocks. Monte Carlo analyses are used to determine an optimal set of selection criteria; the end result is a new, improved experimental protocol that lends itself very well to the automated Rapid 2G magnetometer system enabling experiments to be carried out expeditiously and with greater accuracy.


INTRODUCTION
The acquisition of absolute palaeointensity data is problematic. The minerals that carry magnetic remanence are prone to various types of alteration in nature and in the laboratory. The matrix can also alter to form new magnetic minerals. This alteration affects the original magnetic signal and can lead to inaccurate estimates of past field strength. The pre-requisite that reliable palaeomagnetic directions and ages are obtained from suitable rocks is followed by time-consuming experiments, which often yield low success rates. To aid the process, there are several different methods with which a palaeointensity can be determined. Of these, variants of the thermal Thellier method (Thellier and Thellier, 1959;Coe, 1967;Tauxe and Staudigel, 2004) remains the most robust for assemblages of non-interacting uniaxial SD grains, due to its explicit theoretical basis grounded in the work of Néel (1949). However, it does have disadvantages that may be overcome by using alternative techniques such as the Shaw method (Shaw 1974;Shaw 1994, Yamamoto et al., 2003).
The Thellier method can only be used to calculate a palaeointensity at temperatures below those at which alteration occurs. The primary magnetic remanence of ancient rocks is often carried by grains associated with high unblocking temperatures and any thermo-chemical alteration that may influence the primary remanence during heating in the laboratory is not usually corrected for. The method can also suffer from multidomain (MD) effects associated with a failure of Thellier's laws of thermoremanence, causing non-linear Arai plots, which can lead to erroneous palaeointensity estimates (Levi and Merrill, 1976;Shcherbakov et al., 2001;Paterson, 2011). This can lead to significant problems because of an oft-present multi-slope phenomenon that is not easilyovercome, causing under/over-estimations of the field strength (Thomas and Piper 1992;Xu and Dunlop 2004;Smirnov et al., 2017).
Compared with other palaeointensity methods, the imparted laboratory remanence in a Shaw experiment is somewhat more analogous to that acquired in nature, in that it is acquired during a single cooling from above the Curie temperature rather than during multiple stepwise heating-cooling cycles with lower peak temperatures. In principle, this makes it less prone to the multidomain effects associated with blocking and unblocking of partial thermoremanent magnetisations, and is said to be domain-state independent (Biggin and Paterson, 2014). Repeated measurements (before and after heating) of anhysteretic remanent magnetisation (ARM) are used to correct any physiochemical alteration of the magnetic minerals; this allows palaeointensities to be calculated from a thermal remanent magnetisation (TRM) at higher temperatures that would otherwise be beyond the range of a Thellier experiment. However, it should be noted that heating above the Curie temperature (Tc) can induce alteration affecting the whole specimen. Additional uncertainties are associated with the Shaw method and its subsequent variations; in particular, the use of ARM as an analogous substitute for TRM, given the wellestablished grain size dependency of the ratio of ARM and TRM (Tanaka and Komuro, 2009).
The palaeointensity average for the past 0-5 Ma is also noticeably lower when using the Shaw method compared with the Thellier method according to the PINT database (v.2015.05;http://earth.liv.ac.uk/pint/;Biggin et al., 2015). This discrepancy requires that both methods be scrutinised for any systematic biases that they may introduce.
Here we aim to provide a practical evaluation of several aspects of the Shaw-DHT method using a large dataset of wellcharacterised samples. A base set of selection criteria are applied to all data, after which, two linearity parameters with varying selection criteria are applied to the palaeointensity slope and compared. These include the curvature parameter (|k′|; Paterson 2011;Paterson et al., 2014b) and the R 2 correlation coefficient (R 2 corr ; Paterson et al., 2016). R 2 corr is the square of the Pearson correlation (R corr ) used in other Shaw palaeointensity studies (Yamamoto et al., 2003). The aim is to determine the most effective set of selection criteria and provide a quantitative measure of their accuracy and precision. We also examine any potential effects of varying the high-temperature hold durations used in the experimental heatings.

THE SHAW METHOD
The Shaw method produces a palaeointensity estimate by comparing the alternating frequency (AF) demagnetisation spectra of a natural remanent magnetisation (NRM) with that of a TRM that is acquired in a single step. Alteration is measured by comparing the demagnetisation spectra of an ARM imparted prior to, and following the laboratory heating; the difference is calculated for each coercivity step and applied as a correction to the TRM (Figure 1).
Originally proposed by Shaw (1974), the current method is the product of modifications that allow a palaeointensity to be calculated in the presence of alteration (Kono 1978;Rolph and Shaw 1985). The double heating technique (DHT) was further added (Tsunakawa and Shaw, 1994) with the purpose of testing the reliability of the ARM corrections. This involves a controlled repeat of the initial experiment, replacing the NRM with a laboratory TRM. If subsequent ARM corrections lead to the recovery the TRM, it suggests that the initial corrections are reliable.
FIGURE 1 | Schematic view of the data analyses process for the Shaw-type experiment using a specimen from Mount Etna. Top row is the palaeointensity experiment, where any alteration to the TRM 1 demagnetisation spectra is corrected by multiplying by the ARM slope (ARM 0 /ARM 1 ) to produce TRM 1 *. Bottom row is a repeat experiment to test the validity of the ARM corrections. This ARM validity check uses additional remanence acquisitions (TRM 2 and ARM 2 ) and should produce a unit slope. Note that the values shown in these figures are after vectoral subtraction of the remanences at the maximum AF step.
The Shaw-DHT method (referred to as the Tsunakawa-Shaw method) now includes low-temperature demagnetisation (LTD; Yamamoto et al., 2003), which is said to preferentially remove remanence magnetisation held by MD grains (c.f. Ahn et al., 2016). The validity of the Shaw LTD-DHT method has been recognised in historical lavas (Yamamoto et al., 2003;Mochizuki et al., 2004;Oishi et al., 2005;Yamamoto and Hoshi, 2008) and in archaeological samples (Yamamoto et al., 2015).
The inclusion of LTD treatment is optional with respect to the theory behind the DHT. Recent studies of rocks that were moderately to highly MD (15-28%), obtained very similar palaeointensity estimates from LTD and non-LTD sister specimens (Yamamoto et al., 2007;Lloyd et al., 2021) and other studies successfully use the method without LTD. (Thallner et al., 2021). However, the incorporation of LTD can be important, not only for removing MD-like remanence, but also for the additional information it provides. The remanence loss due to LTD can be measured in the NRM, ARM and TRM steps; this can be used to quantify the size of the MD component, differences in the various remanent magnetisation losses and provide general information on how the LTD influences the coercivity spectra.
Here we outline the full experimental LTD-DHT method; note that the method requires an LTD treatment prior to demagnetising the NRM and TRMs (steps 1,3 and 5). These are then compared with the LTD treated ARMs (steps 2b, 4b and 6b; italics). The procedural steps are as follows: 1) (NRM) Stepwise AF demagnetisation of the NRM, usually up to 100-180 mT. 2a) (ARM 0 ) An ARM is imparted over the full coercivity range that was used in the NRM demagnetisation, and then progressively AF demagnetised using the same steps as for the NRM. The ARM direct current (DC) bias field is usually 2-3 times higher than the TRM DC bias field to compensate for a weaker ARM acquisition. The ARM AF demagnetising field must always equal the maximum AF demagnetisation step of the NRM. 2b) (ARM 00 ) LTD-ARM; the same as step 2a but with LTD treatment prior to AF demagnetisation. 3) (TRM 1 ) A TRM is imparted by heating in a magnetically shielded oven to above Tc (typically to 600 o C for magnetite), with heating and cooling in a constant DC bias field. This is then stepwise AF demagnetised to the same maximum level as the preceding steps. The DC bias field should be close to the expected palaeointensity or a sensible moderate value, i.e., 20 µT. 4a) (ARM 1 ) A second ARM is imparted over the full coercivity range, as before, and stepwise AF demagnetised to the same maximum level. 4b) (ARM 10 ) LTD-ARM; the same as step 4a but with LTD treatment prior to AF demagnetisation. 5) (TRM 2 ) A second TRM is imparted and demagnetised in an identical manner to step (3), usually with a longer hold duration. 6a) (ARM 2 ) A third ARM is imparted and demagnetised in an identical manner to steps (2a and 4a).
6b) (ARM 20 ) LTD-ARM; the same as step 6a but with LTD treatment prior to demagnetisation.
The key parameters obtained from a Shaw experiment are defined as follows: • TRM 1 *(i) TRM 1 (i) × [ARM 0 (i)/ARM 1 (i)]; TRM 1 is corrected at each AF demagnetisation step for the ARM 0 / ARM 1 ratio at the same AF step.
• Slope N (NRM/TRM 1 *) is the palaeointensity slope calculated using the corrected TRM 1 . • TRM 2 *(i) TRM 2 (i) × [ARM 1 (i)/ARM 2 (i)]; TRM 2 is corrected at each AF demagnetisation step for the ARM 1 • ARM 2 ratio at the same AF step Slope T (TRM 1 /TRM 2 *) is the correction validation made using the slope between • TRM 1 and the corrected TRM 2 and should be within 5% of unity.
The AF demagnetisation spectra used for all slopes corresponds to the coercivity range determined as the ChRM, except slope T , which should use the full coercivity range or close to it, but no less than the ChRM.
Heating of samples are often conducted in a vacuum in order to repress high-temperature oxidation during laboratory heatings (Mochizuki et al., 2004). The Shaw LTD-DHT studies after Mochizuki et al. (2004) usually used vacuum heating.
Specimens must be held above the Curie temperature of their remanence bearing minerals for enough time to homogeneously heat the specimen; however, heating durations have been shown to alter the ARM/TRM ratio and potentially affect the palaeointensity estimate (Tanaka and Komuro, 2009) and should therefore be subject to scrutiny.
ARM corrections are performed at each measurement point rather than applying a single correction obtained from the best fit slope of ARM 0 /ARM 1 ; although the slope is a linear fit, the plot of ARM 0 /ARM 1 need not be linear. This means that a unit slope does not necessarily mean that no alteration has taken place. In the results that will be discussed, we find that the curvature observed in slope A (k′ A ) tends to be inversely proportional to the difference in curvature between the corrected and uncorrected palaeointensity slope (Δk′), so that k′ A ≈ -Δk′ (Supplementary Figure S1). This is indicative of the non-linear behaviour of the ARM 0 /ARM 1 data being transferred to the NRM/TRM* plot by the ARM correction.

Monte Carlo Down-Sampling and Testing Linear Parameters
A large dataset of measurements undertaken on 426 individual specimens was compiled from multiple Shaw-DHT experiments, mostly carried out at the University of Liverpool over the last 3 years. Full multi-method palaeointensities are now published on many of these sample sets (Lloyd et al., 2021;Thallner et al., 2021). The experiments were performed using a wide variety of varying input parameters, including differences in the laboratory TRM bias field, ARM bias field, hold durations, sample positions, oven used and number of measurement steps. Most, but not all were heated in a vacuum, and many were subjected to LTD. treatment. Here, we do not test how LTD impacts on the palaeointensity estimates although this is something that could be done in the future.
The specimens ( Table 1) vary widely in lithology and age, from present day lavas and igneous rocks reheated in the walls of a kiln, to Precambrian dolerite dykes and sills extending to more than 1Ga. These ancient samples are non-ideal recorders with various degrees of magnetominerological alteration. Included in the modern-day lavas are Shaw LTD-DHT data available in the MagIC database from a study on the andesitic lavas of Sakurajima (Yamamoto and Hoshi 2008). The deliberately large proportion of extremely non-ideal samples, which includes specimens that were originally determined to be unsuitable for palaeointensity based on rock magnetic results, is meant to ensure that the results represent a worst-case scenario.
A simulated palaeointensity experiment was constructed (similar to Pan et al., 2002) by utilising the data from steps 3-6 of the standard Shaw-DHT method (see method description). A full TRM was imparted on all specimens, resetting their primary remanence and replacing it with a new thermal remanence (step 3 in the method description); importantly, this does not render them all thermally stable, since alteration is still observed in the majority of slopes A 2 (Supplementary Table S2). This laboratory induced TRM becomes the control that we aim to recover by comparing it with an additional, ARM corrected TRM (steps four to six in the method description). If a unit slope is produced when comparing the control TRM with the subsequent corrected TRM, the ARM correction is determined to be successful in exactly the same way as slope T in a standard Shaw-DHT experiment.
The simulated palaeointensities produced here are the original slope T values. The fundamental difference between this and a normal Shaw-DHT experiment is that here, there is no double heating; ARM corrections can be tested directly, whereas in the standard method, slope T is an indirect test of the corrections performed in steps 1 and 2.
We begin by applying a base set of selection criteria to the entire dataset ( Table 2), these include f RESID (Paterson et al., 2016) which ensures that the palaeointensity slope is origin-trending, and the scatter parameter β (Coe et al., 1978), which is used to assess the slope uncertainty. After the base selection criteria are applied, the remaining data are randomly down-sampled ten thousand times with increasing number of specimens (N; starting at N 3), as follows; 1. Three specimens are randomly sampled from the dataset, and this is repeated ten thousand times. 2. Every sampled specimen is analysed to determine a simulated palaeointensity (TRM 1 /TRM 2 *) using their full coercivity spectra. 3. For each set of three randomly sampled specimens, a mean palaeointensity and standard deviation is calculated. This produces ten thousand mean palaeointensities and standard deviations, each from an N of 3. 4. The ten thousand mean palaeointensities are sorted to calculate their 95% confidence limit and then averaged to produce an overall mean palaeointensity for an N of 3. 5. The ten thousand standard deviations are sorted to calculate their 95% confidence limit and then averaged to produce a mean standard deviation. 6. This iterative process is repeated, generating an overall mean palaeointensity and mean standard deviation with their respective 95% confidence limits for each increasing N. The accuracy and standard deviation of the mean is assessed as a function of N.
After we establish a set of results using the base set of selection criteria, the entire process (steps 1-6) is repeated four more times, after various additional selection criteria are applied to the data. R 2 corr is applied with two minima, R 2 corr ≥ 0.990 and R 2 corr ≥ 0.995; |k′| is also applied with two minima, |k′| ≤ 0.2 and |k′| ≤ 0.1. Each selection criterion is added individually to the base set of selection criteria and the results are randomly down-sampled ten thousand times each time, producing a total of five sets of down sampled results for comparison.  These parameters (R 2 corr and |k′|) are used to assess Shaw plot linearity-the importance of strict linearity in the Shaw NRM-TRM 1 * plot has previously been highlighted (Tanaka and Komuro, 2009) and is discussed further in Discussion.
Since all the specimens are thermally reset, the full coercivity range is used for all simulated palaeointensity slope fits; this removes any user bias that can result from the slope selection of a preferred coercivity range. Performance is measured by determining the number of specimens required to achieve an acceptable level (±10% of the correct mean) of accuracy and precision with 95% confidence.
The selection criteria are also compared using the published Sakurajima data (Yamamoto and Hoshi 2008). We are also able to determine how effective the DHT is at rejecting unreliable results in the same study because the expected field strength is known.

Testing the Double Heating Technique
The DHT is designed to provide additional confidence in the Shaw method; however, variations in the heating hold durations may influence the palaeointensity results that are obtained, particularly since TRM/ARM ratios have been demonstrated to evolve with excessive heating time (Tanaka and Komuro, 2009). Here, we examine the effect of varying experimental hold durations to determine if any heating time-dependency exists. To do this, we carried out a Shaw-DHT palaeointensity experiment on a set of 23 specimens, using hold durations that are shorter than usual whilst ensuring that equilibrium above Tc is reached. We then compared the slope T values with sister specimens that underwent longer hold durations.
The samples from this experiment were from seven different sites and two different ages (five dolerite dykes aged 755 Ma and two sills aged 1070 Ma; Supplementary Table S1). The 23 half-inch length cylindrical cores were heated in a vacuum to 610°C and held for 15 min for the acquisition of both TRM 1 and TRM 2 (the second hold would normally be approximately twice as long). The TRMs were imparted using a 20 µT bias field. A third TRM was then imparted on three specimens from one site (MD6) to observe any differences in their AF demagnetisation spectra. The specimens were held at 610°C for 40 min using a 20 µT DC bias field.
Sister-specimens of the specimens used in this experiment yielded reversible high-temperature susceptibility curves (Supplementary Figure S2) and high-quality thermal Thellier results with a narrow distribution of remanence-bearing singledomain magnetite grains (Supplementary Figure S3). Their well-defined mineralogy and characteristics make them suitable to use in this comparison of hold duration effects.

Whole Data Set Results
Results obtained after applying all of the selection criteria to the entire data set of 426 specimens (without down-sampling) are given in Table 3 and Supplementary Table S2. These simulated palaeointensity results are exactly equivalent to the slope T values in the original experiments. Irrespective of the selection criteria used, an accurate mean palaeointensity is obtained (0.99-1.0). The standard deviations of results, however, differ considerably. Overall standard deviations range from 21% using only the base selection criteria to 9% for results with the additional criterion |k′| ≤ 0.1.
Using |k′| with a threshold of ≤ 0.2 produces improved results to that of R 2 corr ≥ 0.995 and notably improved over the results from R 2 corr ≥ 0.990, which is widely used in current analyses ( Table 2; Figures 2, 3). A further improvement is observed when decreasing the curvature minimum to ≤0.1, however, given that the difference is small, and its use may potentially discard specimens unnecessarily, we consider that a minimum acceptable curvature of ≤ 0.2 is optimal.
The results in Table 3 all have a base set of selection criteria ( Table 2) applied to them, which remove 24 of the 426 results. In Figure 2, we compare the two linearity parameters with selected minima (R 2 corr ≥ 0.990 and |k′| ≤ 0.2) on the full dataset of 426 specimens. No base selection criteria are applied in this comparison of the two parameters; therefore, the values differ slightly from those in Table 3. Use of the single criterion |k′| ≤ 0.2 appears to be more effective at rejecting non-ideal results, producing a mean result of 1.00 ± 0.12; this is compared to the criterion R 2 corr ≥ 0.990 which rejects fewer specimens and produces a higher standard deviation (mean 1.01 ± 0.17).

Down-Sampling Results
The Monte Carlo down-sampling also produced accurate mean palaeointensities, regardless of the number of specimens averaged ( Figure 3A; Table 3 and Supplementary Table S3). All selection criteria yield an accurate mean intensity, but with varying scatter (standard deviation) and number of accepted results. Of the four sets with additional linearity checks, those that assess curvature yield the lowest result scatter (∼9-11%), while retaining 70-80% of the pre-screened results. When assessing plot curvature, we also observe consistently smaller numbers of specimens required to obtain a mean palaeointensity estimate that has a confidence interval within 10% of the mean. That is, using |k′| ≤ 0.2 or |k′| ≤ 0.1, in addition to the base selection criteria can yield a more precise palaeointensity estimate from fewer specimens (four to six specimens) than using the R 2 corr (7-15 specimens). Furthermore, we observe that with our proposed selection we have a much smaller 95% confidence interval around the mean standard deviation ( Figure 3B) which indicates that we can obtain a more constrained estimate of the data scatter than with other selection criteria.

True Palaeointensity Comparison
The selection criteria were also tested on the original Shaw LTD-DHT palaeointensity results from Sakurajima. The single linearity criterion (|k′| ≤0.2) was compared with the main two current Shaw-DHT selection criteria, (R 2 corr ≥ 0.990 combined with slope T 1 ± 0.05; Yamamoto and Hoshi 2008). All specimen results pass the R 2 corr and |k′| criteria; the only difference is due to the slope T criterion, which causes six specimens to be rejected in the original published data ( Figure 4A). Intensity error fractions for all but one of the rejected palaeointensity results are less than 8% and are closer to the expected geomagnetic field intensity than many of the successful results ( Figure 4A). The results obtained using the single criterion |k′| are slightly improved over the current Shaw-DHT selection criteria (without the use of a second heating) in terms of the number of successful results, the mean palaeointensity result, and the standard deviation ( Figure 4B).

Double Heating Technique Hold Time Results
As part of the original palaeointensity experiments that were carried out on the specimens in this study, one Shaw-DHT experiment was carried out on a subset that included specimens from seven different sites and three localities ( Figure 5A). The experiment was designed to explore the effect of varying hold durations on the palaeointensity estimates and associated slope T values. The effect of shorter hold durations on some slope T values were not expected and are highlighted here. The slope T values obtained were unusually high for several, but not all, specimens and they appear to cluster according to site ( Figure 5A).
We compared the high slope T results from site MD6 with sister specimens that were subjected to longer hold durations, and noted that the values are affected by the hold duration used ( Figure 5B). Sister specimens that underwent shorter hold durations (those in Figure 5A) produced high slope T values. This is in contrast to sister specimens with a combined hold duration of 80 min, which produced a near unit slope T ( Figure 5B; Supplementary Figure S4).
It has recently been noted that Slope T values may be dependent on hold duration (Lloyd et al., 2021); this is now further supported in more detail here. To analyse the cause of these latest observations, we subjected three of the specimens from site MD6 to a third TRM, heating to 610°C using a hold duration of 40 min and compared the AF demagnetisation spectra. In Figure 6 we show that a large decrease in the magnitude of TRM 2 is the cause of the high slope T value in all three specimens, and this was brought about by the corresponding short hold duration ( Figure 5B).
Alteration appears to have continued to affect TRM during the acquisition of TRM 3 , but has caused a reversal of TRM magnitude which finishes close to the original TRM 1 position. The changes in demagnetisation spectra ( Figure 6) infer that the heating would cause a change in slope T (TRM 1 -TRM 2 * plot) depending on the time spent at high temperature. This result is in agreement with specimens from site MD6 that had longer hold durations and unit slope T values ( Figure 5). It is also worth noting that ARM appears to be unaffected by the apparent alteration to TRM.

Summary of Results and Significance
Notably improved results are achieved by replacing the R 2 corr with the curvature parameter |k′|. This is because it provides a more direct measure of linearity, the fundamental characteristic that should be tested for in the Shaw-type palaeointensity slope. A non-linear ARM-  Table 2) are applied before we plot the data, therefore, the values in these figures differ slightly to those in Table 3. Green circles, accepted results at the selected minima; red circles, rejected results at |k′| ≥0.2 and R 2 corr ≤0.990. A few rejected results are not observed as they are outliers, falling outside the selected scale.
Frontiers in Earth Science | www.frontiersin.org July 2021 | Volume 9 | Article 701863 6 corrected Shaw palaeointensity slope suggests that changes in ARM are not behaving as an analogous substitute for TRM changes, most likely due to non-uniform high-temperature alteration affecting certain grain sizes (coercivity ranges) more than others. ARM/TRM ratios have previously been shown to differ with grain size and heating times (Tanaka and Komuro, 2009). The R 2 corr parameter does not perform as well because it has a strong dependence on the number of points selected in the best-fit slope, which allows for increased curvature for a larger number of points. It can also be sensitive to random noise, which can decrease correlation despite a general linear trend in the data. In this sense, R 2 corr measure both NRM-TRM reciprocity (i.e., linearity) as well as data scatter. We argue it is more appropriate to use two criteria more attuned to testing for reciprocity and scatter separately. For example combining curvature (|k′|) with the β parameter; The ratio of the standard error of the slope to the absolute value of the slope (Coe et al., 1978;Paterson et al., 2014a). These data pass the β minimum as part of the base selection criteria which suggests that the noise is acceptable.
The results based on using |k′| as a primary selection criterion are improved over existing data selection. The notable improvement in accuracy and precision when using |k′| ≤ 0.2 instead of R 2 ≥0.990 does not come at the expense of a lower success rate; these remain similar at 350 and 329 respectively from a potential 426; this is because the R 2 corr criterion is rejecting more of the accurate palaeointensities than the |k′| criterion. The improved results are also obtained without a second heating (viewed as though the simulated experiment data were from an initial heating) and therefore, without using the DHT.
With the use of modern equipment such as a Rapid 2G, this new version of the method, with improved palaeointensity slope selection criteria (|k′|, f RESID , β) and without the DHT, can be almost fully automated and able to produce as many as 100 palaeointensity and directional results per week using high resolution AF steps, while other more time-consuming methods may take up to ten times longer.

Observations on the Double Heating Technique
The validity of the DHT for natural rocks has been discussed by Tsunakawa and Shaw (1994) using historical or young lavas, where it was found to detect non-ideal behaviour or non-unity in FIGURE 3 | Plots of the down sampled results with increasing N, shown up to N 21. (A) Mean palaeointensity (dots) are the (N) mean of ten thousand randomly sampled specimens, and associated 95% confidence interval (solid lines) results after applying each set of the tested selection criteria. (B) Mean standard deviation (dots) is the (N) mean of ten thousand randomly sampled specimens and associated 95% confidence interval (solid lines) results after applying each set of the tested selection criteria.
Frontiers in Earth Science | www.frontiersin.org July 2021 | Volume 9 | Article 701863 the slope of TRM 1 -TRM 2 * plots. The results from such specimens were viewed as unreliable because they could potentially give an erroneous slope in the NRM-TRM 1 * plots (Tsunakawa and Shaw, 1994;Yamamoto et al., 2003).
The initial findings presented in this study, however, suggest that it may be possible for the DHT to allow alteration to occur undetected in certain samples and that slope T values could potentially be dependent on the hold duration used (c.f. Lloyd et al., 2021). This is highlighted in FIGURE 4 | Palaeointensity results of the two volcanic lava flows from Sakurajima, (Yamamoto and Hoshi 2008). IGRF values are 45.7 and 46.0 µT. (A) All individual palaeointensity results using the current Shaw (LTD.)-DHT selection criteria. Results separated according to Yamamoto and Hoshi (2008). The only rejected results are due to a failure of the slope T criterion (highlighted in red). IEF (%) is the intensity error fraction m− µ µ X 100 ; m, estimated mean geomagnetic field intensity; µ, expected geomagnetic field intensity determined from IGRF-10 data. (B) A comparison of the results when applying the same Shaw-DHT selection criteria; R 2 corr ≥ 0.990 and slope T (sT) 1 ± 0.05 (left) versus |k′| ≤0.2 (right). The blue horizontal line represents the mean expected palaeointensity according to IGRF-10.  Figure 4A; these are compared with sister specimens from other Shaw-DHT experiments which had longer total hold durations.
Frontiers in Earth Science | www.frontiersin.org July 2021 | Volume 9 | Article 701863 8 Figure 5B where sister specimens with varying hold durations produced very different slope T values.
TRM and ARM can both be affected independently by alteration (Tsunakawa and Shaw, 1994;Tanaka and Komuro, 2009), which is demonstrated by the existence of non-unit slope T values. The magnitude of their remanent magnetisation can also increase and decrease in the same experiment however ( Figure 6). These results are preliminary, and require further study to understand the cause of, and extent to which the observed phenomena can occur; however, the reversibility of remanent magnetisation magnitudes after prolonged heating may not be uncommon. We often see results where slope A1 and slope A2 are opposite in magnitude, where the ARM slopes are less than and more than one, or vice versa (e.g., Yamamoto and Tsunakawa, 2005;Yamamoto et al., 2007); this implies that the ARM magnitudes are reversing. If slope A1 (ARM 0 /ARM 1 ) < 1, the heating has caused an overall increase in ARM over the coercivity steps used; if then, the slope A2 (ARM 1 /ARM 2 ) > 1, the ARM magnitude is altering in the opposite sense during the second heating. It is also possible that this occurs undetected in a single heating.
The cause of reversing magnitude in TRM and or ARM, which can occur in either remanence independently, is unclear. The loss of TRM observed in Figure 6 may be due to the oxidation of magnetite to hematite, however, this does not explain the subsequent increase in TRM 3 . It is possible for new minerals to form that will acquire an increased TRM, but with coercivities too high to contribute to the ARM. This may account for the slight increase in residual magnetisation observed in TRM 3 (Supplementary Figure S5), but does not explain the reversing magnitude of the full TRM, which appears to require more than one mechanism.
A requirement of the Shaw-DHT method is that sufficient alteration occurs in the second heating in order to validate the initial ARM corrections. It is this second alteration correction that provides an indication of whether ARM and TRM are acting as analogues within the specimen. There are currently no criteria to prevent a specimen from becoming thermally stable during the first heating (alteration has saturated), which is inferred by a non-unit slope A1 combined with a unit slope A2 (e.g., Yamamoto and Tsunakawa 2005;Yamamoto et al., 2007). Where a specimen experiences little to no initial alteration, it should be expected to behave similarly in the subsequent heating, in which case, both slope A1 and slope A2 should be close to unity.
The technique also tends to remove accurate palaeointensity results, rather than outliers in the original Sakurajima palaeointensity study ( Figure 4A); in this case it appears to say little about the accuracy of the palaeointensity estimate. It should be acknowledged, however, that any criterion designed for rejecting inaccurate estimates, can reject some of the accurate estimates.
An alternative and arguably more useful technique to test the validity of the alteration corrections would be to vary hold durations with sister specimens during their one and only heating. This enables a more direct assessment of alteration effects and ARM corrections at the palaeointensity level of the experiment, rather than the assessment coming from separate ARM corrections after further and excessive heating. Variations in palaeointensity values from sister specimens can be quantified and large standard deviations can be rejected.

CONCLUSION
We have identified a set of improved selection criteria for the Shaw-type palaeointensity method by down-sampling simulated palaeointensity results from a large dataset of 426 individual specimens. Use of the improved selection criteria demonstrate notably increased accuracy and precision of mean palaeointensity results. This comes with only a minor reduction in the number of successful results and, importantly, without the use of a second heating. We also highlight an additional measure as a safeguard to detect undesirable behaviour that requires varying the hold duration in sister specimens; however, the exact usage and interpretation of this approach requires further investigation.
We find that the DHT may allow alteration to occur undetected, and that slope T appears to be dependent on the hold durations used in certain instances. In the analysis of historical lavas from Sakurajima, the effectiveness of the DHT was not demonstrated. We therefore suggest that a larger scale study of the DHT efficacy is required. The results suggest that it may not be necessary to include the DHT, however, more work is FIGURE 6 | AF demagnetisation spectra of three specimens (6.1A, 2A and 3C) from site MD6. These are vector subtracted as used in the calculation of the slopes (for information on residual magnetisations, see Supplementary Figure S5).