Lipase-Catalyzed Baeyer-Villiger Oxidation of Cellulose-Derived Levoglucosenone into (S)-γ-Hydroxymethyl-α,β-Butenolide: Optimization by Response Surface Methodology

Cellulose-derived levoglucosenone (LGO) has been efficiently converted into pure (S)-γ-hydroxymethyl-α,β-butenolide (HBO), a chemical platform suited for the synthesis of drugs, flavors and antiviral agents. The process involves two steps: a lipase-catalyzed Baeyer-Villiger oxidation of LGO followed by an acid hydrolysis of the reaction mixture to provide pure HBO. Response surface methodology (RSM), based on a central composite face-centered (CCF) design, was employed to evaluate the factors affecting the enzyme-catalyzed reaction: pKa of the solid buffer (7.2–9.6), LGO concentration (0.5–1 M) and enzyme loading (55–285 PLU.mmol⁻¹). Enzyme loading and pKa of the solid buffer were found to be important factors for the reaction efficiency (as measured by the conversion of LGO), while only the latter had a significant effect on the enzyme recyclability (as measured by the enzyme residual activity). LGO concentration influences both responses through its interactions with the enzyme loading and the pKa of the solid buffer. The optimal conditions, which allow at least 80% of LGO to be converted in 2 h at 40 °C and the enzyme to be reused for a subsequent cycle, were found to be: solid buffer pKa = 7.5, [LGO] = 0.50 M and 113 PLU.mmol⁻¹ for the lipase. Good agreement between experimental and predicted values was obtained and the model validity was confirmed (p < 0.05). Alternative optimal conditions were explored using Monte Carlo simulations for risk analysis, which estimated the experimental region where an LGO conversion higher than 80% is achieved at a specified risk of failure.


Log-Transformation of responses and data normality assessment
In regression analysis, it is advantageous if the data of a response variable are normally distributed, or nearly so. The term "normal" describes a symmetrical, bell-shaped curve, which has the greatest frequency of scores in the middle and smaller frequencies towards the extremes. The response histogram and the Box-Whisker plot are useful for studying the distributional shape of a response variable and for determining whether a response transformation is needed. Figures S1 and S2 show the response histogram and Box-Whisker plot for LGO conversion and enzyme residual activity, respectively, before and after negative log-transformation.
The histograms of both responses are negatively skewed before transformation (Figs. S1a and S2e), and a nearly normal distribution is obtained after a negative log-transformation (Figs. S1b and S2f). With this transformation, each measured value is subtracted from the maximum value (100, for variables expressed in percentages), and the negative logarithm of the result is then taken. Thus, a negative log-transformation of a response Y_i can be expressed as:
$$Y_i^{\prime} = -\log(100 - Y_i)$$
Figs. S1c, S1d, S2g and S2h display the Box-Whisker plots corresponding to the respective response histograms. The Box-Whisker plot comprises a rectangular body, the box, and two attached antennae, the whiskers. In the box, the lowest horizontal line depicts the lower quartile (Q25%) and the upper line the upper quartile (Q75%). The lower and upper whiskers denote the 5th and 95th percentiles of the distribution, indicating the variability outside the lower and upper quartiles. Whenever the two whiskers are of similar length, the distribution of the data is roughly normal.
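As an illustration of the transformation described above, the following minimal sketch applies a negative log-transformation to a set of percentage responses and compares the skewness before and after. The conversion values are hypothetical and are not taken from this study; the use of a base-10 logarithm and the helper names neg_log_transform and skewness are assumptions made for the example.

```python
import math


def neg_log_transform(values, maximum=100.0):
    """Return -log10(maximum - y) for each y; every y must be below maximum."""
    transformed = []
    for y in values:
        if y >= maximum:
            raise ValueError("each value must be strictly below the maximum")
        # Base-10 logarithm is an assumption of this sketch.
        transformed.append(-math.log10(maximum - y))
    return transformed


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical conversions (%), bunched near 100 with a tail towards low values.
conversions = [55.0, 78.0, 88.0, 93.0, 96.0, 97.5, 98.5, 99.0]
transformed = neg_log_transform(conversions)

print("skewness before:", round(skewness(conversions), 2))
print("skewness after: ", round(skewness(transformed), 2))
```

A skewness closer to zero after transformation is consistent with the more symmetrical histogram and the whiskers of similar length described for the transformed responses.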
In addition, the normality of the data after response transformation can be checked with a normal probability plot of the residuals. The normal probability plot is a graphical technique for assessing whether or not a data set is approximately normally distributed. Fig. S3 displays the normal probability plots for the empirical models of LGO conversion and enzyme residual activity. The straight line in each plot represents a normal distribution; residuals lying close to it support the adequacy of the least-squares fit.
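The construction of a normal probability plot can be sketched as follows: the sorted residuals are paired with the theoretical quantiles of a standard normal distribution, and points falling near a straight line suggest approximate normality. The residuals below are hypothetical, and the plotting positions (i - 0.5)/n are one common convention assumed here, not necessarily the one used to produce Fig. S3.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pair each sorted residual with a theoretical standard-normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n are an assumed convention.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return quantiles, ordered


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals from a fitted model.
residuals = [0.5, -0.7, 1.3, -0.1, 0.0, -1.2, 0.2, 0.8, -0.4]
quantiles, ordered = normal_probability_points(residuals)

# A correlation close to 1 means the points lie close to a straight line.
straightness = pearson_r(quantiles, ordered)
print("probability-plot correlation:", round(straightness, 3))
```

The correlation is only a numerical summary of how straight the plot is; visual inspection of the plotted points remains the primary check.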

Model fitting
When fitting a regression model, the most important diagnostic tools are the two parameters R² (coefficient of determination) and Q² (obtained by cross-validation).
R² measures the goodness of fit, indicating how well the regression model can be made to fit the raw data. When R² is 1, a perfect model is obtained, in which all points lie on the diagonal of a graph of observed versus predicted values. Fig. S4 shows the observed versus predicted plots for both responses: LGO conversion and enzyme residual activity.
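For reference, the goodness of fit described above can be computed from observed and predicted values as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses hypothetical numbers, and the function name r_squared is chosen for illustration only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed responses and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2_example = r_squared(observed, predicted)
print("R2:", round(r2_example, 3))
```

When the predictions equal the observations, the residual sum of squares is zero and the function returns 1, which corresponds to all points lying on the diagonal of the observed versus predicted plot.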
A much better indication of the usefulness of a regression model is given by Q². This parameter measures the goodness of prediction and estimates the predictive power of the model. Like R², Q² has an upper bound of 1. Both should be high (> 0.5), and preferably not separated by more than 0.2–0.3. A substantially larger difference constitutes a warning of an inappropriate model. A third parameter is called model validity.
It is based on the lack-of-fit test carried out as part of the analysis of variance (ANOVA) evaluation. The higher the numerical value, the more valid the model is, and a value above 0.25 suggests a valid model. Finally, the last diagnostic tool is called reproducibility. This performance indicator is a numerical summary of the variability in replicates. The higher the numerical value, the smaller the replicate error is in relation to the variability seen across the entire design. A small value of reproducibility (< 0.5) indicates a large experimental error (also known as pure error) and poor control of the experimental procedure.
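The goodness-of-prediction parameter Q² discussed above can be illustrated with a minimal sketch. It assumes leave-one-out cross-validation and a single-factor straight-line model, neither of which is stated in the text; the data are hypothetical and the function names are chosen for illustration. Q² is computed as one minus the ratio of the prediction error sum of squares (PRESS) to the total sum of squares.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_r_squared(x, y):
    """R2 of the straight-line fit to all points."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared(x, y):
    """Q2 = 1 - PRESS / SS_total, using leave-one-out cross-validation."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without point i, then predict the left-out point.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical factor settings and responses.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

r2_fit = fit_r_squared(x_data, y_data)
q2_fit = q_squared(x_data, y_data)
print("R2:", round(r2_fit, 3), "Q2:", round(q2_fit, 3), "gap:", round(r2_fit - q2_fit, 3))
```

Because each left-out point is predicted by a model that never saw it, Q² cannot exceed R² for a least-squares fit, and the gap between the two gives the warning sign mentioned in the text when it grows beyond 0.2–0.3.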