Near-infrared Spectroscopy Applications in the Quantitative Determination of Bovine Genomic DNA Content from Milk

This study aimed to provide a convenient method for determining DNA concentration in milk. Near-infrared (NIR) spectroscopy combined with an orthogonal experimental design was used, for the first time, to determine bovine genomic DNA content in milk. DNA was extracted from milk samples with a novel method developed from the results of our preliminary experiments. All samples were scanned (570-1850 nm), and 74 samples were used to establish the NIR model of DNA content. The optimal model performed well, with SEC = 0.26%, SECV = 0.69%, SEP = 0.28%, RSQ = 0.913, R = 0.9846 and RPD = 3.57. NIR is therefore feasible for the nondestructive determination of DNA content in milk, and the technology would be valuable in meeting the increasing demand for monitoring the dairy food chain and ensuring labeling authenticity for consumers.


Introduction
As a basic biomolecule of organisms, genomic DNA provides a foundation for tracing the dairy food chain through milk nutrition, dairy genetic variation, marker-assisted breeding selection, genetic engineering, and so on [1,2]. DNA is a large molecule made up of smaller units called nucleotides. Each nucleotide has three parts: a sugar, a phosphate group and a nitrogenous base, and these contain hydrogen-bearing bonds such as C-H, N-H and O-H. Components of a material that contain C-H, N-H, O-H and S-H bonds can be determined from their selective absorption of electromagnetic waves, and the spectral signature of the material is defined by its absorbance as a function of wavelength [3]. DNA should therefore be detectable by near-infrared technology. Near-infrared (NIR) spectroscopy is a spectroscopic method that uses the near-infrared region of the electromagnetic spectrum (from about 800 nm to 2500 nm). Typical applications include agriculture [4], the chemical industry and environmental science [5,6], assessment of fruit and vegetable quality [7,8], and determination of meat product quality [9-12]. Using NIR, many substances such as fat, protein, lactose, water and melamine have been successfully detected in milk [13-17]. Somatic cell count (SCC) in milk can also be determined by NIR [18], based on changes in the lactose and protein content of milk. Theoretically, NIR can determine genomic DNA content in milk based on changes in SCC, yet no such study has been reported. Using NIR to determine DNA content may have many advantages over traditional technologies. For example, many nutritive indexes can be measured by NIR from a single spectral scan within a few minutes, without degrading the samples, which remain available for other purposes [19]. NIR is low-cost and requires little or no sample preparation, and has thus been considered as an alternative that can complement conventional milk analytical methods [20,21].
Current methods of extracting DNA from milk are complex and difficult, comprising three main steps: 1) separating and enriching somatic cells from milk; 2) digesting the somatic cells and extracting DNA using the sodium dodecyl sulfonate phenol method; 3) detecting bovine genomic DNA. Moreover, blood contains at least about 350 times more somatic cells than the same volume of milk [1,22], so bovine genomic DNA has rarely been extracted from milk. These difficulties were overcome in our previous work, in which a DNA extraction method was successfully established [23]. That method, however, is not suitable for continuous large-scale operation. Using NIR to detect DNA content could omit the DNA extraction step and enable continuous large-scale detection of DNA content.
In this paper, NIR was applied to determine DNA content in milk. The results would be valuable for the rapid detection of DNA content and for meeting the increasing demand for monitoring the dairy food chain.

Sample Collection and DNA Extraction
Milk samples were collected from 80 healthy adult Chinese Holstein cows in 15 mL centrifuge tubes and immediately placed on ice.
DNA was extracted by a novel method that combined milk somatic cell separation techniques with the sodium dodecyl sulfonate phenol method; the DNA was purified by repeating the step in which an equal volume of a chloroform and isoamyl alcohol mixture (volume ratio 24:1) is added [23-25]. An ultra-low volume spectrometer (Bio-Chrom, Cambridge, UK) was used to determine the bovine genomic DNA content of the 80 milk samples. Three replicates were performed for each milk sample, and the average was used as the reference value. These reference values were used to establish the NIR model of DNA content.

NIR Spectra Acquisition and Pre-processing
NIR spectra were acquired with a near-infrared spectrometer (Foss NIR Systems Inc., Wolfsburg, Germany) equipped with a transmission measurement accessory and an InGaAs detector. All spectral data processing and calibration model establishment were performed in the WinISI III analysis software (Ver. 1.50e, Foss NIR Systems Inc., Wolfsburg, Germany).
The spectrum of each milk sample was the average of 32 scans over the spectral region from 570 nm to 1850 nm at 2 nm intervals. The spectrometer was equipped with Si (570-1100 nm) and PbS (1100-1850 nm) detectors. Room temperature fluctuated between 20 and 25 °C and relative humidity around 46%. The absorbance spectra were transformed mathematically by the standard normal variate and detrending (SNV+D), standard normal variate only (SNV+O) or detrending only (Detrend+O) procedures, combined with zero, first or second derivative processing (gap = 1, 4, 8; smoothing = 1, 4, 8; second smooth = 1, 4, 8) [26]. Calibration was performed using modified partial least squares (MPLS), partial least squares (PLS) and principal component regression (PCR), as available in WinISI III. Spectral preprocessing and quantitative calibration were designed with 6 factors and 3 levels using the L27(3^13) orthogonal table (Table 1).
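The scatter corrections and derivative treatment described above can be illustrated with a minimal NumPy sketch. It is not the WinISI III implementation: the function names, the second-degree polynomial used for detrending, the plain point-to-point gap difference used as a derivative, and the synthetic spectra are all assumptions made for illustration, and the smoothing steps are omitted.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def detrend(spectra, wavelengths, degree=2):
    """Subtract a least-squares polynomial baseline (in wavelength) from each spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    # Centre and scale the wavelength axis to keep the polynomial fit well conditioned.
    x = np.asarray(wavelengths, dtype=float)
    x = (x - x.mean()) / x.std()
    coeffs = np.polyfit(x, spectra.T, degree)      # shape (degree + 1, n_spectra)
    baseline = np.vander(x, degree + 1) @ coeffs   # shape (n_points, n_spectra)
    return spectra - baseline.T

def gap_derivative(spectra, gap=4, order=1):
    """Simplified gap derivative: difference of points `gap` steps apart, applied `order` times."""
    out = np.asarray(spectra, dtype=float)
    for _ in range(order):
        out = out[:, gap:] - out[:, :-gap]
    return out

# Hypothetical data: 3 synthetic spectra on the 570-1850 nm grid at 2 nm steps.
wavelengths = np.arange(570, 1851, 2)
rng = np.random.default_rng(0)
spectra = 0.001 * wavelengths + rng.normal(scale=0.01, size=(3, wavelengths.size))

# Detrending only, followed by a second derivative with gap = 4.
processed = gap_derivative(detrend(spectra, wavelengths), gap=4, order=2)
```

Each difference pass shortens a spectrum by `gap` points, so a second derivative with gap = 4 drops 8 of the 641 points on this grid.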

Construction of the NIR Quantitative Model
Multivariate calibration models are widely used as statistical tools in spectroscopic analysis [27], and an objective function was used to evaluate the relative merit of the models. The objective function was as follows,
where RSQ is the multiple coefficient of determination and SECV is the standard error of cross-validation.
The performance of the 27 treatment groups (Table 3) was described by the following statistics: standard error of calibration (SEC), standard error of prediction (SEP), standard error of cross-validation (SECV), the coefficient of determination (R²) of the prediction models, and the ratio of performance to deviation (RPD), which expresses the factor by which prediction accuracy is increased compared with using the mean composition of all samples. The RPD, computed to interpret the predictive ability of each model, is the ratio of the SD of the reference values to the SEP [28-30]. The definitions of these performance parameters are given in Figure 1.
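As an illustration of how these statistics relate to one another, the sketch below computes SEC, SEP, R² and RPD from reference and predicted values. It uses common chemometric definitions, which are assumptions here and may differ in detail (for example in degrees of freedom or bias correction) from those given in Figure 1; the function names and numbers are hypothetical.

```python
import math
import statistics

def sec(reference, predicted, n_factors):
    """Standard error of calibration: residual SD with n - n_factors - 1 degrees of freedom."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    dof = len(residuals) - n_factors - 1
    return math.sqrt(sum(e * e for e in residuals) / dof)

def sep(reference, predicted):
    """Standard error of prediction: SD of the residuals (bias-corrected, n - 1)."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    return statistics.stdev(residuals)

def r_squared(reference, predicted):
    """Coefficient of determination of predicted against reference values."""
    mean_ref = statistics.fmean(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of the reference values divided by SEP."""
    return statistics.stdev(reference) / sep(reference, predicted)

# Hypothetical values, not data from this study.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(sec(reference, predicted, n_factors=1))
print(sep(reference, predicted), r_squared(reference, predicted), rpd(reference, predicted))
```

Because RPD divides the spread of the reference values by the prediction error, a model that merely predicts the mean of all samples has an RPD close to 1.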

Reference Analysis
Bovine genomic DNA content in milk was analyzed with the ultra-low volume spectrometer (Bio-Chrom, Cambridge, UK), and all measurements were made in duplicate. The average genomic DNA content of the 80 milk samples was 0.0476 ± 0.0243 (SD) μg/μL, with a minimum of 0.0219 μg/μL and a maximum of 0.1954 μg/μL.
The success of the NIR models for DNA content appears to depend strongly on how well the spectra correlate with the reference DNA content. The DNA extraction method used here readily met the requirements for DNA quality, and the DNA yield was basically sufficient for DNA content determination [22].

Continuum Removal Analysis
Data pretreatments were applied to the NIR spectra to improve calibration performance. After excluding 6 outliers, 74 samples were used to build and validate the calibration models. The samples were divided into two groups: 62 for calibration and 12, selected at random, for prediction. Descriptive statistics of DNA content in the calibration and validation sets are shown in Table 2, including number, average, minimum and maximum values and SD.
Ideally, the calibration samples should fully cover the variability of the samples. However, too many samples can increase model errors, and too few can fail to capture all the key characteristic information of the samples. Careful selection of the calibration set is therefore essential for successful model construction. One of the key points in constructing universal NIR models is the rational selection of a calibration set from large numbers of complex samples so that it effectively covers the sample variability. Samples with a large residual (T value > 2.5 or H value > 10) were treated as outliers and removed. Eliminating outliers is critical because they reflect errors caused by changes in experimental conditions, the nature of the samples and operator-dependent measurement [31].
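The outlier screening and the calibration/prediction split can be sketched as follows. This is an illustration only: the T and H statistics are assumed to have been computed already by the calibration software, taking the absolute value of T is an assumption, and the sample records, field names and random seed are hypothetical.

```python
import random

T_LIMIT = 2.5   # residual-based T threshold quoted in the text
H_LIMIT = 10.0  # H threshold quoted in the text

def remove_outliers(samples, t_limit=T_LIMIT, h_limit=H_LIMIT):
    """Keep samples whose |T| and H values are both within the limits."""
    return [s for s in samples if abs(s["t"]) <= t_limit and s["h"] <= h_limit]

def split_calibration_prediction(samples, n_prediction, seed=0):
    """Randomly hold out n_prediction samples; the remainder form the calibration set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[n_prediction:], shuffled[:n_prediction]

# Hypothetical records: 80 samples, 6 of which exceed one of the limits.
samples = [{"id": i, "t": 0.5, "h": 1.0} for i in range(80)]
for i in range(4):
    samples[i]["t"] = 3.0    # large residual
for i in range(4, 6):
    samples[i]["h"] = 12.0   # large H value

kept = remove_outliers(samples)
calibration, prediction = split_calibration_prediction(kept, n_prediction=12)
print(len(kept), len(calibration), len(prediction))
```

With these hypothetical records the counts mirror those reported above: 74 samples retained, split into 62 for calibration and 12 for prediction.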

Spectral Features Analysis
The original spectra of the 74 samples are shown in Figure 2 as log 1/R spectra, which have clear characteristic absorption peaks. The prominent absorption bands around 1280 and 1670 nm were attributed to DNA absorption (Fig. 2). The spectra were tightly grouped. In addition, the calibration and validation sets were well distributed across the concentration range and basically met the requirements for establishing the NIR model.
Calibration develops the model, which then predicts the unknown genomic DNA content of milk from the spectral data of new samples. The best calibration model for DNA content in milk could be preliminarily determined from the objective function, and its accuracy was described by RSQ, SEC and SECV. The R values ranked as A3 > F1 > C3 > E1 > B3 > D2 (Table 3), which meant that the model for genomic DNA content in milk was most affected by the multivariate calibration method, followed by the secondary smoothing parameter, and least affected by the number of derivative interval points. Most treatment groups had an F value below 50.0%, whereas the 20th group had an F value above 90.0% (Table 3). The optimal model was obtained using MPLS, the Detrend+O procedure and second derivative processing (gap = 4; smoothing = 1; second smooth = 1). The NIR spectra processed accordingly, by detrending only, second derivative processing, smoothing and secondary smoothing, are shown in Figure 3.

Calibration Results and Evaluation
The optimal model had RSQ = 0.913, SEC = 0.26%, SECV = 0.69% and RPD = 3.57 (Fig. 4). Previous studies [32-35] indicated that predicted values are consistent with actual values when RSQ is close to 1 and SEC and SECV are close to 0; RPD values below 1.5 indicate that the calibration is not useful, whereas RPD values above 2 make quantitative predictions possible. The performance of the optimal model was therefore good.
Note to Table 3: the F value (%) is the objective function value for bovine genomic DNA yield in milk; k1, k2 and k3 are the means of the corresponding levels of each factor; R is the range, the difference between the maximum and minimum values, R = kmax - kmin.
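The range analysis behind the level means k1, k2, k3 and the range R = kmax - kmin can be sketched as below. To keep the example short it uses the standard L9(3^4) orthogonal array rather than the L27(3^13) table of the study, and the response values are hypothetical; the function and variable names are assumptions.

```python
from statistics import fmean

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, levels coded 1 to 3.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

def range_analysis(design, responses, n_levels=3):
    """For each factor, return ([k1, ..., kn], R) where R = max(k) - min(k)."""
    results = []
    for factor in range(len(design[0])):
        level_means = []
        for level in range(1, n_levels + 1):
            values = [y for run, y in zip(design, responses) if run[factor] == level]
            level_means.append(fmean(values))
        results.append((level_means, max(level_means) - min(level_means)))
    return results

# Hypothetical objective-function values (%), one per run.
f_values = [22.0, 35.0, 41.0, 30.0, 48.0, 27.0, 55.0, 91.0, 44.0]
analysis = range_analysis(L9, f_values)

# Rank factors by R: a larger range means a stronger influence on the response.
ranking = sorted(range(len(analysis)), key=lambda i: analysis[i][1], reverse=True)
for i in ranking:
    means, r = analysis[i]
    print(f"factor {i + 1}: k = {[round(k, 2) for k in means]}, R = {r:.2f}")
```

Because the array is orthogonal, every level of one factor is paired equally often with every level of the others, so a response driven by a single factor yields R = 0 for all remaining columns.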

Prediction Performance of Stepwise Multivariate Linear Regression
When the optimal calibration model was applied to the independent prediction samples, it performed well, with R² = 0.9846 and SEP = 0.28% (Figs. 4 and 5). The good prediction results for DNA content in this study may be due to the spectral preprocessing method (SPM) and the informative variable selection method (IVS) used for the multivariate calibration models. SPM and IVS enhanced the most relevant information and the effective wavelengths in the spectral data, which made the calibration model more powerful and robust.
Figure 5 shows that the deviation between the values predicted by the regression model and the reference values was relatively small. To further verify the optimal model, bovine genomic DNA was extracted from a few new milk samples, and DNA content was determined both by the ultra-low volume spectrometer and by the optimal model. The two methods gave similar DNA contents (data not shown), so the optimal model was established successfully and is suitable for detecting DNA content.

Conclusions
In summary, this study was the first attempt to apply NIR spectroscopy to the determination of bovine genomic DNA content in milk. Since the optimal model could accurately predict DNA content in milk, DNA extraction is not necessary, so dairy monitoring could be faster and more economical. DNA could be tested together with protein, fat, lactose and other substances in a single scan. However, future work is needed to extend these results to other DNA content detection technologies, and more research should be done to increase the variability or reduce the error of the other reference methods. This study would be valuable in meeting the increasing demand for monitoring the dairy food chain. Additionally, the dairy industry can benefit from this internal traceability, which, for example, lowers the rate of recalls.