Abstract: To construct a high-quality recognition model for the six major tea categories, this study selected 370 samples and collected their near-infrared spectroscopy (NIRS) data. A rapid recognition model for the six major tea categories was developed by combining these spectra with spectral pre-processing, feature extraction, and data-mining classifier algorithms. The results indicated the following: 1) Support vector machine (SVM) and random forest (RF) classifiers were both suitable for constructing rapid identification models for the six tea categories. 2) The SVM classifier was better suited to modeling with the original spectrum (OS), and pre-processing algorithms tended to weaken the discriminatory performance of SVM-based models. 3) The RF algorithm was better suited to modeling with pre-processed spectra, and the resulting models showed significant improvements in recognition accuracy (RA) and in the area under the receiver operating characteristic curve (AUC) compared with the OS models. 4) Among the feature extraction algorithms, linear discriminant analysis (LDA) performed best, yielding models with significantly higher RA than the OS models. The optimal model, OS-LDA-SVM, achieved an RA of 100.00% and an AUC of 1.00, indicating a high recognition rate, strong generalization ability, excellent overall performance, and potential for industrial application. In summary, building models to identify the six tea categories by combining NIRS with pre-processing, feature extraction algorithms, and classifiers is highly feasible. The resulting models have high recognition accuracy and excellent performance, providing scientific, accurate, and efficient technical support for the rapid identification of tea categories in the tea trade and laying a foundation for the industrial application of international tea category identification models.
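RA and AUC are the two headline metrics used to compare the models. The sketch below is a hedged illustration, not the authors' evaluation code. It computes RA as the fraction of correctly classified samples. It computes a six-class AUC as the macro average of one-vs-rest ROC AUCs, using the pairwise (Mann-Whitney) formulation in which ties count as one half. The one-vs-rest macro averaging is an assumption, because the abstract does not state how AUC was aggregated across categories. The function names and the toy scores are also hypothetical.

```python
from typing import List, Sequence


def recognition_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """RA: fraction of samples whose predicted tea category equals the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of ROC AUC: P(positive score > negative score), ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def macro_ovr_auc(y_true: Sequence[str], scores: Sequence[Sequence[float]],
                  classes: List[str]) -> float:
    """Assumed aggregation: unweighted mean of one-vs-rest AUCs over all classes.

    scores[i][k] is the classifier's score for sample i belonging to classes[k].
    """
    aucs = []
    for k, cls in enumerate(classes):
        pos = [s[k] for t, s in zip(y_true, scores) if t == cls]
        neg = [s[k] for t, s in zip(y_true, scores) if t != cls]
        if not pos or not neg:
            raise ValueError(f"class {cls!r} needs both positive and negative samples")
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Hypothetical toy data with three categories (not the study's data).
    classes = ["green", "black", "oolong"]
    y_true = ["green", "black", "oolong", "green"]
    scores = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.5, 0.1, 0.4], [0.6, 0.3, 0.1]]
    # Predicted category = class with the highest score for each sample.
    y_pred = [classes[max(range(len(classes)), key=s.__getitem__)] for s in scores]
    print(recognition_accuracy(y_true, y_pred))      # 0.75
    print(macro_ovr_auc(y_true, scores, classes))    # 1.0
```

In this toy case, one oolong sample is misclassified (RA = 0.75), yet every one-vs-rest ranking is still perfect (AUC = 1.0). This shows that RA and AUC measure different properties: RA measures hard decisions, while AUC measures how well the scores rank samples. Reporting both, as the abstract does, is therefore informative.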