Comparison of different classification methods for the authentication of oregano samples by near-infrared hyperspectral imaging

Food products are increasingly authenticated with rapid, non-destructive techniques such as near-infrared hyperspectral imaging (NIR-HSI), coupled with multivariate classification algorithms. SIMCA (Soft Independent Modelling of Class Analogies) is one of the algorithms most commonly used for this purpose. However, it may perform poorly when the target class (authentic products) and the non-target classes (non-authentic products) overlap substantially. Soft PLS-DA (Soft Partial Least Squares Discriminant Analysis), by contrast, is a “hybrid” classification algorithm that maximizes the differences between target and non-target classes while still allowing a sample to be left unassigned to any modelled class. Furthermore, unlike SIMCA models, Soft PLS-DA models can be updated to identify new types of non-target samples. The aim of this thesis is to compare the performance of different classification algorithms applied to the authentication of dried ground oregano. To this end, 49 samples of pure oregano, pure adulterants (myrtle, olive, strawberry tree and sumac leaves) and mixtures of oregano and adulterants were analyzed by NIR-HSI in the 980 nm to 1660 nm wavelength range, for a total of 147 hyperspectral images. The Kennard-Stone algorithm was used to build a dataset of representative spectra for the oregano class and for the adulterant class, the latter comprising the four adulterants considered. The spectra of one of the four adulterants were then set aside and used as an “unknown adulterant”, while the remaining spectra were divided into a training set and a test set. The split was repeated so that the spectra of each adulterant served as the unknown adulterant once, in order to evaluate whether the models could correctly reject an adulterant that was not included in the training phase. For each split, the training set was used to calculate classification models at two confidence levels, 95% and 99.9%.
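The sampling scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the thesis: the function names `kennard_stone` and `leave_one_adulterant_out` are hypothetical, Euclidean distance between spectra is assumed, and the subsequent division of the modelled spectra into training and test sets is left out because the abstract does not say how it was done.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return the indices of n_select rows of X picked by Kennard-Stone selection."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all spectra (assumed metric).
    # The full distance matrix is O(n^2) in memory, fine for a small sketch.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start from the two most distant spectra.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # Distance from each candidate to its nearest already-selected spectrum.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the candidate that lies farthest from the selected set.
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return selected[:n_select]

def leave_one_adulterant_out(labels, adulterants):
    """Yield (unknown_name, modelled_idx, unknown_idx), once per adulterant."""
    labels = np.asarray(labels)
    for name in adulterants:
        # Spectra of this adulterant are set aside as the "unknown adulterant".
        unknown_idx = np.flatnonzero(labels == name)
        # All other spectra remain available for the training/test division.
        modelled_idx = np.flatnonzero(labels != name)
        yield name, modelled_idx, unknown_idx
```

With four adulterants, iterating over `leave_one_adulterant_out` produces four splits, one for each adulterant held out as unknown.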
The algorithms tested were Soft PLS-DA and three variants of SIMCA (Sim-SIMCA, Alt-SIMCA and DD-SIMCA), which differ in how they define the boundaries of the modelled class. In addition, the SIMCA models were developed following both a compliant approach, which maximizes efficiency, and a rigorous approach, which optimizes sensitivity only. The resulting 56 classification models (seven algorithm variants for each of the four splits and two confidence levels) were applied first to the test set and to the spectra of the unknown adulterant, and then to all 147 hyperspectral images. For each model, the resulting prediction images were used to set a classification threshold equal to the minimum percentage of pixels predicted as oregano (PPO%) over the images of the pure oregano samples in the training set. Images with a PPO% above the threshold were classified as “Oregano”, and those with a PPO% below it as “Adulterated”. The results indicate a detection limit for oregano adulteration by NIR-HSI of approximately 10%. Comparison of the algorithms by ANOVA and PCA showed that, at the 95% confidence level, Soft PLS-DA is the most efficient algorithm in predicting the test set, while Soft PLS-DA and compliant DD-SIMCA perform best on the hyperspectral images. At the 99.9% confidence level, Soft PLS-DA is the most efficient algorithm on both the test set and the images. These findings suggest that Soft PLS-DA is a more effective alternative to SIMCA for identifying oregano adulteration.
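The image-level decision rule based on PPO% can be illustrated with a short Python sketch. It rests on assumptions that the abstract does not state: each prediction image is taken to be a boolean mask containing only sample pixels (background already removed), an image whose PPO% equals the threshold is assigned to “Oregano” so that the training image defining the minimum is accepted, and the function names are hypothetical.

```python
import numpy as np

def ppo_percent(pred_mask):
    """Percentage of pixels predicted as oregano in one prediction image."""
    # Assumption: pred_mask holds only sample pixels, True = predicted oregano.
    pred_mask = np.asarray(pred_mask, dtype=bool)
    return 100.0 * pred_mask.mean()

def fit_threshold(training_oregano_masks):
    """Minimum PPO% over the pure-oregano images of the training set."""
    return min(ppo_percent(mask) for mask in training_oregano_masks)

def classify_image(pred_mask, threshold):
    """Label an image by comparing its PPO% with the model's threshold."""
    # Assumption: equality counts as "Oregano" (see the lead-in above).
    return "Oregano" if ppo_percent(pred_mask) >= threshold else "Adulterated"

# Toy example with 2 x 2 prediction masks (1 = pixel predicted as oregano).
training_masks = [np.array([[1, 1], [1, 0]]), np.array([[1, 1], [1, 1]])]
threshold = fit_threshold(training_masks)
label = classify_image(np.array([[1, 0], [0, 0]]), threshold)
```

Because the threshold is derived from the prediction images of each model, a separate threshold would be computed for each of the 56 models.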
