Mathematics and Statistics Vol. 10(2), pp. 419 - 430
DOI: 10.13189/ms.2022.100217


Principal Canonical Correlation Analysis with Missing Data in Small Samples


Toru Ogura 1,*, Shin-ichi Tsukada 2
1 Clinical Research Support Center, Mie University Hospital, Mie, Japan
2 Department of Education, Meisei University, Tokyo, Japan

ABSTRACT

Missing data occur in various fields, such as clinical trials and social science. Canonical correlation analysis, which is often used to analyze the correlation between two random vectors, cannot be performed directly on a dataset with missing data. However, canonical correlation coefficients (CCCs) can also be calculated from a covariance matrix. When the covariance matrix can be estimated by excluding missing data (complete-case and available-case analyses) or by imputing them (multivariate imputation by chained equations, k-nearest neighbor (kNN), and iterative robust model-based imputation), CCCs can be estimated from this covariance matrix. CCCs are biased even when all data are observed. Usually, estimated CCCs are even larger than the population CCCs when a covariance matrix estimated from a dataset with missing data is used. The purpose of this study is to bring the CCCs estimated from a dataset with missing data close to the population CCCs. The procedure involves three steps. First, principal component analysis is performed on the covariance matrix estimated from the dataset with missing data to obtain the eigenvectors. Second, the covariance matrix is transformed using the first to fourth eigenvectors. Finally, the CCCs are calculated from the transformed covariance matrix. CCCs derived using this procedure are called principal CCCs (PCCCs). Simulation studies and numerical examples confirmed the effectiveness of the PCCCs estimated from datasets with missing data. In many cases in the simulation results, the bias and root-mean-squared error of the PCCC were smallest when the missing data were imputed by kNN. In the numerical examples, the first PCCC estimated from the missing data based on kNN is close to the first CCC estimated from the dataset comprising all observed data when the correlation between the two vectors is low. Therefore, PCCCs based on kNN are recommended.
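The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not specify how the covariance matrix is transformed. The sketch assumes that the eigenvectors are taken from each block (x and y) of the covariance matrix separately, and that the transformation is a projection onto the leading k eigenvectors of each block. The function names `cccs_from_cov`, `leading_eigvecs`, and `pccc_sketch` are hypothetical, and the covariance matrix here comes from simulated complete data rather than from an imputed dataset.

```python
import numpy as np

def cccs_from_cov(S, p):
    """CCCs from a (p+q) x (p+q) covariance matrix S.

    The first p variables form x and the remaining q form y.
    """
    S = np.asarray(S, dtype=float)
    Sxx, Sxy, Syy = S[:p, :p], S[:p, p:], S[p:, p:]
    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the CCCs
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    return np.linalg.svd(M, compute_uv=False)

def leading_eigvecs(S_block, k):
    """Eigenvectors of a symmetric block for its k largest eigenvalues."""
    w, V = np.linalg.eigh(S_block)      # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]     # indices of the k largest
    return V[:, order]

def pccc_sketch(S, p, k):
    """ASSUMED reading of the procedure: block-wise PCA, then projection."""
    S = np.asarray(S, dtype=float)
    q = S.shape[0] - p
    # Step 1: PCA on each block to obtain eigenvectors (assumption)
    Vx = leading_eigvecs(S[:p, :p], k)
    Vy = leading_eigvecs(S[p:, p:], k)
    # Step 2: transform the covariance matrix as A^T S A (assumption)
    A = np.zeros((p + q, 2 * k))
    A[:p, :k] = Vx
    A[p:, k:] = Vy
    S_t = A.T @ S @ A
    # Step 3: CCCs of the transformed covariance matrix
    return cccs_from_cov(S_t, k)

# Usage: small sample (n = 20), p = q = 5, first to fourth eigenvectors (k = 4)
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))
S_hat = np.cov(X, rowvar=False)
print("CCCs :", cccs_from_cov(S_hat, 5))
print("PCCCs:", pccc_sketch(S_hat, 5, 4))
```

Under this reading, the first PCCC cannot exceed the first CCC computed from the same covariance matrix, since the projection restricts the linear combinations over which the correlation is maximized. In practice `S_hat` would be replaced by the covariance matrix estimated after excluding or imputing the missing values.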

KEYWORDS
Canonical Correlation Analysis, Missing Data, Principal Component Analysis, Small Samples

Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Toru Ogura, Shin-ichi Tsukada, "Principal Canonical Correlation Analysis with Missing Data in Small Samples," Mathematics and Statistics, Vol. 10, No. 2, pp. 419 - 430, 2022. DOI: 10.13189/ms.2022.100217.

(b). APA Format:
Toru Ogura, Shin-ichi Tsukada (2022). Principal Canonical Correlation Analysis with Missing Data in Small Samples. Mathematics and Statistics, 10(2), 419 - 430. DOI: 10.13189/ms.2022.100217.