Effects of Reducing the Length of the Questionnaire by Multiple Matrix Sampling on the Validity of Structural Equation Modeling for Factors Affecting Job Morale

This research examined how dividing a questionnaire into sub-questionnaires under 10 different question-setting conditions affects the validity of structural equation modeling (SEM) for factors affecting job morale. Data were collected from 690 personnel working in regional Statistical Offices around Thailand using cluster random sampling. The instrument was a 95-item questionnaire with a 5-level rating scale. Item discrimination values ranged from 0.244 to 0.860, and reliability (α) ranged from 0.699 to 0.900. Data were analyzed using descriptive statistics and structural equation modeling in Mplus 7.4. The findings showed that the model based on question setting in 3 sets of sub-questionnaires, combined with item sampling without replacement and no fixed core items, conformed to the empirical data best, and that the overall relationship between the parameters of this model and those of the model from the complete questionnaire was high and positive.


Introduction
Structural equation modeling (SEM) is one of the most widely used statistical techniques in the behavioral and social sciences [52], owing to two distinguishing characteristics. First, it can analyze models with latent variables (abstract variables that cannot be measured completely and therefore need measurement indices called "observed variables") and estimate the effects among them at the same time [33]. Second, it has relaxed assumptions [11] and gives the researcher an opportunity to modify the model when it does not fully conform to the empirical data [56]. Because of these two characteristics, SEM is widely accepted and increasingly applied in current research. It has been used for many forms of research problems in which the studied variables frequently contain measurement error, such as multilevel structural equation modeling (MSEM) ([4]; [49]), multi-group analysis [10], growth model analysis ([16]; [60]), multitrait-multimethod analysis (MTMM) [5], and meta-analytic structural equation modeling (MASEM) ([40]; [55]). Although SEM yields reliable findings, it still has limitations in models that use a large number of latent and observed variables.
Because SEM is well suited to research that explains social phenomena involving complex and interrelated relationships, developing an SEM requires a number of latent variables, each of which needs to be measured by more than 3 observed variables [6]. Because each observed variable in turn covers several aspects of measurement, validating the model requires an instrument with many questions. When such an instrument is used to collect data, respondents often spend a long time answering the questions, which leads to fatigue from the number of items ([12]; [61]), lower participation ([17]; [67]), and greater deviation in the collected data [58]. Some scholars noticed this problem and began to examine the effects of collecting data by questionnaire in social science research. One major piece of evidence came from [66], who used meta-analysis to study the effects of using e-mail questionnaires. That research showed that the response rate decreased significantly when the questionnaire was longer than 4 pages. Moreover, [65], [24], and [26] examined the effects of questionnaire length and found that the return rate (questionnaires returned relative to those sent out) was higher for shorter questionnaires than for longer ones. The team in [39] studied the effect of questionnaire length on participation and found that length affected data provision, with statistical significance at the .01 level. It could therefore be concluded that the length of the questionnaire is a principal factor affecting measurement variation, in addition to the limitations of incomplete variable measurement in social science research. Once this conclusion became more explicit, [57] proposed a technique for reducing the length of the instrument, called "Multiple Matrix Sampling: MMS." The technique was applied to deal with faulty data provision and measurement variation and to improve data quality [15].
Following Shoemaker's concept, several educational institutions recognized the benefits of MMS, and it was applied in various forms of international measurement and assessment, including 1) Trends in International Mathematics and Science Study: TIMSS ([42]; [45]), 2) Programme for International Student Assessment: PISA, which focuses on assessing students' capability in applying knowledge and life skills and currently has members from more than 70 countries worldwide [48], 3) Progress in International Reading Literacy Study: PIRLS [42], and 4) National Assessment of Educational Progress: NAEP [44]. In addition to the national sectors, other organizations also apply MMS, such as economic and political organizations and public health sectors [19]. However, there is no clear empirical evidence on the application of MMS in social science research, especially in SEM, whose measurement is affected by the length of the questionnaire.

Multiple Matrix Sampling
MMS was first mentioned at the beginning of the 1950s by Turnbull, Ebel, and Lord, researchers at the Educational Testing Service, who proposed it as a way to solve problems found in educational assessment. Later, Lord, Hooke, and Tukey developed statistical procedures for estimating the parameters of populations to which MMS was applied [57]. After that, [57] studied the concept and wrote a textbook on MMS covering statistical methodology, estimation, hypothesis testing, and guidelines for applying MMS in data collection, which was widely used afterwards.
The principle of MMS is that the items in the complete questionnaire are divided into sub-questionnaires [50], and each sub-questionnaire is used to collect data from a different group of respondents [59]. Two main rules must be considered. The first rule concerns the number of item sets. [57] explained that it should be determined from 1) the number of sub-questionnaires (t), which should be no higher than the fewest items used to measure any variable in the conceptual framework, 2) the number of respondents answering each sub-questionnaire (n), which is the total number of respondents (N) divided by the number of sub-questionnaires (t), and 3) the number of items in each sub-questionnaire (k), which is the total number of items (I) divided by the number of sub-questionnaires (t). For example, suppose there are 300 items (I=300) in the complete questionnaire, the fewest items measuring any variable is 3, and there are 900 participants (N=900). The researcher can then design the division in two ways: 3 sets of 100 items (t=3, k=100) with 300 respondents for each set (n=300), or 2 sets of 150 items (t=2, k=150) with 450 respondents for each set (n=450). The second rule concerns item setting, for which each set of sub-questionnaires can be arranged in two ways. The first is the use of core items [64]. The second is item sampling, either with replacement, in which items may be repeated, or without replacement, in which items are randomized and not repeated [9].
In practice, when the items cannot be divided equally among the sub-questionnaires, the leftover items are handled in two steps: the set number of a sub-questionnaire is sampled first, and then items are sampled into that sub-questionnaire (for item sampling with replacement) or the remaining items are placed into it (for item sampling without replacement). For instance, suppose a variable is measured by 5 items and the researcher divides the sub-questionnaires into 2 sets using item sampling without replacement. The sets then contain different numbers of items, one with 2 items and the other with 3 items. The researcher handles the remaining item (item no. 3) by sampling a set of sub-questionnaires and then adding the item to the selected set.
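The design rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function names are hypothetical, leftover items are each assigned to one randomly chosen set, and "with replacement" is read as drawing each set independently from the full item pool, so that items can repeat across sets and some items may never be drawn.

```python
import random


def mms_design(total_items, total_respondents, n_sets):
    """Return (n, k): respondents per set (N / t) and items per set (I / t)."""
    n = total_respondents // n_sets
    k = total_items // n_sets
    return n, k


def split_without_replacement(items, n_sets, rng):
    """Assign every item to exactly one sub-questionnaire (no repeats)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    base = len(shuffled) // n_sets
    sets = [shuffled[i * base:(i + 1) * base] for i in range(n_sets)]
    # Leftover items (assumption): sample a set first, then add the item to it.
    for item in shuffled[base * n_sets:]:
        sets[rng.randrange(n_sets)].append(item)
    return sets


def split_with_replacement(items, n_sets, rng):
    """Draw each set independently from the full pool (assumed reading):
    items can repeat across sets and some items may never be drawn."""
    pool = list(items)
    k = len(pool) // n_sets
    return [rng.sample(pool, k) for _ in range(n_sets)]


if __name__ == "__main__":
    print(mms_design(300, 900, 3))  # (300, 100): n=300 respondents, k=100 items
    print(mms_design(300, 900, 2))  # (450, 150)
    rng = random.Random(2024)
    # Five items in two sets: one set ends up with 2 items, the other with 3.
    print(split_without_replacement(range(1, 6), 2, rng))
```

The first two calls reproduce the worked example with I=300 and N=900. The split functions take an explicit random generator so that a given allocation can be reproduced.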

The Hypothetical Model
In this research, the studied variable is job morale in the organization, as it is an important factor that helps drive the organization. An organization whose personnel have high job morale is likely to achieve successful and effective work performance because its personnel are active and deliberately perform their duties to achieve the goal. Conversely, personnel who lack job morale would be inactive, hopeless, discouraged, and less motivated, resulting in poor work performance. The literature review revealed that several scholars have proposed theories related to improving job morale. For example, [37] proposed a theory of five levels in the hierarchy of needs within an individual: physiological needs, safety needs, social needs, self-esteem, and self-actualization. [23] expressed that maintenance or hygiene factors and motivation factors were essential to ... [34] also mentioned basic factors influencing job morale. [38] stated that there were 3 needs within each individual, generated from learning in society, culture, and environment. [63] viewed that an individual developed self-confidence from 4 elements comprising self-awareness, consideration, satisfaction, and action after decision, and [18] divided job satisfaction into 10 influencing elements, which included job security, job advancement, management satisfaction, wage and salary, job description, command or supervision, communication, work environment, and other fringe benefits. All of these concepts are clearly related to building job morale and have been used as a foundation in various fields that study human behavior.
Synthesizing the variables from the previous approaches yields 5 latent variables: 1) Job Morale: JMO (state of mind and emotions affecting job concentration), measured from 4 observed variables including Company Policy and Administration: CPA, Administration: ADS, Superiors Subordinates Peeves: SSP, and Security: SEC, 2) Job Motivation: MOT (factors stimulating or leading individuals to be energetic in achieving the job targets), measured from 5 observed variables including Energetic: EGT, Job Security: JSE, Esteem: EST, Egoistic Needs: EGN, and Self-actualization: SAC, 3) Self Confidence: SCD (the personnel's ability to perform their duties to achieve the job targets), measured from 5 observed variables including Emotional Stability: EMS, Courage: COU, Self-Reliance: SRT, Autonomy: AUT, and Adaptability: ADP, 4) Job Satisfaction of officers: SFO (positive emotions or attitude towards the job), measured from 6 observed variables including Achievement: ACH, Control Over Work Itself: CWI, Responsibility: RES, Advancement: ADV, Job Cognition: JCO, and Job Action Tendency: JAT, and 5) Work Environment: WEN (things and conditions around the employees), measured from 5 observed variables including Readiness Factor: REF, Interpersonal relations: INR, Atmosphere environment: TEN, Management: MAN, and Social and Fringe benefits: SFB. However, these variables are so abstract that the behaviors directly representing job morale cannot be defined and their amount cannot be calculated. The researcher therefore decided to develop an SEM of factors affecting job morale by applying MMS in question setting under different conditions and to study the results against the harmonized (goodness-of-fit) indices.
The links between the variables according to the hypotheses, together with the data sources supporting the relationships of the variables in the SEM, are presented in the form of the hypothetical model shown in Figure 1.

Research Design
The researcher aimed to study the forms of item sampling and to examine how using the data from each of 10 sets of questions arranged under different conditions affects the conformity between the structural equation model and the empirical data, comparing the model using data from the complete questionnaire with the models using data from the sub-questionnaires. The previous findings of [23], [36], [28], and [54] led the researcher to expect that 3 sets of questions would provide a better harmonized index than 2 sets, and that 2 sets would provide a better index than the complete questionnaire. The reason is that dividing the complete questionnaire into several sub-questionnaires results in fewer items per set, better cooperation from the respondents in answering the questions, and therefore a better harmonized index than a questionnaire with more items per set. Following the approach of [9], the researcher also expected that item sampling without replacement would provide a better harmonized index than sampling with replacement.

Research Sample
According to the hypothetical model, 63 parameters were to be estimated in the SEM. With 10 respondents fixed per parameter ([20]; [29]), 630 respondents were needed. After data collection, there were 690 respondents in total, which exceeded the required number and was sufficient for parameter estimation. The respondents consisted of 149 government servants (21%), 336 government employees (49%), and 205 mission-based employees (30%).

Research Instrument and Procedure
The instrument used in this research was a questionnaire containing 95 items with a 5-level rating scale, covering the measurement of the hypothetical model consisting of 5 latent variables. Each question was examined for content validity by experts before a try-out with 100 government servants, government employees, and mission-based employees working in provincial statistical offices who were not in the sample group. The aim of the try-out was to calculate discrimination values using the item-total correlation (r_xy) and reliability (α) using Cronbach's alpha coefficient. The examination of instrument quality revealed that each of the latent and observed variables had a high level of reliability, between 0.699 and 0.900. The variables JMO, MOT, SCD, and SFO in particular had similar values, whereas WEN had a slightly lower value than the others. The highest number of items used to measure a latent variable was 22 (for SFO) and the lowest was 17 (for JMO and SCD), whereas the lowest number of items used to measure an observed variable was found in 6 variables, as shown in Table 1. The personnel provided the data by considering each question and marking the box that expressed their level of agreement (from 5 = the most to 1 = the least). Data collection lasted 2 weeks, and 100% of the questionnaires were returned.
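The two instrument-quality statistics named above can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the study's own computation: the ratings are made-up, the function names are hypothetical, and the item-total correlation is computed in its corrected form (each item against the sum of the remaining items), since the text does not say which variant was used.

```python
import math
from statistics import mean, variance


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(scores):
    """scores: one row per respondent, one rating per item."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_rest_correlations(scores):
    """Corrected item-total correlation: item vs. sum of the other items."""
    k = len(scores[0])
    result = []
    for j in range(k):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        result.append(pearson(item, rest))
    return result


if __name__ == "__main__":
    # Hypothetical 5-point ratings: 5 respondents x 3 items.
    ratings = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2], [4, 4, 3]]
    print(round(cronbach_alpha(ratings), 3))
    print([round(r, 3) for r in item_rest_correlations(ratings)])
```

Items with a low item-rest correlation discriminate poorly between respondents, and a low α indicates that the items of a variable do not vary together.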

Data Analysis
The researcher recorded all results from the complete questionnaire in the computer and calculated the average of each observed variable, as shown in Table 1. This produced 25 values (from the 25 observed variables), which were used to examine the features of the data with descriptive statistics and to consider the relationships between the observed variables using the Pearson product-moment correlation coefficient (as shown in Table 6) before conducting the analysis to answer the research goal using Mplus 7.4 [43].
In the analysis to answer the research goal, the researcher divided the process into two stages as follows.
The first stage was a comparison of the competitive models that used the data from the 10 models of sub-questionnaires. Because the df values of all models were equal in this stage, the researcher considered only the difference in chi-square, taking the model with the lowest chi-square as the best [3] among the models under these conditions. That model was then adjusted and analyzed to determine the harmonized index in the next stage.
The second stage was model modification, applied to the model selected in the first stage and to the original model that used the data from the complete questionnaire. The researcher adjusted each model until it reached the criteria of consistency, judged by the goodness-of-fit statistics: the relative chi-square (chi-square divided by degrees of freedom) should be less than 2 [56], and the p-value should indicate statistical non-significance at the .05 level (p > .05). Additionally, the Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) should not be lower than 0.90 (values higher than 0.95 show very good conformity), whereas the Root Mean Square Error of Approximation (RMSEA) and the Standardized Root Mean Square Residual (SRMR) should be lower than 0.08 (values lower than 0.05 show very good conformity) [30]. Maximum likelihood was used for parameter estimation because the researcher observed that the data were normally distributed [56].
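As an illustration of how such criteria can be applied, the sketch below checks a set of fit indices against cutoffs, taking the conventional reading that the relative chi-square should be below 2, CFI and TLI at least 0.90, and RMSEA and SRMR below 0.08. The function names are hypothetical, the p-value criterion is left out for brevity, and the passing example uses made-up values. The chi-square and df in the first call are the ones reported for the original model in the results.

```python
def relative_chi_square(chi2, df):
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df


def check_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Return a pass/fail flag per index plus an overall 'acceptable' flag."""
    checks = {
        "relative_chi_square": relative_chi_square(chi2, df) < 2.0,
        "cfi": cfi >= 0.90,
        "tli": tli >= 0.90,
        "rmsea": rmsea < 0.08,
        "srmr": srmr < 0.08,
    }
    checks["acceptable"] = all(checks.values())
    return checks


if __name__ == "__main__":
    # Original model before modification: chi-square 3213.276 on 268 df.
    print(round(relative_chi_square(3213.276, 268), 3))  # 11.99
    # Made-up indices for a well-fitting model (illustration only).
    print(check_fit(chi2=500.0, df=268, cfi=0.96, tli=0.95,
                    rmsea=0.04, srmr=0.05))
```

Keeping one flag per index makes it easy to see which criterion a model fails during modification, rather than only whether it passes overall.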

Result of Question Setting
Following the principle of MMS described earlier, and considering all 95 items according to the conceptual framework (referred to as the original model), the researcher could arrange the questions into 10 models, for two reasons. The first reason was that the complete questionnaire used in data collection contained 6 observed variables that were each measured with 3 questions. If the items were sampled without replacement, they could therefore be arranged into a maximum of 3 sets, with no item repeated across sets. The second reason concerned the use of core items, for which either 1 item or 2 items was possible. However, 3 core items could not be used because some observed variables were measured with as few as 3 items. If 3 core items were used, all 3 sets of sub-questionnaires would contain the same items. The criteria for item setting gave 10 possible models, as shown in Table 2, and the item setting based on these criteria is shown in Table 7.
In practice, the researcher noticed that in model no. 2, model no. 4, model no. 6, and model no. 8, which used item sampling with replacement, some items were never sampled to measure the variables. In contrast, with sampling without replacement, all items were sampled to measure the variables in the models. When a variable was measured with 4 items but there were 3 sets of sub-questionnaires, as in model no. 3, 1 item was left over as a fraction. This item was then sampled into one of the sub-questionnaires together with the other items under the same definition of the variable, and the average was calculated during data management.
Regarding the number of items in each sub-questionnaire, the number of items was unequal in some models, namely model no. 1, model no. 2, model no. 3, model no. 4, model no. 7, model no. 8, and model no. 9; however, the researcher also noticed that, overall, the α values of the sub-questionnaires in each model were similar.

Effects of Item Setting on the Consistency of the Structural Equation Modeling
Comparison of the Harmonized Index of Item Setting Based on Different Criteria
The validity of the SEM was examined by comparing the harmonized indices obtained from the data of the complete questionnaire with those obtained from the data of the sub-questionnaires. Overall, none of the 11 models (the original model and the 10 competitive models) met the criteria. For the original model, χ² was 3213.276 (statistically significant), the df value was 268, and the relative chi-square was 11.990. For the competitive models, χ² ranged from 1778.289 to 2696.872 (statistically significant), the df value was 268, RMSEA was between 0.090 and 0.115, CFI was between 0.670 and 0.750, TLI was between 0.630 and 0.721, and SRMR was between 0.138 and 0.190. Moreover, the researcher noticed that the relative chi-square clearly differed from model to model, ranging from 6.635 to 10.063. This showed that each model had different goodness-of-fit statistics, as shown in Table 3 (pre-modification indices).
In addition, the researcher compared the relative chi-square values of the models with item setting in 2 sets and in 3 sets, obtained from 5 pairs of models, which were model no. ... Table 3, while the post-modification indices and the primary data of the model measurement are shown in the Appendix.
Moreover, the researcher examined the detailed parameter estimates of the original model and model no. 3 after model modification. The factor loadings of all 25 observed variables in both models were positive and different from zero at the 0.01 level of statistical significance. In the original model, the factor loadings of 10 observed variables were higher, and those of 10 variables were lower, than the corresponding loadings in model no. 3, as shown in Figure 2. Regarding the relationships of the factors affecting job morale in the original model, the factor that directly influenced job morale, with an effect different from zero in a positive direction at the 0.01 level of statistical significance, was work environment (WEN), with an effect value of 0.607, whereas job motivation (MOT), self-confidence (SCD), and job satisfaction (SFO) had no such direct effect. Besides the direct effects, the total effect (TE), which is the sum of the direct effect (DE) and the indirect effect (IE), showed that the factor affecting job morale most was work environment (WEN), which had a higher standardized total effect than the other factors (β = 0.806), with a direct effect about three times its indirect effect. The next factor was job motivation (MOT), whose direct and indirect effects were nearly equal, whereas self-confidence (SCD) had only a direct effect, and job satisfaction (SFO) tended to have a negative direct effect but a positive indirect effect.
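The effect decomposition for work environment in the original model can be checked with simple arithmetic. In the sketch below, the indirect effect is derived from the reported total and direct effects using TE = DE + IE; it is not a value quoted from the text, and the function name is hypothetical.

```python
def indirect_effect(total_effect, direct_effect):
    """Indirect effect implied by TE = DE + IE."""
    return total_effect - direct_effect


# Standardized effects of work environment (WEN) on job morale, original model.
te, de = 0.806, 0.607
ie = indirect_effect(te, de)

print(round(ie, 3))       # 0.199
print(round(de / ie, 2))  # 3.05, i.e. DE is about three times IE
```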
For model no. 3, the factor that directly influenced job morale was work environment (WEN), the same as in the original model, because it was the only factor whose effect differed from zero in a positive direction at the 0.01 level of statistical significance. Job motivation (MOT) showed an effect that differed from zero in a positive direction at the 0.05 level, with a direct effect about twice its indirect effect. Additionally, self-confidence (SCD) had only a negative direct effect, and job satisfaction (SFO) tended to have a negative indirect effect but a clearly positive direct effect.
Comparing the standardized effect values of the original model and model no. 3, the researcher noticed that the direct effect of job satisfaction was negative in the original model but positive in model no. 3. For the indirect effect, there was a clear difference for job motivation: its effect in the original model differed from zero in a positive direction at the .05 level of statistical significance, whereas its effect in model no. 3 did not differ from zero. Work environment showed the opposite pattern: its effect in the original model did not differ from zero, while that in model no. 3 differed from zero in a positive direction at the .05 level. For the total effect, self-confidence differed between the models, being positive in the original model and negative in model no. 3. In addition, the coefficient of determination was 0.730 for model no. 3 and 0.680 for the original model, which was slightly lower. When the relationship between the parameter estimates of the two models was calculated using Pearson's product-moment correlation (r_xy), the result was 0.800, showing that both models produced estimates in the same direction at a rather high level. The details of the effect estimates are shown in Table 4.

Conclusions and Recommendations
The findings revealed that question setting in 3 sets provided a better harmonized index than item sampling in 2 sets, whereas item sampling in 2 sets provided a better index of consistency than the complete questionnaire. These findings conformed to the hypothesis, and the researcher considers that two important causes explain them.
The first cause is that question setting in 3 sets of sub-questionnaires reduced the number of items per set, and therefore the number of pages, below that of question setting in 2 sets; likewise, question setting in 2 sets reduced the number of items per set and the number of pages below that of the complete questionnaire. When used in data collection, the shorter sets helped the respondents pay more attention to answering the questions and reduced boredom, fatigue, respondent burden [59], and the variation caused by answering too many items in the questionnaire [57]. The findings of [66] confirmed that the response rate might decrease significantly when the questionnaire was longer than 4 pages, which might in turn affect the quality of the collected data ([27]; [32]).
Additionally, several researchers who studied the response rate noted that it might decrease if a very long questionnaire was used in data collection ([1]; [35]; [2]). Several researchers also compared the response rates of long and short questionnaires and found that respondents preferred the shorter questionnaire to the longer one ([7]; [8]; [26]; [31]). Consequently, in line with the advice of [23], the questionnaire should be divided into more than 2 sets if the researcher needs to use many questions in collecting data.
The second cause is that item sampling with and without replacement affected the item setting. Sub-questionnaires sampled with replacement could contain repeated items and redundant questions, and some lacked content validity for particular variables because some items were missing. The researcher considered this a severe problem because content validity is an important factor in measuring an SEM, since all items used to measure a variable have already been filtered according to its definition. The problem could not be solved by adding items or by replacing a missing item with items used to measure other variables, because those items measure different variables. The researcher therefore proposed the use of item sampling without replacement in measuring SEM, which conformed to the empirical findings: the models using item sampling without replacement consistently provided a better harmonized index than those using item sampling with replacement. This also conformed to the findings of [51], who examined the effect of item sampling on parameter estimation and found that item sampling without replacement caused less variation in parameter estimation than sampling with replacement, and that the use of a core set of questions also affected the item setting. The findings of this research showed that the model using 1 core item provided a better index value than the model using 2 core items. This conformed to the proposal of [64], who mentioned that each component of each sub-questionnaire should contain at least 1 core item answered by all respondents. However, the comparison between the models using no core item and the models using core items showed that the model using no core item provided a better harmonized index. A possible reason is that the use of core items increased the number of questions per set. This conformed to the research of [58], who studied the effect of questionnaires with many questions on the response rate and found that the rate decreased when there were too many questions.
Although the researcher adjusted the models until the harmonized indices met the criteria, some parameters of both models were still negative, possibly because of low reliability. Arranging the questions according to the criteria affected the balance in the distribution of questions and the averages measured in each sub-questionnaire. The findings require the researcher to be careful in model specification, a very important step that can be considered "the heart or key" of SEM analysis, because it is a process that draws on theories, research, and information technology to develop the models and to examine the quality of the instrument before collecting and analyzing the data, so that the model is measured as accurately as possible. Nevertheless, the parameters of the two models were 80% consistent, and the remaining 20% might be caused by variation in measuring the variables due to the use of many variables in the SEM. These findings conformed to the research of [59], which indicated that the reliability and predictive validity of the complete questionnaire and the shortened questionnaire were similar.
In the researcher's view, question setting by MMS is another option that researchers can select to make data collection more convenient and quicker and to increase the respondents' motivation. However, the approach to MMS is not yet clear enough, as no theorists have stated principles of question setting concerning the proper number of sets or core items, or the distribution of effect values resulting from item sampling with or without replacement. A researcher applying the method needs to determine the form of question setting suitable for the data in order to obtain accurate and more reliable research results. In developing the instrument, the researcher could place negative questions at the end of the questionnaire in order to examine the respondents' willingness and concentration in considering all the questions when providing data. Further study could examine the effect of gender on the variables, as well as the effect of the way of life of people living in each region, because each region has its own social and cultural context. The approach could also be used in other analyses, such as multi-group analysis, factor analysis, multilevel analysis, or growth model analysis, because the hypothetical model for these analyses needs to be thoroughly synthesized from theory in the same way as for SEM.