Application of Joint Generalized Linear Models in Determining Physical Support Factors that Influence Crop Yield in Northern Ghana

An increase in crop yield in many parts of Africa is largely the result of an increase in cultivated land. This trend, if allowed to continue, will worsen the already high levels of forest depletion. This study attempts to formulate a model useful in examining support systems that influence crop yield in Northern Ghana. The points of interest are a comparison of the classical generalized linear model (GLM) with the joint generalized linear model (Joint GLM), and the selection of the factors that most influence crop yield based on the better of the two models. Data from the regional Monitoring and Evaluation office of the Linking Farmers to Markets (FtM) project in Tamale, Ghana, were analysed and discussed. Crop type, Financial Credit, Training, Study tour, Demonstrative Practical, Networking Event, Post-harvest Equipment, number of farmers in the farmer based organization (FBO) and Size of plot cultivated were the measured fixed-effects variables, with Total Crop Yield as the response. We settle on the Joint GLM for inference and select access to credit, Crop type, Networking among farmer groups, access to post-harvest equipment, the number of farmers on site and size of plot as the most important physical support factors that influence crop yield in Northern Ghana. Stakeholders in the food and agricultural sector are advised to give these factors the needed attention in the face of resource scarcity and the quest to increase yield while minimizing the conversion of forest land into farmland.


Introduction
Kraybill et al. assert that for the majority of staple crops in sub-Saharan Africa, productivity is declining, and that any gains in productivity were mainly due to expansion in cultivated land (Kraybill et al. 2009). As a result of this expansion, most sub-Saharan African nations are experiencing high levels of forest depletion. According to Bennett and Cattle (2013a), the land management practices of farmers have been positive, and effective information sharing between farmers and extension service workers has proved to be a vital contribution to this achievement. In many developing countries, low agricultural productivity has been due to the absence of improved agricultural technologies.
Results from many studies reveal that Crop type, Financial Credit, Training, Study tour, Demonstrative Practical, Networking Event, Post-harvest Equipment, number of farmers in the FBO and Size of plot planted remain key contributors to crop yield (Amare, Asfaw, et al. 2016; Asfaw et al. 2011). Whereas modern farming techniques such as the use of improved seedlings, organic fertilizers, and advanced irrigation methods have improved crop yield in many parts of the globe, many farmers in Africa have been slow to adopt such methods. The main reason given for not using these technologies is a lack of information on how to apply them (Morris et al. 2007). On the other hand, where modern improved seedlings and other agricultural inputs are misapplied, the traditional crop may still yield more than the modern one, and in many cases this discourages farmers from adopting new farming technologies. Access to information through mass media or government extension workers is therefore essential to increasing agricultural productivity. Plot size remains a key factor in farm-based discussions: farmers with larger farms are more likely to adopt improved agricultural technologies than those with smaller farms (Kassie, Shiferaw, and Muricho 2011; Mariano, Villano, and Fleming 2012).
The role of information sharing in the promotion of new farming technologies among farmers cannot be underrated. Extension services enhance the adoption of modern farming technologies and ultimately increase crop yield (Azikiwe et al. 2013). In many parts of Africa, agricultural extension services have helped farmers to expand production and adopt new technologies aimed at increasing productivity (Dejene 1989; Gautam 2000), and the same has been the case in the northern regions of Ghana. Aside from the factors under study, education on alternative farming practices involving new technologies will greatly improve yield (Ghimire and Huang 2015). Unfortunately, information on such new technologies hardly reaches rural farmers.
Knowledge of the impact of newly developed farm technologies on smallholders' agricultural production and income remains limited (Ghimire and Huang 2016). The relationship between agricultural extension and productivity has been investigated by many researchers. For example, Birkhaeuser et al. (1991), in a systematic review of 26 research outcomes, concluded that there is a strong positive relationship between extension contact and productivity. According to Evenson (1997), because of the high variation in program design and in the skills of extension workers, it may not be appropriate to draw strong conclusions about the contribution of agricultural extension to economic growth.
Furthermore, Suvedi and Kaplowitz (2016), Lopokoiyit et al. (2012) and Suvedi and Ghimire (2015) all emphasized two main reasons why extension services should be excluded as a variable when estimating agricultural productivity. The first is that many previous works that used extension services as a variable could not account for knowledge spillover as information moves from one farmer to another. Hence, if a farmer has not been educated by an extension worker but receives vital assistance from friends that leads to an increase in productivity, that increase is not the result of extension, and including extension in the productivity model would bias it. The second challenge in using farm-level extension contact as a variable is the likely endogenous interaction between the farmer and the extension worker.
That is, some unobservable qualities such as a "desire for the best farming methods" may be displayed by many productive farmers, which may give them an edge in seeking the services of extension workers. Owens et al. (2003) controlled for biases due to the extension variable by including plot size, plot location, and farmer strength in the productivity model. Government and other extension service providers have tended to reach areas that are more accessible and that include resource-rich farmers or elite groups in the communities (FAO 2010). Physical support factors and inputs such as fertilizers, improved seed, access to credit, and irrigation are scarce for farmers living in areas like Northern Ghana (Suvedi and McNamara 2012).
A key strength of statistical models is their limited dependence on field data and their ability to assess the uncertainties that surround model variables. If a statistical model performs poorly in representing a dependent variable's response to an independent variable, this can be detected from its low coefficient of determination ($R^2$) and from the wide confidence intervals of its coefficients. The current study demonstrates this strength of statistical models in predicting yield response to nine (9) main factors for nearly 790 sites in the three northern regions of Ghana. Such a study is very relevant to Ghana, since agriculture is the backbone of the country. The traditional generalized linear models (GLMs) were obtained from classical linear models by two extensions, one to the random part and one to the systematic part. With these extensions, random elements are allowed to belong to a one-parameter exponential family that includes the normal distribution. Since their inception, GLMs have been used as a technique for analyzing various data types. Model checking is usually based on examination of the model diagnostic residuals, as in the linear model case, except that for GLMs standardization of the residuals is required and is somewhat more difficult.
In practice, even though the GLM is widely noted for its good performance in modelling, some natural discrepancies are likely to arise. Observations with large discrepancies on the y-axis are known as outliers. Discrepancies between the data and the fitted values generated by GLMs fall into two main classes: isolated or systematic (Lee, Nelder and Pawitan 2006). Isolated discrepancies are seen when a few observations have large residuals. Such residuals can occur if the observations are wrong, for example when 12 is recorded as 21. Data-based robust methods are sometimes used to handle such cases. However, such robust methods are unable to identify the causes of the discrepancies.
An alternative is to model isolated discrepancies as being caused by variation in the dispersion and to seek covariates that may account for them. This technique of joint modelling of the mean and dispersion, known as Joint Generalized Linear Models (Lee and Nelder, 2003c), makes such exploration straightforward. Furthermore, if a covariate can be found to be the cause of a discrepancy, then we obtain a model-based solution that policy makers in the relevant sector can act on in the future. This study therefore compares results from the traditional generalized linear model with those of the joint generalized linear model, to justify that Joint Generalized Linear Models are appropriate for examining physical support factors that influence crop yield in Northern Ghana.

Data and Data Source
Data for this study were acquired from the Monitoring and Evaluation office of the Linking Farmers to Markets (FtM) project in Tamale, Ghana. A total of 800 maize and soybean farmer based organizations (FBOs) were engaged and interviewed with the help of a structured questionnaire. After cleaning, 790 distinct observations remained. Farming communities were selected as follows: three (3) farming communities each from the Upper East and Upper West regions and seven (7) from the Northern Region. Crop type, Financial Credit, Training, Study tour, Demonstrative Practicals, Networking Events, Post-harvest Equipment, number of farmers in the FBO and Size of plot cultivated were the measured fixed-effects variables, with Total Crop Yield as the response. The regions and the specific communities were treated as random effects. The R statistical software (dhglmfit) was used throughout the analysis. Models were fitted for both the traditional and the joint generalized linear models.

Generalized Linear Models
The Gaussian generalized linear model used in this study consists of three components:
1. A random component, which identifies the conditional distribution of the dependent variable $y$ given the independent variables.
2. A linear function of the regressors, called the linear predictor, $\eta = X\beta$, on which the expected value or mean ($\mu$) of $y$ depends.
3. An invertible link function $g(\mu) = \eta$, which converts the expectation of the response to the linear predictor. The inverse of the link function is sometimes called the mean or expected value function: $g^{-1}(\eta) = \mu$.
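The link and mean functions described above can be illustrated with a short sketch. It assumes a single regressor and made-up coefficient values; the function names and the `LINKS` table are hypothetical and are not taken from the software used in the study.

```python
import math

def linear_predictor(beta, x):
    """Systematic component: eta = beta0 + beta1 * x (one regressor)."""
    return beta[0] + beta[1] * x

# Each entry holds a link function g and its inverse g^{-1}.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
}

def mean_response(beta, x, link="identity"):
    """Mean function: mu = g^{-1}(eta)."""
    _, inverse = LINKS[link]
    return inverse(linear_predictor(beta, x))

# Hypothetical coefficients, not estimates from the paper.
beta = (2.0, 0.5)
mu_identity = mean_response(beta, 4.0)       # Gaussian GLM: mu equals eta
mu_log = mean_response(beta, 4.0, "log")     # log link: mu = exp(eta)
```

With the identity link used for the Gaussian model, the mean is the linear predictor itself; the log link is included only to show that the link and its inverse undo each other.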

Joint Generalized Linear Models
The method used in this paper follows the Joint linear models of Nelder (1998 and 2003c). Consider the linear model $y = X\beta + e$ with $e \sim N(0, \Phi)$, where $\Phi = \mathrm{diag}(\phi_i)$. The ML estimator for $\beta$ is $\hat{\beta} = (X^T \Phi^{-1} X)^{-1} X^T \Phi^{-1} y$ and the variance of the estimator is $(X^T \Phi^{-1} X)^{-1}$. Now suppose that we have a regression model for the dispersion, $h(\phi_i) = z_i^T \gamma$, where $h(\cdot)$ is a link function and $z_i$ is the $i$th row of a specific design matrix $Z$. The ML estimate of the regression coefficient $\gamma$ of the dispersion can be computed by using the squared residuals $\hat{e}_i^2$ as response in a gamma GLM with mean $\phi_i$. If a log link is used, it ensures that the estimated $\hat{\phi}_i$ is positive. The REML estimate can also be computed by using $\hat{e}_i^2 / (1 - q_i)$ as response. Here, $q$ is a vector of hat values and $q_i$ is the diagonal element $(i, i)$ of the hat matrix $H = X (X^T \Phi^{-1} X)^{-1} X^T \Phi^{-1}$. The estimates of $\beta$ and $\Phi$ need to be computed iteratively, because the estimate of $\beta$ depends on $\Phi$, and $\hat{\Phi}$ depends on the estimated residuals.
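The alternating estimation of the mean and dispersion parameters can be sketched as follows. This is a minimal illustration and not the routine used in the study: it assumes one covariate in each model, a log link for the dispersion, prior weights $(1 - q_i)/2$ in the gamma fit (a common REML adjustment, assumed here), and a synthetic data set with deterministic alternating-sign errors. All function names are hypothetical.

```python
import math

def wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns coefficients and hat values."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    b1 = (sw * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / sw
    # q_i = w_i * x_i^T (X^T W X)^{-1} x_i with x_i = (1, x_i)
    hat = [wi * (sxx - 2.0 * sx * xi + sw * xi * xi) / det for wi, xi in zip(w, x)]
    return (b0, b1), hat

def gamma_log_glm(z, d, prior, n_iter=25):
    """Gamma GLM with log link for d on (1, z), fitted by IWLS.
    With a log link the IWLS weight reduces to the prior weight."""
    g = (math.log(sum(d) / len(d)), 0.0)
    for _ in range(n_iter):
        eta = [g[0] + g[1] * zi for zi in z]
        mu = [math.exp(e) for e in eta]
        work = [e + (di - m) / m for e, di, m in zip(eta, d, mu)]
        g, _ = wls(z, work, prior)
    return g

def joint_glm(x, z, y, n_outer=20):
    """Alternate between the mean model (WLS) and the dispersion model (gamma GLM)."""
    phi = [1.0] * len(y)
    for _ in range(n_outer):
        beta, q = wls(x, y, [1.0 / p for p in phi])
        resid2 = [(yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y)]
        response = [r / (1.0 - qi) for r, qi in zip(resid2, q)]   # REML-adjusted
        prior = [(1.0 - qi) / 2.0 for qi in q]                    # assumed prior weights
        gamma = gamma_log_glm(z, response, prior)
        phi = [math.exp(gamma[0] + gamma[1] * zi) for zi in z]
    return beta, gamma

# Synthetic toy data (not from the study): mean 1 + 3x, dispersion exp(2x).
n = 200
x = [i / n for i in range(n)]
sign = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
y = [1.0 + 3.0 * xi + s * math.exp(0.5 * 2.0 * xi) for xi, s in zip(x, sign)]
beta, gamma = joint_glm(x, x, y)
```

On this toy data the fitted mean coefficients should lie close to (1, 3) and the dispersion coefficients close to (0, 2). The closed-form two-parameter solver keeps the sketch dependency-free; a real analysis with several covariates would use a matrix-based solver instead.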
We have two interlinked models, one for the expected value and the other for the dispersion, both depending on the same observed data $y$ and the deviance $d$: $\eta = g(\mu) = X\beta$ and $\xi = h(\phi) = Z\gamma$. Here $h$ is the link function of the dispersion model, $Z$ is the model matrix of the dispersion model, which is a GLM with a gamma variance function, and $X$ is the model matrix of the mean ($\mu$) model. In the joint GLM, the dispersion parameters are no longer considered constant but can change with the mean parameters. This means that the dispersion values are required in the iterative weighted least squares (IWLS) algorithm for calculating the regression parameters.
To begin with, the raw data are plotted and the patterns of crop yield against some selected covariates are observed. Figure 1 displays the observed scatter plot of crop yield against plot size, Figure 2 shows crop yield against the number of farmers, Figure 3 presents crop yield against regions, and Figure 4 presents crop yield against the 13 communities.

Modelling
From the parameter estimates (Table 1), the fitted traditional Gaussian GLM selected the following linear determinants as those that significantly influence crop yield in Northern Ghana: access to credit, crop type, access to training programmes, the number of farmers and plot size. For the Joint GLM, the significant linear determinants of crop yield in Northern Ghana were access to credit, crop type, networking among farmer groups, access to post-harvest equipment, number of farmers, and plot size. From the results, both methods agree on the selection of access to credit, crop type, number of farmers and plot size. These four factors therefore stand out as the most important and consistent determinants of crop yield in Northern Ghana. The next significant set of factors would be access to training programmes (as suggested by the traditional GLM) and networking among farmer groups as well as access to post-harvest equipment (as suggested by the Joint GLM).
However, before applying the distributional results for inference, it is always necessary to check that the model meets its assumptions well enough for the results to be valid. Figure 5 shows the model-checking plots for the traditional Gaussian model. The diagnostic plots in this figure have several satisfactory features, but the running mean in the plot of residuals against fitted values shows some marked trends, and the plot of absolute residuals appears relatively unstable. The normal plot shows some minimal discrepancy, although the histogram of the residuals seems almost symmetric. Given the nature of raw field data, these are moderately sufficient indications of an appropriate model. Figure 6 shows the model-checking plots for the Joint Gaussian model. That figure shows an improved model fit with many satisfactory features. The running mean in the plot of residuals against fitted values shows no marked trend, and the plot of absolute residuals appears to have a relatively stable slope. The normal plot shows no discrepancy and the histogram of the residuals is symmetric. These are very good indications of an appropriate model, hence our choice of the Joint GLM for inference.
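The running mean used in the residual plots above can be computed as in the following sketch. It assumes a simple sliding window over residuals sorted by fitted value; the window width and the toy values are hypothetical and are not data from the study.

```python
def running_mean(fitted, residuals, window=5):
    """Mean of residuals in a sliding window, after sorting by fitted value."""
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    return [
        sum(ordered[i:i + window]) / window
        for i in range(len(ordered) - window + 1)
    ]

# Hypothetical toy values for illustration only.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
no_trend = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]    # residuals scatter around zero
with_trend = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]  # residuals drift with fitted value

flat = running_mean(fitted, no_trend, window=2)
drift = running_mean(fitted, with_trend, window=2)
```

A running mean that stays near zero across the range of fitted values (as in `flat`) is the pattern described for the Joint GLM, while a systematic drift (as in `drift`) corresponds to a marked trend.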

Table 2 supports the findings of the model diagnostics: even though the traditional GLM was quite a satisfactory mean model, modelling both the mean and the dispersion (Joint GLM) improves the quality of the model diagnostics significantly. On all three criteria for model selection, the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the conditional Akaike information criterion (cAIC), the Joint GLM performs far better than its counterpart GLM. The decision rule is that the best model is the one with the smallest criterion values. This study therefore settles on the Joint GLM for inference and selects access to credit, crop type, networking among farmer groups, access to post-harvest equipment, the number of farmers and plot size as the most important physical support factors that influence crop yield in Northern Ghana. Therefore, if we as a country wish to increase crop yield by physically supporting our farmer groups, we must ensure that these factors are given the needed attention.
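The "smallest criterion value" rule can be sketched for AIC and BIC using their standard definitions. The log-likelihoods and parameter counts below are invented for illustration and are not the values in Table 2; only the sample size of 790 comes from the data description. The cAIC is omitted because it requires an effective number of parameters that this sketch does not compute.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical fitted models; the numbers are made up for illustration.
candidates = {
    "model_a": {"log_lik": -2500.0, "n_params": 11},
    "model_b": {"log_lik": -2300.0, "n_params": 21},
}
n_obs = 790  # number of cleaned observations reported in the data section

scores = {
    name: {"aic": aic(**m), "bic": bic(n_obs=n_obs, **m)}
    for name, m in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name]["aic"])
best_by_bic = min(scores, key=lambda name: scores[name]["bic"])
```

Both criteria penalize extra parameters, so a model with more coefficients is preferred only when its log-likelihood improves by more than the penalty.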

From the dispersion model in Table 1, we observe that relying on the Joint GLM in crop yield modelling gives a possible dispersion (prediction error) of 14.917 (the intercept of the dispersion model). Another important feature of the dispersion model of the Joint GLM is its ability to reveal the contribution of each physical support factor to increasing or decreasing the dispersion. The significance is that, once a variable is found to account for a discrepancy, we obtain a model-based answer to the question of which variables policy makers should ignore completely, given the discrepancy they introduce. For instance, we observe that of the nine (9) variables used in this study, all but two (post-harvest equipment and the number of farmers) introduce significant discrepancies to the accuracy of the crop yield model.

Conclusions
We conclude that, even though the traditional GLM was a satisfactory mean model, modelling both the mean and the dispersion (Joint GLM) improves the quality of the model significantly. We strongly recommend this technique of joint modelling of the mean and dispersion as a means of improving the quality of models that fall under the general class of generalized linear models and its extensions. We also recommend that stakeholders give the needed attention to the selected physical support factors.
We admit that, had data been available, extensive input data on farm management practices, soil condition, climate and other non-physical contributors to yield would have enriched our models; such data are frequently unavailable in many parts of the world. We therefore suggest further research that considers these non-physical contributors to yield.