Development and Validation of the 'iCAN!': A Self-administered Questionnaire Measuring Outcomes/Competences and Professionalism of Medical Graduates

The Tuning-Medicine Project produced a set of ‘level one’ and ‘level two’ learning outcomes/competences to be met by European medical graduates. In the learner-centered era, self-assessment is becoming increasingly important. Our aim was to develop a self-completion questionnaire (‘iCAN!’) evaluating graduates’ learning outcomes. The Tuning ‘level two’ learning outcomes were transformed into a questionnaire of 104 closed items, tested with a convenience sample of 512 graduates from the seven Greek medical schools during the 2009–2011 graduation cohorts, 21 practitioners, and seven different undergraduate student groups. Cronbach's alpha, factor analysis, and mean score comparisons were used to check internal consistency, construct validity, and sensitivity, respectively. Question mean scores were used to differentiate weak and strong areas of competence. Among graduates, all overall alphas were >0.95 and all subscale alphas were higher than expected, indicating at least acceptable internal consistency. Factor analysis produced one factor per ‘level one’ outcome, except for ‘practical procedures’, which was split into two meaningful factors. The overall mean score was 44.4%, 52.2%, 61.2%, 73.4%, and 81.4% among second-year, fourth-year, and sixth-year students, graduates, and practitioners, respectively (p<0.001); the improvement across progressively more mature groups suggests good sensitivity, indicating also good responsiveness. Among graduates, question mean scores ranged from very weak (blood transfusion, 39%) to very strong (measuring blood pressure, 95%), indicating good differentiation of strengths and weaknesses. A consistent, well-structured, and sensitive version of the questionnaire is hereby released for the self-assessment of graduates’ abilities and professionalism and for the differentiation of strengths and weaknesses, which could be used for informed SWOT policy.

In any school's curriculum assessment, one should listen to the students [28], who are perhaps the hardest judges [29]. Though relevant attempts have been published [24,30,31], no tool exists for measuring the end-product of a medical curriculum. The need for measuring the educational environment during the period of undergraduate or postgraduate medical studies has led to a series of well-validated measures, where trainees' self-perceptions are used for measuring the educational environment [32-37].
Based on our experience of translating them into the Greek language [38,39], we constructed the 'iCAN!' instrument, a self-completion questionnaire revealing graduates' perceptions of what they are able to do upon graduation [40]. This study's objective is to present the development and validation process.

Learning Outcomes for Undergraduate Medical Curriculum: The Content of the 'iCAN!' Tool
Learning outcomes are usually specified with a hierarchy of levels, with a top level consisting of large domains of learning, within each of which subsidiary outcomes are described with increasing levels of granularity [7]. The European Core Curriculum was structured in nine domains ('level one' learning outcomes), each of which comprised four to twenty 'level two' learning outcomes, 92 in total [26]. For example, the first 'level two' learning outcome of the first 'level one' learning outcome states "Graduates should be able to take a detailed and relevant history". The Tuning Learning Outcomes were structured in domains (sixteen 'level one' learning outcomes, twelve for medical abilities and four for professionalism), each of which comprised up to thirteen 'level two' learning outcomes, 95 in total (69 for medical competences and 26 for professionalism). For example, the first 'level two' learning outcome of the first 'level one' learning outcome states "Graduates in medicine will have the ability to take a history" (Table 1).
Based mainly on the 'Tuning Learning Outcomes' [2] and supplemented by some important outcomes from the European Core Curriculum [26], the 'iCAN!' questionnaire was developed by transforming the Tuning 'level two' learning outcomes into a self-assessment tool. This instrument could be considered a further step in the work done by the MEDINE group. The development of the tool followed three phases, each ending in a version of the questionnaire: the initial version piloted with undergraduate students, the version tested with graduates (v0), and the hereby released final version (v1). Each version was built upon the experience of the previous one.

Development of the 'iCAN!' Tool: The Initial Version
First, the 'Tuning Learning Outcomes' were translated into the Greek language [41]. Then an initial version of the questionnaire was created according to three rules: "each Tuning 'level two' learning outcome should be transformed into a single question", "the whole tool should be kept as short as possible", and the respondent should be thought of as "a medical graduate completing it immediately after graduation".
There were 95 Tuning 'level two' learning outcomes; thus, the questionnaire should consist of 95 questions. However, since several Tuning 'level two' learning outcomes were bipartite, e.g. "subcutaneous and intramuscular injection" (one could be competent in one but not in the other), the total number of items increased. All bipartite 'level two' learning outcomes were split into two separate questions. In addition, the eleventh Tuning 'level one' learning outcome, "Ability to apply scientific principles, method and knowledge to medical practice and research", which is not expanded into 'level two' learning outcomes, was also split into two separate questions, one relating to practice and one to research. On the other hand, the two 'level two' learning outcomes 16.1 "appreciation of diversity and multiculturality" and 16.2 "understanding of cultures and customs of other countries" were considered too similar and were thus combined into one question, while the 'level two' learning outcome 3.5 "advanced life support" was considered unsuitable for the undergraduate level and was thus omitted. Finally, three European Core Curriculum outcomes (diagnostic and therapeutic options available within other healthcare professions; STEEEP principle; non-verbal communication), which did not match any Tuning learning outcome, were considered very important; therefore, three more questions were added. The result was a 122-item questionnaire (the initial version), accompanied by an open question ("Comments"). Since graduates are asked whether they are competent in all these outcomes, all questions started with the words "I can", hence the name of the questionnaire. No negatively worded question was formulated.
The initial version of the questionnaire was piloted with 15 third-year, 14 fifth-year and 18 sixth-year medical students at Ioannina University, Greece. They were also asked whether they perceived the questions to be comprehensible and/or duplicated. Based on their input, the questions were reduced to 104 and subsequently randomly re-ordered using the Microsoft Excel function 'rand()'. Although answering a long questionnaire is not an easy task, no further questions were eliminated, so that all Tuning 'level two' learning outcomes would be included. A last question (q105), raised during the first phase, was added to investigate whether graduates attributed their abilities to their school's curriculum; it is worth noting that this is not an outcome question but an educational process question, and its interpretation should be similar to that of educational environment measures such as DREEM, PHEEM etc. Finally, following Whittle et al [42], the open question of the initial version was replaced by "If you could change one thing in your school, what would this be?" [43].
Respondents were asked to choose one of six response options on a Likert scale (strongly agree, agree, rather agree, rather disagree, disagree, strongly disagree), without a middle 'uncertain' option, which causes loss of information [38]. In addition, offering participants six instead of five options is closer to the ideal 'five to nine' options, congruent with both respondent preferences and reliability statistics, and at least partly averts the 'central tendency bias' that results in loss of reliability and sensitivity [44]. Thus, the version 'iCAN!v0' was created to be tested with a group of graduates, measuring their perceptions of their medical abilities and professionalism (Table 1).
Universal Journal of Educational Research 2(1): 19-36, 2014
Table 1. The 'Tuning learning outcomes' and the related questions of the 'iCAN!v0' tool used for validation. Mean scores (MS; %) obtained from its administration to Greek medical graduates (n=508) and interpretation of the scores in color (see note 3).
Tuning Learning Outcomes (desired competences) (note 1) | Question ID | The iCAN!v0 questionnaire (note 2) | MS (note 3)
Graduates in medicine will have the ability to: | Total Questionnaire
1 Carry out a consultation with a patient
1.1 take a history | 65 | I can take a correct history.
1.2 carry out physical examination | 1 | I can carry out a systematic physical examination.
1.3 make clinical judgments and decisions | 94 | I can make clinical judgments and take clinical decisions.
1.4 provide explanation and advice | 12 | I can provide explanation and advice to the patient.
2.5 provide care of the dying and their families | 39 | I can provide care of the dying and their families.
2.6 manage chronic illness | 8 | I can manage chronic illness.
2.7 be aware of additional diagnostic and therapeutic options available within other healthcare professions (ECC p13) (note 4) | 18 | I am aware of the additional diagnostic and therapeutic options available within other healthcare professions.
2.8 uphold the STEEEP principle of patient care (ECC p10) (note 4) | 19 | I can keep the STEEEP patient care principle (safe, timely, effective, equitable, efficient, patient-centered).
12 Promote health, engage with population health issues and work effectively in a health care system
12.1 provide patient care which minimizes the risk of harm to patients | 2 | I can provide patient care which minimizes the risk of harm to them.
12.2 apply measures to prevent the spread of infection | 14 | I can apply measures to prevent the spread of infection.
12.3 recognize own health needs and ensure own health does not interfere with professional responsibilities (note 4) | 27 | I can recognize my own health needs.
 | 44 | I can ensure own health does not interfere with professional responsibilities.
12.4 conform with professional regulation and certification to practice | 26 | I can conform with professional regulation and certification to practice.
12.5 receive and provide professional appraisal | 38 | I can receive and provide professional appraisal.
12.6 make informed career choices | 60 | I can make informed career choices.
12.7 engage in health promotion at individual and population levels | 79 | I can engage in health promotion at individual and population level.
13 Professional attributes
13.1 probity, honesty, ethical commitment | 56 | I can commit by probity, honesty and morality & ethics.
13.2 commitment to maintaining good practice, concern for quality | 28 | I can devote to maintaining good practice and concerning for quality.
13.3 critical and self-critical abilities, reflective practice | 47 | I have critical and self-critical abilities.
13.5 creativity | 24 | I can be creative.
13.6 initiative, will to succeed (note 4) | 36 | I can take initiatives.
 | 7 | I have the will to succeed.
16.4 knowledge of a second language | 73 | I have knowledge of a second language.
16.5 general knowledge outside medicine | 59 | I have general knowledge outside medicine.
- | 105 | I owe my medical knowledge and abilities to my school's curriculum. (note 5)
… considered similar and combined in one question (92), c) the outcome that was considered inappropriate for undergraduate level and was omitted (3.5), and d) the three outcomes that were added to the questionnaire from the European Core Curriculum (ECC) and assigned the outcome identification numbers 2.7, 2.8, and 6.11.
5 This is not an outcome question; it is an educational process question raised during development (see note 3).

Testing the 'iCAN!' Tool with Graduates
The 'iCAN!v0' was first transformed by a private agency (Anova Consulting, www.anova.gr) into an anonymous scannable form (available upon request). Then a survey among Greek medical graduates was carried out, approved by the Deans of all Greek medical schools (see Acknowledgements). The scannable forms were distributed to a convenience sample in all seven medical schools during the 2009-2011 graduation periods. Completed forms were scanned by Anova Consulting using a high-reliability optical mark recognition scanner (OpScan iNSIGHT™, Pearson NCS) and the QuickTesting software. The resulting electronic data file was checked against the original completed questionnaires by both Anova Consulting and the authors.
According to the SF-36 scoring instructions [48], response options were assigned scores on a six-point, equal-interval 0-100 scale as follows: strongly disagree = 0, disagree = 20, rather disagree = 40, rather agree = 60, agree = 80, strongly agree = 100, and mean scores were calculated (instead of sums as in DREEM [33] and PHEEM [36]). This 'standard scoring method' [46] has the advantage that all absolute scores (overall, subscale or single-question ones) coincide with their corresponding percentages, since both range from 0 to 100, the easiest scale for everybody to remember, interpret and compare.
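The scoring rule above can be illustrated with a minimal sketch. The score values are those given in the text; the function name `mean_score`, the use of `None` for a missing answer, and the example answers are hypothetical and only serve the illustration.

```python
from statistics import mean

# Score assigned to each of the six response options (values from the text).
LIKERT_SCORES = {
    "strongly disagree": 0,
    "disagree": 20,
    "rather disagree": 40,
    "rather agree": 60,
    "agree": 80,
    "strongly agree": 100,
}


def mean_score(responses):
    """Mean 0-100 score over the answered items.

    `responses` is a list of option labels; None marks a missing answer
    (an assumption of this sketch) and is skipped.
    """
    scores = [LIKERT_SCORES[r] for r in responses if r is not None]
    return mean(scores) if scores else None


# Hypothetical answers of one respondent to a four-item subscale.
example = ["agree", "strongly agree", "rather agree", None]
print(mean_score(example))  # mean of 80, 100 and 60
```

Because every score lies between 0 and 100, the same function can be applied to a single question, a subscale, or the whole questionnaire, and the result reads directly as a percentage.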
The Tuning learning outcomes were set and described in two granularity levels ('level one' and 'level two') by consensus of a large qualitative group of teaching staff from over ninety European schools through the whole MEDINE process, rather than by statistical methods [2]. We therefore considered it more appropriate to base our analysis on 'level one', in order to test the properties of each Tuning 'level one' learning outcome and to identify whether its 'level two' learning outcomes compose a single factor (domain) or are best described by a model with two or more factors.
Cronbach's alpha and 'alpha if item deleted' were calculated to check internal consistency within each of the sixteen Tuning 'level one' learning outcomes. Cronbach's alpha was also calculated for various splits of the total questionnaire, i.e., by type of outcome (medical ability, professionalism), medical school (for each of the seven Greek medical schools), gender (male, female, not declared), and time to complete the questionnaire (fast and slow respondents: those taking less and those taking more than the mean completion time). The Spearman-Brown formula was used to estimate the expected split alphas [45,46] for comparison with the directly calculated ones.
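As an illustration of the two statistics named above, the following sketch computes Cronbach's alpha from complete cases and applies the Spearman-Brown prophecy formula. It uses the textbook definitions; the function names, the toy data, and the way `length_factor` would be chosen for a given subscale are assumptions of this sketch, not a description of the SPSS procedure used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    `rows` holds one list of item scores per respondent (no missing values).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(reliability, length_factor):
    """Predicted reliability when the scale length is multiplied by
    `length_factor` (e.g. 0.5 for a scale half as long)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical 0-100 scores of five respondents on a three-item subscale.
toy = [
    [60, 80, 60],
    [80, 100, 80],
    [40, 40, 60],
    [100, 100, 80],
    [20, 40, 40],
]
print(round(cronbach_alpha(toy), 3))
print(round(spearman_brown(0.95, 0.5), 3))
```

Under this reading, an expected alpha for a shorter scale would be obtained by passing the alpha of the longer scale together with the ratio of the two lengths as `length_factor`.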
Exploratory factor analysis was used to analyze interrelations among the questions within each of the sixteen Tuning 'level one' learning outcomes (construct validity). Seven questions (13, 16, 27, 35, 41, 49, 73) were transformed before performing factor analysis since they did not satisfy normality [47], as checked by skewness and kurtosis. The extraction method was Maximum Likelihood, and the rotation method Varimax with Kaiser Normalization. Factors accepted were those fulfilling the Schönrock-Adema et al [47] principles: scree plot point of inflexion; eigenvalues >1.5; each factor accounting for approximately an additional 5% of the variance; a given factor should contain at least three variables with significant (>0.40) loadings; variables loading on the same factor share the same conceptual meaning; variables loading on different components appear to measure different constructs.
In order to test the instrument's ability to detect truly existing differences among different groups, the total questionnaire mean scores of undergraduates, graduates, and practitioners were compared. The underlying hypothesis was that the higher the medical maturity, the higher the overall mean score must be. For this reason, in addition to graduates, seven groups of 1st to 6th year undergraduate medical students at Ioannina University and a group of practicing physicians three to thirty years after graduation (median = 10, Q1 = 5.5, Q3 = 15.5) were also included. ANOVA was first used to check whether mean scores differed, and then the unpaired two-tailed t-test with unequal variances was used to locate the differences between the groups. Parametric statistics can be used with Likert data, with small sample sizes, with unequal variances, and with non-normal distributions, without fear of reaching the wrong conclusion [66].
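The unpaired t-test with unequal variances mentioned above is commonly known as Welch's t-test. A minimal sketch of its test statistic and degrees of freedom follows; the function name and the two small groups of scores are hypothetical, and the p-value lookup from the t distribution is deliberately left out because it needs a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical total mean scores (%) of two small groups of respondents.
students = [40, 45, 50, 55, 60]
graduates = [65, 70, 75, 80, 85]
t, df = welch_t(students, graduates)
print(round(t, 3), round(df, 2))
```

With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the two-tailed p-value.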
Analyses were performed using Microsoft Excel 2003 and SPSS 16.0. Reported p-values were considered significant at p<0.05. Responsiveness (changes within a group over time) was beyond the scope of this study.

'iCAN!v1': The Released Version of the Questionnaire
Questions that seemed to be problematic in the previous testing phase, i.e. questions whose omission led to increased alphas, questions that loaded less than 0.40 on a factor, or questions that loaded on two factors, were thoroughly re-examined. In addition, the experience of the whole testing phase gave us many opportunities for a closer look at all questions, enabling further refinement. Thus, in order to avoid misunderstandings and to be as clear as possible, we accepted periphrastic wording, including even definitions or glosses. For example, we used "I can take an appropriate history, detailed and relevant" instead of simply "I can take a history", and "I can make evidence based medical decisions, unbiased by any conflict of interest" instead of "I can make decisions". Everybody makes decisions all the time, but we are not interested in graduates' personal lives, only in medical decisions, and more precisely in whether these are evidence-based and unbiased by any conflict of interest, direct or indirect, obvious or hidden. This is what we are really interested in, and in this way the participant is clear about what is being asked.
Under the umbrella rule not to exclude any Tuning 'level two' learning outcome, outcomes 3.5 (advanced life support) and 16.2 (understanding other countries), not included in the tested version (v0), were re-incorporated into the released version (v1): it is MEDINE's responsibility to eliminate Tuning learning outcomes. Since outcome 3.5 entirely covers outcome 3.4 (basic life support), the question corresponding to outcome 3.4 could be omitted. However, since some graduates might be competent in outcome 3.4 but not in 3.5, it was decided to keep both. Outcomes 3.4-3.6 furthermore ask whether graduates comply with the European guidelines. However, what is meant here is not legislation (European, American, Greek etc.) but the evidence-based, scientific guidelines; thus we replaced "European" with "scientific (evidence-based)".
Tuning 'level two' learning outcome 1.3 states "Graduates in medicine will have the ability to make clinical judgments and decisions". However, everybody makes judgments and decisions; what we are actually interested in when asking the medical graduate is whether they make the right clinical judgment and an evidence-based decision. These two qualifiers make all the difference, and the real underlying meaning of this outcome is that the "right person does the right thing right" [6]. Therefore we rephrased the statement "I can make clinical judgments and take clinical decisions" to "I can make right clinical judgments and evidence-based medical decisions" in the released version.
The Tuning 'level one' learning outcome 6 states "Graduates in medicine will have the ability to communicate effectively in a medical context". The critical word here is "effectively", but it was not included in version v0. We had to insert it in all corresponding questions in the released version.
Regarding the Tuning 'level two' learning outcome 1.1 "to take a history", we are interested in learning from the medical graduate whether he has the ability to take an appropriate history, as formulated in the Mexican learning outcome ("The ability to take an appropriate history") [15], since everyone (any healthcare professional, the patient, the relatives, even lay people) can "take a history". Thus, we formulated the corresponding statement as "I can take an appropriate history, detailed and relevant" in the released version 'iCAN!v1'.
Two additional questions ("learn from own mistakes" and "bear in mind the consequences of own decisions") from the European Core Curriculum [26] were considered important professionalism attributes, especially the first one [54-60], and that is why they were included in 'iCAN!v1'. The same applies to the last-but-one question, which aims at the overall self-assessment of the graduate.
Finally, since graduates declared that they attribute about 60% of their knowledge and abilities to their medical school's curriculum [61], we added a first open question so that they could clarify to what they attribute the remaining 40%, if not to the curriculum.
Furthermore, we realized that the open question "If you could change one thing in your school, what would this be?" was more suitable for an educational environment metric than for an educational outcome metric [43], and thus we replaced it with two open questions: "What medical competence, which you are not competent at, would you like to be competent at?" and "What medical competence, which you are competent at, would you not mind if you were not competent at?" Both questions could also serve future content revisions of medical learning outcomes.
Respondents needed 15 to 20 minutes to complete the questionnaire (mean 18, standard deviation 10; median 15, interquartile range 10 to 20).
Three of the 512 questionnaires were empty except for the open question, which was answered, and one questionnaire was answered with the option 'strongly agree' in all closed questions. These four questionnaires were excluded. In six more questionnaires page 3 had been overlooked for technical reasons (the unusual top-right stitching, used to keep the left edge of the form clear so that the scanner could work correctly), resulting in 174 missing values. In addition, there were 371 missing values (0.7%, ranging from 0% in q1 to 1.6% in q100) scattered throughout the dataset without any obvious underlying pattern. For factor analysis, these 545 (174+371) missing values, 1.03% of the whole dataset and much less than 20% [67], were replaced with the variable mean [47]. The 52,391 non-missing values were distributed as follows: strongly agree 25%, agree 36%, rather agree 26%, rather disagree 9%, disagree 3%, strongly disagree 1%. In only 8 of the 104 questions (Table 1) did the 'strongly agree' option account for more than 50% of all answers: q49 'measure blood pressure' 83%, q35 'electrocardiography' 69%, q7 'will to succeed' 65%, q41 'confidentiality' 64%, q16 'venepuncture' 59%, q73 'knowledge of a second language' 59%, q56 'probity, honesty, ethical commitment' 54%, and q17 'ability to recognize own limits and ask for help' 51%.
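The replacement of missing values with the variable mean, and the share of missing values reported above, can be sketched as follows. The counts are those given in the text; the function name, the use of `None` for a missing answer, and the toy data are assumptions of this sketch.

```python
from statistics import mean


def impute_with_item_mean(rows):
    """Replace each missing answer (None) with the mean of the answered
    values of the same question (column)."""
    columns = list(zip(*rows))
    col_means = [mean([v for v in col if v is not None]) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical 0-100 scores: three respondents, two questions.
toy = [[60, None], [80, 40], [None, 80]]
print(impute_with_item_mean(toy))

# Share of missing values, using the counts reported in the text.
missing = 174 + 371
non_missing = 52391
print(round(100 * missing / (missing + non_missing), 2))  # 1.03
```

Mean replacement leaves each question's mean unchanged but slightly reduces its variance, which matters little when, as here, only about 1% of the values are affected.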

Internal Consistency
Cronbach's alphas are given in Table 2. In the first scale, for example, "Consultation with a patient", consisting of six questions, 491 of the 508 questionnaires had no missing values, giving an acceptable alpha of .765. Five of the sixteen scale alphas were good (>.8), eight acceptable (>.7), and only three questionable (>.6). 'Alpha if item deleted' was slightly higher in five scales (5, 7, 9, 14 and 15), in the second decimal place and beyond, a difference that could reasonably be considered negligible (see notes a-e). Alphas by type of outcome, medical school, gender, and time to complete the questionnaire were >.9, as expected by the Spearman-Brown formula, in part due to the high number of questions.

Factor Analysis
All corrected item-total correlations were above .200 (between .214 and .706). Thus, all questions were kept for the factor analysis, a synopsis of which is presented in Table 3. For the first scale, for example, "Consultation with a patient", a medical ability (m) Tuning 'level one' learning outcome (TL1LO), all six questions (12, 94,…, 1) loaded on only one factor, with loadings from .719 maximum (q12) to .636 minimum (q1), eigenvalue 2.8, percent of variance explained 45.9, and percentage of the 'Consultation with a patient' outcome explained by these six questions 98.9%. In summary, fifteen of the sixteen TL1LO were explained by only one factor each, and only the fifth was split into two factors. The fifteen factors accounted for 39% to 80% of the variance of each TL1LO, with eigenvalues between 1.6 and 4.4 and loadings between .401 and .893, the only exception being question 17, which was slightly lower at .344 (see note 2 in Table 3). This solution was factorially simple (all questions of each TL1LO loaded on one factor) and interpretable (the same meaning as the original TL1LO). The fifth TL1LO was split into two factors, with eleven questions on the first factor (eigenvalue 4.3, variance explained 35.3%, loadings between .470 and .745) and five on the second (eigenvalue 2.4, variance explained 13.3%, loadings between .455 and .761), two of which loaded on both (>.400). This solution is also acceptable; the second factor contains the 4 or 5 most common or simple or taught or known practical procedures, and the first the remaining 10 or 9 less common/simple/taught/known ones.
In general, all factors met the Schönrock-Adema et al [47] criteria described in Methods, with few marginal exceptions. Only one of the seventeen factors, the 11th, contained fewer than three questions, because the eleventh TL1LO contained only two questions (q100, q90). Only one of the 104 questions loaded slightly lower than .4 (q17 "Ability to recognize own limits and ask for help", .344). Two questions (q55 "Intramuscular injection", rather common; q69 "Subcutaneous injections", rather uncommon) loaded >.4 on both of the fifth TL1LO factors, which, though, could be interpreted as the less and the more common practical procedures (Table 3). At any rate, these questions should be considered carefully in the next version of the questionnaire.

Sensitivity (Discriminant Validity)
The total questionnaire mean score (TMS) increased from 42.6% in first-year students to 81.4% in practitioners (Table 4). The increment from group to group was statistically significant (p<0.001), with the only exception of the two fifth-year student groups, where, as expected, no significant difference was observed. The opposite occurred for the standard deviation (not shown), which decreased from 33.3 to 20.4. That is, as participants mature, they feel more competent (higher scores) and more self-confident (lower deviation). These results are in line with what could be expected.
Among graduates (Table 1), the scale mean score varied from 64% (prescribing drugs) to 82% (professional attributes), while the question mean score ranged from 39% (blood transfusion) to 95% (measuring blood pressure). It is not known whether the scale mean scores are in line with what would be expected, but it is very plausible that the question mean scores are as declared, since every student should be able to measure blood pressure, whereas possibly no Greek medical curriculum includes blood transfusion in its outcomes. This is an indication that the questionnaire is able to differentiate well between weak and strong areas.

Final Refinement of the Tool: from the Tested 'iCAN!v0' to the Released 'iCAN!v1'
Questions that seemed to be problematic in the previous testing phase, i.e., questions whose deletion increased alphas even by a negligible amount (from .006 to .052 if questions 49, 35, 41, 17, 13 and 53 were deleted), questions that loaded less than .40 on a factor (only q17), or questions that loaded on two factors (q55, q69), were thoroughly re-examined. In addition, going through the whole testing phase gave us many opportunities for a closer look at all questions, which enabled further refinement.
After the validation experience and the questionnaire's refinement as described in the last subsection of the Material and Methods chapter (2.4), Table 5 presents the hereby released version 'iCAN!v1'.
Participants: the number of participants, i.e., the number of suitably completed questionnaires (508 in total, after excluding the three that had not answered any closed question and the one that had answered 'strongly agree' to all questions).
Cases: the number of questionnaires without any missing value, as reported by SPSS; only these were used in alpha calculations.
Alpha: observed Cronbach's alpha.
Interpretation of the observed alphas: Excellent: >.9, Good: >.
1 Rotated factor matrix, 17 factors (all those with eigenvalues greater than 1.5, loadings below .4 suppressed); Extraction method: Maximum Likelihood; Rotation method (TL1LO 5): Varimax with Kaiser Normalization; Rotation converged in less than 25 iterations; Seven items (13, 16, 27, 35, 41, 49, 73) were transformed before performing factor analysis because they did not satisfy normality, checked by skewness and kurtosis.
2 Questions' ID (as in Table 1) in descending order by question loading: the first question had the maximum loading, and the last one the minimum. 106 questions in total, because q55 and q69 are presented twice, in bold and plain (higher and lower loadings, respectively), due to their loading on both sub-factors of the fifth factor. The TL1LO 14 shows in parentheses the only question (17) with loading <.4 (.344), while the other nine questions of this factor had loadings between .788 (q62) and .595 (q46).
ID: the TL1LO identification number as in Table 1.
T: type of the outcome (m = medical ability; p = professionalism; 16 in total, 12m+4p).
Q: number of questions per outcome (104 in total).
F: number of factors per TL1LO (17 in total, 16+1, due to the fifth TL1LO splitting into two factors).
S: scree plot sharp point of inflection after the ith factor (i=2 for the fifth TL1LO, i=1 for all the rest).
EV: eigenvalue (between 1.6 and 4.4).
%V: percent of variance explained.
Q/F: questions per factor (equal to Q, except in TL1LO 5, where two questions loaded on both factors, see note 2 above).
Loading (max, min): loading range from maximum to minimum of the questions of the factor (presented in the previous column: the max loading corresponds to the first question and the min to the last, while the loadings of all other questions of the factor lie in between).
LRA: linear regression analysis.
Adj R2: adjusted R square = the percentage of the TL1LO explained by all its questions that are statistically significantly related to it (all questions are highly significantly related to the corresponding TL1LO, p<.001).

Discussion
This study's goal was to describe the transformation of the Tuning learning outcomes into a useful questionnaire. Transforming a given outcome into a reasonable question was not as easy a task as it first appeared. The discussion will focus on four interrelated but also independently important features of a questionnaire: reliability, validity, sensitivity, and responsiveness. Finally, the released version will be presented, followed by the study's limitations and the overall conclusion.

Internal Consistency and Reliability
All total questionnaire scale alphas were >0.95, indicating excellent internal consistency [44,46]. There are at least three reasons for such high alphas: the correlation of the items on the scale (actual reliability), the length of the scale (the number of questions), and the length of the Likert scale (the number of response options). Response options, five in educational environment metrics (DREEM, PHEEM etc), are six in 'iCAN!', and this adds to reliability [44]. The total questionnaire (104 questions) is rather long; however, observed scale alphas were higher than or equal to the expected ones, calculated by the Spearman-Brown formula, which standardizes by the number of items of the total questionnaire [45,46]. Supposing that reasons other than actual reliability account for some of the Cronbach's alpha, say 15%, the total questionnaire overall alphas would still remain good (>0.80); even with a cut as large as 25%, they would remain >0.70, indicating at least an acceptable level of internal consistency. Thus, we can conclude that our efforts produced a consistent questionnaire.
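The arithmetic behind the last argument can be checked with a short sketch. It assumes that the 15% and 25% cuts are meant as relative reductions of the observed alpha and takes 0.95, the lower bound reported above, as the starting value; the function name is hypothetical.

```python
def discounted_alpha(observed_alpha, inflation_share):
    """Alpha remaining after removing a relative share attributed to
    reasons other than actual reliability (an assumption of this sketch)."""
    return observed_alpha * (1 - inflation_share)


observed = 0.95  # lower bound of the overall alphas reported in the text
for share in (0.15, 0.25):
    print(share, round(discounted_alpha(observed, share), 4))
```

Under this reading, a 15% cut leaves an alpha just above 0.80 and a 25% cut leaves one just above 0.70, which matches the thresholds quoted in the paragraph.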

Validity
Content validity and face validity are both optimized by involving a wide range of individuals [46]. The large number of people involved in the Tuning project, through literature searches, taskforce workshops and a web-based opinion survey [2], ensures content validity. The large number of people involved in transforming this content into the 'iCAN!' questionnaire and in the pilots ensures that the content has been successfully transferred into the questionnaire. The fact that all six alternative response options were used in almost all questions is in agreement with this conclusion [49]. Eleven questions with zero or only one answer in the 'strongly disagree' and 'disagree' options, as well as all questions with the majority of the answers in the rather uncertain area of 'rather disagree' and 'rather agree', were thoroughly re-examined and appropriately rephrased in the hereby released version 'iCAN!v1'.
Factor analysis plays a major role in construct validation [46]; in general, both confirmatory and exploratory factor analysis can be used, depending on the study's purposes. Because of the MEDINE consensus, we kept in the hereby released version 'iCAN!v1' the two items that loaded on both factors of the fifth 'level one' learning outcome and the one that loaded slightly lower than 0.4 on the fourteenth 'level one' learning outcome (Table 3).
There is no other 'gold standard' or well-established instrument and this is not a prognostic study; thus, concurrent and prognostic validity remain unchecked.

Sensitivity and Responsiveness
High reliability is usually a prerequisite for sensitivity [46], and this condition was met, as discussed above. Furthermore, the score differences found among undergraduates, graduates and practitioners are a good sign of the instrument's ability to detect existing differences among groups (sensitivity). It is reasonable to accept that these differences really exist. It is also reasonable to accept that the two different groups of fifth-year students should not differ (and 'iCAN!' found no difference), and that as medical participants mature their self-confidence increases (and 'iCAN!' indicated exactly this: the standard deviation decreased). Another good indication of the instrument's ability to detect existing abilities is the two extreme scores, the best (measuring blood pressure, 95%; any medical student is expected to have been taught and to have learned this outcome) and the worst (blood transfusion, 39%, far below the pass/fail cut-point of 50%; no undergraduate Greek curriculum contains this outcome). A rather unexpected result was the relatively high total mean score of the first- and second-year students. This could be at least partially attributed to the potentially lower accuracy of self-assessment in the first years of study compared with later years. In conclusion, it is reasonable to accept that the tool has the sensitivity to discriminate strengths and weaknesses, thus offering a basis for a SWOT educational policy [62].
Testing the ability of the developed questionnaire to detect changes over time within the same group (responsiveness) was beyond the scope of this study. Thus, we have no indication of its responsiveness other than its sensitivity, since a highly sensitive scale will usually also be highly responsive [46]. It could also be reasonably argued that the changes found between progressively more mature groups (students, graduates, practitioners) carry a flavor of change over time, and this is in line with good responsiveness. As discussed above, the validation exercise failed to reveal a non-reliable, non-valid, non-sensitive and perhaps non-responsive tool. However, a deeper insight into the published literature and a more penetrating look at all questions enabled further refinement, presented in Table 5. The equivalent Greek version is available from the corresponding author upon request.
The Greek translation of the original English Tuning learning outcomes was not back-translated [41]. However, the creation of 'iCAN!' is not a translation of an existing English questionnaire; it is a construction of the Greek 'ΜΠΟΡΩ!' simultaneously with its English counterpart 'iCAN!'. Since the original Tuning learning outcomes are in English, and all authors have a good command of English, most of them have studied, lived and worked abroad for rather long periods of time, and all publish in English, we think that 'iCAN!' successfully conveys the meaning of the original outcomes.
All questions have been treated as equally weighted. However, outcomes 2.3 and 2.4 (differential diagnosis, mean score 74%, and treatment/management plan, 62%) cover almost the whole of medicine and carry much more weight than measuring blood pressure (outcome 5.1, mean score 95%) or certifying death (outcome 7.4, mean score 57%). Weighting is mainly a matter for the Tuning project, and future research could answer whether question scores should be weighted.
Sometimes students, and especially teachers, considered the Tuning 'level two' learning outcomes upon which the 'iCAN!' was built too generic and not medically specific. They felt that the biomedical paradigm that drives today's medicine and medical schools (anatomy, physiology, biochemistry, pathology etc.) was not sufficiently represented. Perhaps the knowledge outcomes described in Appendix A of 'The Tuning Project (Medicine)' [2] should be incorporated in the questionnaire, for example, "I can sufficiently to graduation level demonstrate knowledge of the normal function of all the systems of the human body (physiology)" or "…of the human society (sociology)" etc.; however, this would add about forty more questions. Perhaps the clinical attachments and experiential learning of Appendix B of 'The Tuning Project (Medicine)' should also be included, e.g. "I have had sufficient to graduation level experience of clinical work in obstetrics and gynecological care"; apart from the fact that these are processes rather than outcomes, some fifteen more questions would be added. On the contrary, Tavakol & Dennick [64] argue that a value of Cronbach alpha >0.90 may suggest redundancies and show that the test length should be shortened.
Perhaps the main problem of the 'iCAN!' is its length. However, this was largely unavoidable, since the questionnaire maps almost one-to-one to the Tuning 'level two' learning outcomes. A further refined version of these outcomes, a matter for the MEDINE network, could perhaps give rise to a shortened version of the 'iCAN!' as well. On the other hand, one should keep in mind that the outcome of a medical school with six years of study might be difficult to describe with, say, thirty questions. Thus, further work is needed in these areas, in our view by the MEDINE network.
A warning rather than a limitation is that, although this was a national administration, the tool had not yet been validated and our sample was not representative. The varying sampling schemes, response rates, and cohort sizes across medical schools and time periods make the data more appropriate for testing the instrument under construction than for detecting educational outcomes or comparing schools. Even within Greece, let alone across European or world schools, the results should be interpreted with great caution, since this is not an outcome measurement study but a tool development study. However, the reported scores could serve as a good starting point for future measurements [65].

Conclusion
After a rather successful debugging of 'iCAN!v0', the hereby released 'iCAN!v1' has been created: an acceptably consistent, structured, sensitive (and probably responsive) tool for evaluating and monitoring whether medical graduates meet the standard set of outcomes produced by the MEDINE network. It discriminates strengths and weaknesses, and thus offers a basis for a SWOT educational policy. Any future comprehensive application of it will help to obtain a good picture of medical graduates' competences to inform policies and interventions. Since nobody can prove validity in any absolute sense [46,63], we expect that future applications will further confirm its validity, reliability, sensitivity and responsiveness.

7. I can achieve to be the good tomorrow's doctor  SD D RD RA A SA
8. I can manage effectively chronic illness  SD D RD RA A SA
9. I can properly perform cannulation of veins  SD D RD RA A SA
10. I can properly suture  SD D RD RA A SA
11. I can move and handle patients without being at risk, neither the patients nor me  SD D RD RA A SA
12. I can provide effectively explanation and advice to the patient  SD D RD RA A SA
13. I can effectively apply to medical practice the best evidence available (evidence-based medicine)  SD D RD RA A SA
14. I can apply effectively measures to prevent the spread of infection  SD D RD RA A SA
15. I can assess correctly the psychological factors in presentations and impact of illness  SD D RD RA A SA
16. I can properly venepuncture  SD D RD RA A SA
17. I can recognize effectively my own limits and ask for help  SD D RD RA A SA
18. I can use effectively the additional diagnostic and therapeutic options available within other healthcare professions  SD D RD RA A SA
19. I can uphold undeviatingly the patient care principle STEEEP (safe, timely, effective, equitable, efficient, patient-centred)  SD D RD RA A SA
20. I can provide effectively reassurance and support to the patient  SD D RD RA A SA
21. I can use effectively information systems used in healthcare  SD D RD RA A SA
22. I can review effectively the appropriateness of drug and other therapies and evaluate potential benefits and risks  SD D RD RA A SA
23. I can communicate effectively in non-verbal (body) language  SD D RD RA A SA
24. I can be creative while practicing  SD D RD RA A SA
25. I can assess correctly the social factors in presentations and impact of illness  SD D RD RA A SA
26. I can conform appropriately to professional regulation and certification to practise  SD D RD RA A SA
27. I can recognise properly my own health needs  SD D RD RA A SA
28. I can commit undeviatingly to maintain good practice and to concern for quality of care and of life  SD D RD RA A SA
29. I can communicate effectively with those who require an interpreter  SD D RD RA A SA
30. I can treat effectively pain and distress, with or without drugs as appropriate  SD D RD RA A SA
31. I can provide effectively basic First Aid according to current scientific (evidence-based) guidelines  SD D RD RA A SA
32. I can correctly perform basic respiratory function tests  SD D RD RA A SA
33. I can recognise and assess correctly the severity of clinical presentations  SD D RD RA A SA
34. I can work effectively in an international context  SD D RD RA A SA
35. I can take a correct electrocardiograph  SD D RD RA A SA
36. I can take the necessary initiatives while practicing medicine  SD D RD RA A SA
37. I can obtain and record properly informed consent  SD D RD RA A SA
38. I can receive and provide constructive professional appraisal  SD D RD RA A SA

Table 2. Cronbach's alpha by Tuning 'level one' learning outcomes, type of outcome, medical school, gender, completion time of the questionnaire and overall tool

Table 3. The original Tuning 'level one' learning outcomes (TL1LO) and synopsis of the exploratory factor analysis (EFA) of the 'iCAN!v0' data (n = 508 graduates)

Table 4. Discriminant validity: Total questionnaire mean scores by various medical maturity groups.
P: participants; N: number of options chosen by the participants among the six offered ones (from strongly disagree to strongly agree); TMS: total questionnaire mean scores; CI: confidence interval. t: unpaired two-tailed t-test with unequal variances (assuming equal variances, a rather reasonable condition, yields negligibly smaller p-values, not shown). ANOVA was previously performed (p<0.001). p: p-value comparing the corresponding mean with the one immediately above it; e.g., the first p compares the second TMS with the first one, the second p compares the third TMS with the second one, etc.; the only non-significant one is in bold. 1, 2: different fifth-year students from two successive cohorts, 2010 (1) and 2009 (2).
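The unpaired t-test with unequal variances named in this note (often called Welch's test) can be sketched as follows. This is a minimal illustration using only the Python standard library: the function name is hypothetical, the two score lists are invented toy values and not study data, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed. A two-tailed p-value would additionally need the t distribution, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for an unpaired t-test that allows unequal variances."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Toy percent scores for two hypothetical groups (NOT data from the study).
    group_low = [60.0, 62.0, 64.0]
    group_high = [70.0, 72.0, 74.0]
    t, df = welch_t(group_low, group_high)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

Comparing each group with the one immediately above it, as described in the note, would amount to calling such a function on each consecutive pair of groups.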

Table 5. The 'iCAN! v1' questionnaire, ready for use by a medical school to obtain its graduates' perceptions of their own medical abilities and professionalism
Dear graduate, congratulations! Have a great career! Your medical school awarded you your bachelor degree. Now it is your turn: you can help your school, coming generations of colleagues, and tomorrow's patients and healthy. Do you agree with the following statements? To what extent do they represent you? This is about how you personally feel at this moment.
If you do not agree (A) or strongly agree (SA) in the last question, where do you attribute your knowledge and abilities? _____________
What medical competence, which you are not competent at, you would like to be competent? Be as specific as possible: ______________
And what medical competence, which you are competent at, you would not bother if you were not? Be as specific as possible: _________
Make sure you have answered all questions and you have given only one answer for each one. Thank you for your collaboration!