Quality Assurance System on Scoring Guidelines: An Example from Mathematics Department in Indonesian Distant Learning

The purpose of this study is to analyze the Scoring Guidelines used to assess students' essay test answer books in the Mathematics Department of an Indonesian distance learning college. The study was conducted in Jakarta, Yogyakarta, Surabaya, Medan, and Palembang. The students' test answer books covered three courses: 1) the comprehensive test MATA4500/TAP, 2) the main course MATA4111/Calculus II, and 3) MATA4213/Numerical Methods. The sample consists of 293 students who registered during the first semester of 2014, the second semester of 2014, and the first semester of 2015. The answer books for each course were evaluated by two lecturers from a local college, each holding a Master's degree in Mathematics. The lecturers assessed the answer books using a scoring guide (marking scheme) created by the test developers from the Mathematics Department. The data were analyzed with a t-test in SPSS. For 280 of the 293 answer books (95.55%), the difference between the scores given by the two lecturers did not exceed 5%, which complies with the Sistem Jaminan Kualitas (Quality Assurance System) of Universitas Terbuka 2013, Document No. JKOP_UJ03-PK04, PK Examination of Test Results Description. This in turn indicates that the Scoring Guidelines are good and clear.


Introduction
Universitas Terbuka (UT) is a state university that implements a distance learning system, which means that students learn through media, both print and non-print.
UT students are therefore expected to learn independently, using teaching materials that UT has prepared for self-study.
Quality Assurance, in general, refers to a process of defining and fulfilling a set of quality standards consistently and continuously to satisfy all consumers, producers, and other stakeholders. UT formulated the university's Quality Assurance System, or Sistem Jaminan Kualitas (Simintas), in October 2001 (Tian & Amin, 2010). Evaluation of learning outcomes is implemented in the form of final exams, comprehensive tests with scientific works, and Online Tutorials. The comprehensive test is intended to verify students' comprehensive mastery of their field of science in the Strata one (S1) program. The comprehensive test material covers some of the supporting courses available in the Mathematics Department. Since the first semester of 2015, the comprehensive test score has been combined with a scientific work that can be uploaded to the UT web application via http://karil.ut.ac.id (Katalog Universitas Terbuka 2017/2018). A final exam may be an objective (multiple-choice) test or a description (essay) test. Answers to the objective test are recorded on the Test Answer Sheet, which the student fills out using a 2B pencil. Answers to the description test, including the comprehensive test, are written in the student's test answer book.
The final exams and comprehensive tests are accompanied by Scoring Guidelines developed by the lecturers in the Mathematics Department as an assessment guide for examiners from local universities at the regional offices. The Scoring Guidelines, along with the final exams and comprehensive tests, are sent by the UT Testing Center to the five regional offices one day after the final exam has been completed. The results of this research are used to find out how the examiners at the five regional offices regard the Scoring Guidelines for the final exams and comprehensive tests. Improvements can then be made so that the research objective, obtaining a good Scoring Guideline, can be achieved.
In general, an essay test has fewer items than an objective test, and answering it requires students to write long sentences and paragraphs. There are two types of essay test: the first is the explanatory test, an open-answer test in which students are given no limits in writing and organizing their answers; the second is the limited-answer test, in which students are given specific limits and context regarding the form and scope of their answers ("Suwarto Mengungkap Karakteristik Tes Uraian", 2010).
In an essay test, students can demonstrate their ability to interpret facts and concepts and to compose answers in an ordered and integrated way, and this process can be measured (Writing Essay Test Items). Students' distance learning performance can be measured by evaluating learning outcomes using the essay test (Nakayama, Yamamoto, & Santiago, 2010).
The characteristics of the essay test are: (1) in general, one item contains several questions; (2) answers are given in written form; (3) answers are given as long descriptions. The positive aspects of the essay test are: (1) it can assess students' understanding at a high level; (2) students can present their own ideas and thoughts; (3) it can be prepared in a relatively short time. The weaknesses of the essay test are: (1) the score may differ when the same answer is judged by the same corrector at different times and/or by different correctors at the same time; (2) the length and complexity of the answers can cause problems in the assessment; (3) considerable time is needed to correct the answers ("Suwarto Mengungkap Karakteristik Tes Uraian", 2010).
Assessment of the essay test is of two types: the Analysis Method and the Global Method. The Analysis Method uses an ideal answer that is broken down in detail, step by step, with a score assigned to each stage; the score given by the lecturer is based on the number of stages of the answer. In the Global Method, the ideal answer is not divided into specific stages; lecturers examine the answer as a whole and assign an overall grade to the student's answer ("Suwarto Mengungkap Karakteristik Tes Uraian", 2010).
UT examines the test results at the Regional offices; examiners are not allowed to take the students' essay test answer books or other examination files home. The test results are examined by two independent examiners using the Scoring Guidelines. The maximum difference allowed between the scores of the two examiners is 10% for non-exact courses and 5% for exact courses. If the difference between the two examiners exceeds the maximum limit, the two examiners re-examine the student's test answer book (Simintas, 2013).
In the Mathematics Department, students are assessed through the Final Semester Examination (UAS), which takes two forms: multiple-choice examinations and written examinations. Written examinations are developed by lecturers in the UT Mathematics Department (Tutisiana & Zulmahdi, 2015), while the assessment is carried out by lecturers or examiners from other universities who have a mathematical background with master's or doctor of philosophy qualifications, located at five Regional offices, namely Jakarta, Yogyakarta, Surabaya, Medan, and Palembang. Each student's exam is examined by two examiners to maintain the objectivity of the assessment. To reduce subjectivity, the examiners need a Test Examination Guideline. For the multiple-choice test, by contrast, the student's answers are checked by machine (personal communication with evaluation expert Dr. Herman, 02/05/2019). Furthermore, Bonnie L. MacGregor (MacGregor, n.d.) states that clearly written Examination Guidelines are an important tool for viewing performance or assessing student exam results. This opinion is supported by Tutisiana and Zulmahdi (2015) and Arter (2010), who state that specially designed Examination Guidelines can provide clear information about student examination results.
UT implements learning through a distance learning system. As a consequence, students' essay answer books are assessed at five Regional offices, so scoring guidelines are necessary guidance for the examiners. The scoring guide is the lecturer's reference for scoring students' answers; it contains the possible answers and the appropriate score for each possible answer (Simintas, 2013).
The Scoring Guidelines provide a score for each process or stage of the answer given by the student. They were developed as an assessment and scoring guide for the lecturers who check the students' essay test answer books, so that lecturers at the five Regional offices examine the answer books against the same standard. Every answer book is examined by two correctors, and the grade is the average of the scores given by the first and the second corrector.
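The two-examiner rule described above (a maximum difference of 5% for exact courses and 10% for non-exact courses, re-examination when the limit is exceeded, and otherwise the average of the two scores) can be illustrated with a minimal Python sketch. The source does not state how the percentage difference is computed; the sketch assumes it is expressed relative to the maximum attainable score, and the names `reconcile`, `Reconciliation`, and `max_score` are hypothetical and not part of any UT system.

```python
from dataclasses import dataclass
from typing import Optional

# Maximum allowed difference between the two examiners (Simintas, 2013).
MAX_DIFF_EXACT = 5.0       # percent, exact courses such as mathematics
MAX_DIFF_NON_EXACT = 10.0  # percent, non-exact courses


@dataclass
class Reconciliation:
    difference_pct: float         # gap between examiners, in percent of max_score
    needs_recheck: bool           # True when the gap exceeds the allowed limit
    final_score: Optional[float]  # average of both scores, None if a recheck is needed


def reconcile(score_1, score_2, max_score=100.0, exact_course=True):
    """Apply the two-examiner rule to one answer book.

    ASSUMPTION: the difference is measured as a percentage of the maximum
    attainable score; the source only gives the 5% and 10% limits.
    """
    limit = MAX_DIFF_EXACT if exact_course else MAX_DIFF_NON_EXACT
    difference_pct = abs(score_1 - score_2) * 100.0 / max_score
    if difference_pct > limit:
        # Both examiners re-examine the answer book; no grade is issued yet.
        return Reconciliation(difference_pct, True, None)
    # Within the limit: the grade is the average of the two scores.
    return Reconciliation(difference_pct, False, (score_1 + score_2) / 2.0)


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not data from the study.
    print(reconcile(78, 82))
    print(reconcile(70, 80))
    print(reconcile(70, 80, exact_course=False))
```

Under these assumptions, an answer book is flagged for re-examination only when the gap is strictly greater than the limit, which matches the wording "exceeds the maximum limit".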

Methods
The study was conducted at UT's Examination Centers in March-April 2016 and was intended to determine how much the grades given by corrector/examiner 1 and corrector/examiner 2 differ for the students' essay test answer books that had been examined. Three courses were examined: 1) MATA4500/TAP, the comprehensive test; 2) MATA4111/Calculus II, a competence subject; and 3) MATA4213/Numerical Methods, a course registered for by students representing the five Regional offices. The answer books were examined for three exam periods: the first semester of 2014, the second semester of 2014, and the first semester of 2015. The data for this study were collected from the students' essay test answer books for the three courses over the three exam periods, as shown in Table 1. The data were analyzed with a t-test in SPSS.
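The comparison of the two examiners' scores was run in SPSS. As a rough illustration of what an independent-samples t-test computes, the following Python sketch uses only the standard library. It assumes equal variances (pooled estimate) and approximates the two-tailed p-value with the standard normal distribution, which is only reasonable for large samples; an exact p-value would require the t distribution, for example via `scipy.stats.ttest_ind`. The function name `independent_t_test` and the example scores are hypothetical and are not the study's data.

```python
import math
from statistics import NormalDist, mean, variance


def independent_t_test(scores_1, scores_2):
    """Pooled-variance independent-samples t-test.

    Returns (t, df, p_approx). ASSUMPTION: p_approx is a large-sample
    normal approximation of the two-tailed p-value, not the exact value
    that SPSS reports from the t distribution.
    """
    n1, n2 = len(scores_1), len(scores_2)
    m1, m2 = mean(scores_1), mean(scores_2)
    v1, v2 = variance(scores_1), variance(scores_2)  # sample variances (n - 1)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    if se == 0:
        raise ValueError("both samples have zero variance; t is undefined")
    t = (m1 - m2) / se
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


if __name__ == "__main__":
    # Hypothetical scores from two examiners, for illustration only.
    examiner_1 = [72, 65, 80, 58, 91, 77]
    examiner_2 = [70, 66, 82, 60, 90, 76]
    t, df, p = independent_t_test(examiner_1, examiner_2)
    print(f"t = {t:.3f}, df = {df}, approximate p = {p:.3f}")
    # A p-value above 0.05 is read as no significant difference
    # between the mean scores of the two examiners.
```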

Results
The t-test results for all courses show no difference between the 1st examiner and the 2nd examiner. This can be seen from the Sig. (2-tailed) values in the t-test column of the independent samples test. All Sig. values are greater than 0.05, which means there is no significant difference between the assessments given by the two examiners. Thus the two examiners from the local colleges give the same scores to the students in the three courses, namely 1) the comprehensive test MATA4500/TAP, 2) the main course MATA4111/Calculus II, and 3) MATA4213/Numerical Methods, a course taken by the students in the five regions.

Assessment by Examiners at Regional Centers for MATA4111/Calculus II
The t-test results for the MATA4111/Calculus II course show no difference between the 1st examiner and the 2nd examiner. This can be seen from the Sig. (2-tailed) values in the t-test column of the independent samples test. All Sig. values are greater than 0.05, which means there is no significant difference between the assessments given by the two examiners. Thus the two examiners from local colleges give the same judgment on the MATA4111/Calculus II students' essay test answer books.

Assessment by Examiners at Regional Centers for MATA4213/Numerical Methods
The t-test results for the MATA4213/Numerical Methods course show no difference between the 1st examiner and the 2nd examiner. This can be seen from the Sig. (2-tailed) values in the t-test column of the independent samples test. All Sig. values are greater than 0.05. This indicates that the two examiners from local colleges give the same scores to the students' essay test answer books for MATA4213/Numerical Methods.

Assessment by Examiners at Regional Centers for MATA4500/Comprehensive Test
The t-test results for the MATA4500/Comprehensive Test course show no difference between the two examiners. This can be seen from the Sig. (2-tailed) values in the t-test column of the independent samples test. All Sig. values are greater than 0.05, which means there is no significant difference between the assessments given by the 1st examiner and the 2nd examiner. As a result, the two examiners do not differ in their judgments of the students' essay test answer books for MATA4500/Comprehensive Test.

Discussion
The t-test results, both for all courses together and for each course separately, indicate no difference between the 1st examiner and the 2nd examiner. This can be seen from the Sig. (2-tailed) values in the t-test column of the independent samples test. All Sig. values are greater than 0.05, which means there is no significant difference between the assessments given by examiners 1 and 2.
Based on the results of the study, the Scoring Guidelines, which were developed by the lecturers at the Mathematics Department and used as a guide by two university lecturers at the Regional offices to assess the students' test results, succeeded in guiding the examiners to examine the test results with the same perception. The Scoring Guidelines guided the two lecturers well, so there is no difference between the assessments of the first and the second examiner of the students' written exam results. This means that the Scoring Guidelines are valid, that is, they carry out their measurement function appropriately, and reliable, that is, they show consistency as a measuring tool and ensure the objectivity of the assessment.
In addition, according to Ron Legion (Ron, 2015), Scoring Guidelines have the advantage of being able to improve communication between academic institutions in improving the quality of distance education. Furthermore, Scoring Guidelines are an evaluation instrument that can be used to evaluate an assessment, because they can assure the accuracy of the assessment and minimize differences in perception and subjectivity among the lecturers (I Made S.U.).
The advantages of the Scoring Guidelines are that they: 1. reduce the time spent grading, because examiners follow the instructions in the Scoring Guidelines; 2. help examiners ensure consistency in grading across time; 3. decrease differences in perception and the element of subjectivity in conducting an assessment.

Conclusions
The findings of this research are as follows. A benefit of using Scoring Guidelines is that the examiner has defined criteria and scoring standards, which facilitates correcting the students' essay test answer books. In addition, the two examiners stated that they have the same perception of the Scoring Guidelines. As a result, the examiners grade with the same perception, and there is no difference between the assessments given by the 1st examiner and the 2nd examiner. In my opinion, according to this research, the Scoring Guidelines are worthwhile. Scoring Guidelines can reduce the subjectivity of evaluators toward test results, so that students are not disadvantaged. Scoring Guidelines can also be applied at other universities that use a distance education system.