Supervision and Control of Students during Online Assessments Applying Computer Vision Techniques: A Systematic Literature Review

This qualitative study on the control of online assessments through artificial vision is based on the analysis and bibliographic conceptualization of 59 scientific articles selected from a total of 123 found in various bibliographic databases. The study addresses topics ranging from the importance and understanding of online assessment control in the academic context to the types of computer vision algorithms and their main applications. To this end, the bibliographic review was guided by four research questions: What problems exist in online evaluations? What techniques have been used to detect plagiarism in online evaluations? What machine vision algorithms are used? What are the main detection and monitoring tasks that computer vision algorithms are capable of performing? The research questions allowed us to investigate problems of academic dishonesty, the ease of committing plagiarism, online assessment control techniques, plagiarism detection techniques, object tracking algorithms, region-based algorithms, grid-based algorithms, face detection, gesture detection and object detection. The most relevant articles were determined in three phases. Phase one applied inclusion criteria such as scientific articles, reviews, peer-reviewed conference papers, and studies on artificial vision algorithms and online evaluations. Phase two used the search string to make the bibliographic review more relevant and better able to answer the four research questions; the results were ordered by year of publication, and the topic, abstract and keywords were reviewed. Phase three reviewed the introduction and conclusion sections to determine whether the information contributes and relates to the research questions.
The results extracted from the scientific articles show that there is a need for supervision of students during online assessments, and that this supervision can be carried out with computer vision algorithms, since there have been significant advances in this area.


Introduction
In the last three years, nearly everyone in the world has become internet savvy. This technological innovation has taken over university classrooms, facilitating student participation, minimizing the use of resources, and changing traditional learning models and evaluation practices [1]. Although this innovative trend is enthusiastically welcomed, some still resist the change to online tests [2].
Although technological advances applied to online evaluations benefit educational environments, negative practices such as academic plagiarism have appeared, since there is not yet an adequate control system to prevent them. For this reason, the latest artificial vision mechanisms seem well suited to the task, since they are capable of analyzing facial expressions and identifying, in real time, over 9000 types of objects that a test taker could use to obtain information [3]. Additionally, these tools offer a clear balance between speed and precision, with satisfactory results over an extensive range.
The aim of this article is to classify the features, advantages and drawbacks of artificial vision algorithms for the control and supervision of online assessments. In particular, it responds to the need to implement a prototype against plagiarism in the near future. In summary, the present study was performed to contribute to the problem of not yet having a system for adequate exam supervision of online students: this systematic literature review will be used as a theoretical base to develop an artificial vision system, and it will let us determine which academic problems should be considered and which algorithms should be used.
The investigation questions (IQ) used in the survey were:
IQ 1: Issues online tests face today
IQ 2: Plagiarism detection techniques applied to online tests
IQ 3: Are there any artificial vision algorithms used?
IQ 4: Main detection and supervision tasks computer vision algorithms can execute at present

Methodology
A systematic literature review (SLR) is a methodology suggested by [4], [5], [6] that answers a set of investigation questions using scientific publications. Figure 1 shows the bibliographical review, which comprises four steps: 1) investigation questions, 2) document search, 3) article selection, and 4) relevant data gathering. Each phase is explained below.

Document Search
Search strings such as (("copying online exam" / "copying online courses" / "online evaluation plagiarism") ("computer vision face detection" / "computer vision gesture detection" / "computer vision object detection")) were used. In addition, at least 10 documents were considered for each search variable and search-term variant.
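The search strings above can be sketched as a small query builder. This is only an illustration: reading "/" as OR and the juxtaposition of the two groups as AND is an assumption about the notation, and the function names are hypothetical.

```python
from itertools import product

# Search terms quoted in the text. Treating "/" as OR and the two groups
# as joined by AND is an assumption about the notation used.
ASSESSMENT_TERMS = [
    "copying online exam",
    "copying online courses",
    "online evaluation plagiarism",
]
VISION_TERMS = [
    "computer vision face detection",
    "computer vision gesture detection",
    "computer vision object detection",
]


def or_group(terms):
    """Join quoted terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(group_a, group_b):
    """Combine two OR-groups with AND into a single boolean search string."""
    return f"({or_group(group_a)} AND {or_group(group_b)})"


def term_pairs(group_a, group_b):
    """Enumerate every pairwise combination, one per search variant."""
    return list(product(group_a, group_b))


query = build_query(ASSESSMENT_TERMS, VISION_TERMS)
pairs = term_pairs(ASSESSMENT_TERMS, VISION_TERMS)
print(query)
print(len(pairs))
```

Enumerating the pairs makes it straightforward to check that each search variant returned the minimum number of documents mentioned above.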

Article Selection
The first phase applied the following inclusion criteria: scientific articles, reviews, peer-reviewed conference papers, and studies on computer vision algorithms and online evaluations. The material was limited to Computer Science, Engineering, and Information and Communication Technologies sources written in English and published within a maximum period of five years (2015-2020). The exclusion criteria were duplicate work, technical reports, book chapters, dissertations, work ranked below Q2 in the SJR ranking, and studies published in irrelevant areas.
In the second phase, the search string was used to validate the bibliographical review: the results were ordered by year of publication, and topics, abstracts and keywords were reviewed. The last phase reviewed introductions and conclusions in order of relevance. The total number of documents is shown in Table 3. Table 4 shows the 59 scientific articles selected for the research.

Relevant Data Gathering
The pertinent information was extracted from the 59 articles listed above. The gathered data was then categorized and set aside for analysis, discussion and interpretation.
We evaluated not only the way online tests should be controlled but also current online cheating detection techniques. In addition, the computer vision algorithms were analyzed with respect to their reported accuracy levels.

Results
This section presents the findings for each investigation question proposed in Table 1.

IQ 1 Issues Online Tests Face Today
The 15 articles selected prior to analysis (Table 5) address the issues that online tests currently face, namely academic dishonesty and plagiarism accessibility, which are explained below.

Academic dishonesty
Plagiarism often results from students' unclear understanding of what the word means [7], [8]. Since a large number of students enter university every year, training on plagiarism should be provided so that it is clear that this is a serious academic offense that, if detected, will hinder their university studies and carries an ethical impact because of the falsification of data and the misconduct involved [9], [10], [11], [12]. Modifying university policies to apply sanctions to students who commit academic dishonesty would be ideal [13], since online learning plays a more vital role than ever in the pursuit of a degree [14]. Online tests are increasingly common and provide accurate scores almost instantly.
One example of academic dishonesty is that of individuals known as "Ghost Writers", who are paid to take online classes during the semester, including midterm and final tests [15]. Additionally, plagiarism could harm a certificate's impartiality [16]. A further example is that university administrative personnel may provide access to test formats and related information in exchange for money [17]; there is evidently a link between technological devices and these practices. Moreover, Wikipedia and other low-quality digital repositories for the most part do not provide accurate data [18]. For this reason, it is important to verify online content before using it in academic tasks.

Plagiarism accessibility
Plagiarism is a serious problem for students [19]. The growth of the internet, linked to the increase of information in several languages, makes plagiarism even harder to control [20]. New technological tools for plagiarism detection have been developed that are useful to educational institutions [21], which raises the question of why these tools have not yet been incorporated. In the United Kingdom there is wide media coverage of plagiarism [22] that points to a lack of academic integrity in the university community [23]. Likewise, levels of tolerance vary across local cultures. Academic integrity and cheating prevention are challenges faced by universities worldwide [24]. Most students have ethical problems when taking academic tests, even from the start of term [25].

IQ 2 Plagiarism Detection Techniques Applied to Online Tests
The following studies have reported positive results for this twenty-first-century digital problem. The twelve articles selected prior to analysis (Table 6) address plagiarism detection techniques.

Specific techniques for the control of online tests
[26] presented the E-Proctor system, which compares data from all test participants and reveals it in real time to the person in charge of supervising the activity. However, the comparison process has a drawback: it can only be performed on computers connected to one network.
[27] used software that prevents academic dishonesty during online testing. It relies on a JavaScript algorithm that detects the specific moment when students leave the test page to work on another page.
[28] included several online cheating prevention strategies to improve test taking. The article suggests preparing different tests each time, using open-book tests that allow the web, and asking open-ended questions. According to the author, these simple strategies received positive feedback.
[29] used a latent variable in sampling to evaluate the hypothesis that more cheating is involved in online tests than in supervised in-class tests.
[30] presented a new online test tool called DSLab and analyzed several parameters that determined its design and features to tackle plagiarism. Moreover, DSLab demonstrated a satisfactory level of usability and efficiency.
DWright is a system that analyzes plagiarism before, during and after the test by comparing both qualitative and quantitative data entered by students during the test [31].
Turnitin is a plagiarism detector used in online tests. However, this type of software has a real-time limitation: it compares the data only after an entire paragraph or entry has been completed [32].

Plagiarism detection generalized techniques
In [33], keywords and interconnected characters were used to detect plagiarism. Binary code analysis, which plays an important role in software engineering tasks and in malware detection, is thus used not only for virus detection but also for plagiarism detection.
[34] developed a quantitative and qualitative database to verify discrepancies between the prosecution and the detection of plagiarism among students; cross-language information retrieval (CLIR) and Turnitin, a plagiarism detection software, were applied in this process.
[35] analyzed the statistical properties of the most commonly used words to determine word usage patterns and detect when plagiarism occurs. This technique yielded positive results, reaching 97% accuracy in plagiarism detection.
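The idea of comparing word usage patterns can be illustrated with a minimal sketch. It is not the method of [35], whose details are not given here; it is a generic cosine similarity between the frequency profiles of the most common words, and the `top_n` cutoff and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def word_frequencies(text, top_n=50):
    """Count the top_n most common lowercase words in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(dict(Counter(words).most_common(top_n)))


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word-frequency profiles (0.0 to 1.0)."""
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


source = "the quick brown fox jumps over the lazy dog"
suspect = "the quick brown fox leaps over the lazy dog"
unrelated = "neural networks detect faces in video frames"

sim_suspect = cosine_similarity(word_frequencies(source), word_frequencies(suspect))
sim_unrelated = cosine_similarity(word_frequencies(source), word_frequencies(unrelated))
print(sim_suspect > sim_unrelated)
```

A real detector would need a decision threshold calibrated on labelled data; none is assumed here.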
[36] featured the creation of a database called Academic Thai Plagiarism Corpus (ATPC), which can be used with any data comparison software for the Thai language, with the goal of building a multilingual database able to make comparisons and detect plagiarism regardless of language.

IQ 3 Artificial Vision Algorithms Used
Table 7 shows the 12 articles selected prior to analysis that address known algorithms applicable to online test control.

Object tracking algorithms
Object tracking algorithms fall into two categories. Generative trackers identify the areas most similar to the target through maximum likelihood, while discriminative trackers are trained to classify candidate regions as either the object of interest or the rest of the image.
Boosting Tracker. This technique assigns a weight to each weak classifier according to its response on the last frame. Next, a weighted adaptive response map identifies the object's position [37].
MIL Tracker (Multiple Instance Learning). This algorithm is similar to the Boosting Tracker, since both belong to the discriminative type. MIL adjusts itself by extracting positive and negative examples from the current frames. Mislabelled examples could degrade a classifier, but MIL mitigates this issue because it is an adaptive algorithm that keeps evolving despite changes in the appearance of the object [38].
KCF Tracker (Kernelized Correlation Filters). This algorithm tracks objects with kernel ridge regression, where image patches are represented as circulant matrices that are diagonalized in the Fourier domain; the object is then located using a Gaussian-shaped label together with a dual Lagrange multiplier [37].
TLD Tracker (Tracking, Learning and Detection). Trackers usually face challenges such as manual selection of the tracked area, incorrect adaptation to in-plane rotation, or marked distortions. The TLD algorithm addresses them by splitting the task into tracking, learning and detection components, enabling long-term tracking performed frame by frame: the detector locates the object of interest and, if necessary, corrects the tracker. The learning component estimates the detector's errors and updates it through positive-negative learning (P-N learning) to prevent those errors in the future [39].
Medianflow Tracker. This algorithm approximates an object's location over a sequence of consecutive frames by analyzing the dispersion of the optical flow. A grid of points built around the object of interest is tracked in each frame with a pyramidal Lucas-Kanade optical-flow algorithm [40].
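The box update at the core of a median-flow style tracker can be sketched as follows. This is a simplified illustration, not the implementation of [40]: it assumes the grid points have already been tracked by an optical-flow routine, it omits any filtering of unreliable points, and the function name `update_box` is hypothetical.

```python
from itertools import combinations
from math import hypot
from statistics import median


def update_box(box, prev_pts, next_pts):
    """Shift and rescale box=(x, y, w, h) using tracked grid points.

    prev_pts and next_pts are matching lists of (x, y) positions in the
    previous and current frame. Medians make the update robust to a few
    badly tracked points.
    """
    dx = median(n[0] - p[0] for p, n in zip(prev_pts, next_pts))
    dy = median(n[1] - p[1] for p, n in zip(prev_pts, next_pts))

    # Scale change: median ratio of pairwise point distances.
    ratios = []
    for i, j in combinations(range(len(prev_pts)), 2):
        d_prev = hypot(prev_pts[i][0] - prev_pts[j][0],
                       prev_pts[i][1] - prev_pts[j][1])
        d_next = hypot(next_pts[i][0] - next_pts[j][0],
                       next_pts[i][1] - next_pts[j][1])
        if d_prev > 0:
            ratios.append(d_next / d_prev)
    scale = median(ratios) if ratios else 1.0

    x, y, w, h = box
    cx, cy = x + w / 2 + dx, y + h / 2 + dy
    new_w, new_h = w * scale, h * scale
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)


# A 3x3 grid over a 20x20 box that moves by (5, 3); the last point is an outlier.
prev_pts = [(x, y) for y in (10, 20, 30) for x in (10, 20, 30)]
next_pts = [(x + 5, y + 3) for x, y in prev_pts]
next_pts[-1] = (90, 95)
new_box = update_box((10, 10, 20, 20), prev_pts, next_pts)
print(new_box)
```

In the example, the single mistracked point does not affect the result, because both the displacement and the scale are taken as medians.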
GOTURN Tracker (Generic Object Tracking Using Regression Networks). This algorithm tracks objects in real time using a neural network trained offline on visualizations of moving objects in diverse scenarios. During the testing phase the network weights are frozen, so no online fine-tuning is needed, and through offline training the tracker learns to track new objects more robustly [41].

Region based object detection algorithms
These algorithms create "bounding boxes" for a large number of proposed regions through a selective search, which in turn searches within an image through windows of different sizes. The algorithms group pixels based on texture, color and density for each window size.

RCNN (Region Convolutional Neural Network). This algorithm generates a set of proposed bounding boxes and feeds the image regions inside the boxes to an AlexNet convolutional neural network followed by a Support Vector Machine (SVM), which detects whether an object is inside them. Finally, the process runs a linear regression model to calculate more accurately the coordinates where the object is located [42], [43].
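The final regression step can be illustrated with the box parameterization commonly used in region-based detectors. The sketch assumes boxes given as (centre x, centre y, width, height) and shows only how regression targets are encoded and decoded, not how the regressor is trained; the function names are hypothetical.

```python
import math


def encode_box(proposal, target):
    """Regression targets that map a proposal box onto a ground-truth box.

    Boxes are (cx, cy, w, h). Offsets are normalized by the proposal size,
    and sizes are compared in log space.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode_box(proposal, deltas):
    """Apply predicted deltas to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 48.0, 30.0, 30.0)
deltas = encode_box(proposal, ground_truth)
refined = decode_box(proposal, deltas)
print(deltas)
print(refined)
```

Decoding the encoded targets recovers the ground-truth box, which is the property a trained regressor tries to approximate for unseen proposals.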
Fast RCNN introduces a series of improvements over RCNN, such as a pooling layer for each region of interest (ROI) that yields enhanced accuracy. RCNN is slow because it runs a deep convolutional network on every object proposal without sharing computation. Spatial Pyramid Pooling networks (SPPnets) improve on this by classifying each proposal from a feature vector extracted from a shared feature map; multiple outputs are pooled and concatenated, reducing RCNN processing time by a factor of 10 to 100 [44], [45].
Mask RCNN is a newer version based on Faster RCNN that adds a parallel branch in charge of predicting a mask for each detected object [46]. The algorithm is divided into two stages. The first stage generates proposals: it uses ResNet50+FPN to extract feature maps and then obtains a large number of candidates through a Region Proposal Network (RPN) with a binary foreground classifier. The second stage classifies each proposal and generates the bounding box and the mask through bounding-box regression and a fully convolutional network (FCN) [47].

Grid-based object detection algorithms
This type of algorithm divides the image with a grid, and the whole image is passed through the convolutional network at once; as a drawback, the characteristics of smaller objects may not reach the last layer. In return, these algorithms are able to run at high processing speeds of 45 to 100 FPS.
YOLO (You Only Look Once) is a real-time object detection system, pretrained to execute specific detection tasks [48]. This algorithm uses 53 convolutional layers with normalization and Leaky ReLU activation. YOLO distinguishes itself by looking at an image only once. For this purpose, it subdivides the image into a number of equally sized cells, each responsible for predicting 5 bounding boxes and confidence scores that indicate how well a box fits the object. The newer version, YOLOv2, runs at 40 FPS [48], while YOLOv3 runs in 22 ms at 28.2 mAP, which makes it as accurate as SSD but three times faster [61].
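Two ingredients of the grid-based scheme described above can be sketched in a few lines: finding the cell responsible for an object, and measuring how well a predicted box fits with intersection over union (IoU), the usual overlap measure behind such confidence scores. The 13x13 grid and the 416x416 input are assumed example values, and the function names are hypothetical.

```python
def responsible_cell(box, image_size, grid_size):
    """Return (row, col) of the grid cell that contains the box centre.

    box is (x_min, y_min, x_max, y_max) in pixels; image_size is (width, height).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    col = min(int(cx / width * grid_size), grid_size - 1)
    row = min(int(cy / height * grid_size), grid_size - 1)
    return row, col


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed example: a 416x416 input divided into a 13x13 grid.
cell = responsible_cell((100, 150, 200, 250), (416, 416), 13)
overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
print(cell)
print(overlap)
```

During training, only the cell returned by `responsible_cell` is asked to predict a given object, which is what allows the whole image to be processed in a single pass.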
SSD (Single Shot Multibox). This algorithm is nearly as fast as YOLO. It passes the input image through the network in a single step and adds several feature layers at the end, which predict probability values for each object type. The neural network generates scores for the presence of each object class in each bounding box and adjusts the box to fit the object. As a result, the proposal generation stage and the pixel resampling stage are unnecessary, so this technique is faster than proposal-based methods. The algorithm reaches a mean average precision of 72.1% [62], [63], [60].

IQ 4 Current Main Detection and Supervision Tasks Computer Vision Algorithms Can Execute
The seventeen articles selected prior to analysis (Table 8) address the main detection and supervision tasks that current computer vision algorithms are able to execute.

Facial detection
[50] presents an assistance model for visually impaired people that runs on a Raspberry Pi, using YOLO for object detection and a Multi-Task Convolutional Neural Network (MTCNN) for facial detection. The YOLOv2 implementation reached 6-7 FPS with accuracy levels of 63-80%, while facial detection reached accuracy levels of 80-100%.
[63] presents the FaceDetecNet system, based on an FCN similar to SSD. FaceDetecNet achieves a processing time of over 30 ms/frame and an Average Precision (AP) of 0.8, outperforming the SSD algorithm, which under comparable conditions takes up to 1000 ms/frame. Both algorithms ran on an NVIDIA GeForce 1080 GPU.
[51] presents an emotion recognition system for real-world scenarios, supported by two facial detection resources that use YOLO together with a CNN ensemble, intended for human-robot interaction (HRI) and trained on the FER (Facial Expression Recognition) database. In real-time performance tests the application reached an accuracy of 72.47%, with a facial-emotion detection frequency of 3 Hz.
[52] presented a method to optimize real-time video processing for CNN-based detection of faces and facial features by reducing the existing weights and the overlap of weight parameters. They found that speed and accuracy can be optimized by eliminating some hidden layers of the neural network. The model was tested on the detection of 68 facial features using a processor commonly found in embedded and mobile devices, with YOLO for detecting large faces and BSMNet and MTCNN for small faces. With this optimization, both algorithms ran on embedded devices with accuracy and speed similar to the values obtained on a PC.
[53] presents a real-time YOLO object detection system applied to facial detection. The experimental results show that YOLO facial detection is highly robust and fast, even in complex environments.
[54] presents real-time human gesture identification to control an Unmanned Aerial Vehicle (UAV) without using GPS. YOLOv2 was implemented to locate a person's face and two hands, so that the hand movements define gestures that are interpreted as flight commands for the UAV, achieving high accuracy in real time.

Gesture detection
[43] presents a multiscale deep learning model that detects hands in images and can be used for gesture detection. The model is based on R-CNN, which is used to obtain the proposal regions, fused with a multiscale VGG16 model made up of five convolutional blocks, where the first two blocks have two convolutional layers and the remaining three blocks have three convolutional layers. The model differs from conventional ones in that the region-of-interest features are pooled exclusively from the last three convolutional blocks, which yields better results than plain VGG16 because of its superior performance in detecting small hands in large images.
[55] presents hand-gesture detection using deep learning as a means of interaction for virtual reality (VR), acquiring real-world images from a camera mounted on the headset. The user's hand gestures are fused with the virtual images, offering a truly immersive and interactive experience, with superior accuracy when YOLOv2 is used. However, due to its computational cost and drastic drops in FPS, SSD was used instead, so that users could interact through gestures in virtual reality without having to take the headset off.
[56] details the implementation of an intuitive interface on an augmented reality device designed to assist visually impaired people. The system identifies each surrounding object, and a voice helps the user with obstacle avoidance, scene comprehension and the building of spatial memories, among other tasks. The system, called CARA (Cognitive Augmented Reality Assistant), includes functions such as: a warning of increasing volume that sounds an alert when an object is too close; scan mode, where the names of all detected objects are called out from left to right; spotlight mode, where the name of the object directly in front is called out; and target mode, where an object of interest is selected and the system calls it out until the person finds it.
[45] presents SqueezeNet, a fully convolutional neural network for real-time object detection trained to detect cars, cyclists and pedestrians. This YOLO-inspired network uses convolutional layers to extract feature maps, infer the probability of each class and obtain bounding boxes through simultaneous output layers, reaching detection speeds of up to 57.2 FPS with a precision similar to the state of the art but with a model size 30.4 times smaller, an inference speed 19.7 times higher and 35.2 times less energy consumption. The model demonstrated superior speed and accuracy when detecting cyclists and pedestrians compared to FRCN+VGG16, FRCNN+AlexNet, VGG16-Det and ResNet50-Det.
[57] presents a dense tracking and 3D reconstruction object detection system using a depth camera and a Kinect sensor. YOLO detects and follows a large variety of objects belonging to different classes, while the camera pose is estimated by a model-to-frame technique using a coarse-to-fine iterative closest point (ICP) algorithm. The estimated depth maps are integrated into a volumetric structure using the estimated camera poses, and the Marching Cubes algorithm is applied to visualize the reconstructed scene.

Object detection
[48] presents a multiscale YOLOv2 training method for the detection and classification of over 9000 objects, identifying objects of different sizes through convolutional and pooling layers that allow the model to be resized during execution. The neural network changes the image dimensions randomly every 10 iterations. Using this method, YOLOv2 reached 76.8 mAP at a speed of 67 FPS on the VOC 2007 challenge, and 78.6 mAP at 40 FPS.
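The resizing schedule can be sketched as below. The candidate sizes (multiples of 32 between 320 and 608) are an assumption chosen for illustration, and `multiscale_schedule` is a hypothetical helper, not part of any library.

```python
import random


def multiscale_schedule(num_iterations, every=10,
                        sizes=tuple(range(320, 609, 32)), seed=0):
    """Return the input size used at each training iteration.

    A new size is drawn at random from `sizes` every `every` iterations
    and kept until the next draw.
    """
    rng = random.Random(seed)
    schedule = []
    size = None
    for iteration in range(num_iterations):
        if iteration % every == 0:
            size = rng.choice(sizes)
        schedule.append(size)
    return schedule


schedule = multiscale_schedule(30)
print(schedule[0], schedule[10], schedule[20])
```

Because the network is fully convolutional, the same weights can be applied at every size, which is what lets a single model trade speed for accuracy at test time.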
[58] presents a 3D object detection method known as Complex-YOLO, applied to autonomous vehicle operation using RGB images and a point cloud acquired by a Lidar, which estimates 3D boxes for the objects in Cartesian coordinates through a complex regression strategy. The article also presents the Euler-Region-Proposal Network (E-RPN), which estimates the pose of an object. Testing was done with the KITTI benchmark suite, whose datasets contain a variety of cars, cyclists and pedestrians, and yielded results similar to the state of the art with a processing speed up to five times faster.
[59] presented an innovative computer-aided diagnosis (CAD) system for the detection of breast masses, based on convolutional neural networks using YOLO. The system is able to simultaneously detect and classify masses in 600 mammograms taken from DDSM and 2400 augmented mammograms covering several types of masses. The trained model determines whether the masses are benign or malignant with 97% accuracy.
One-stage detectors like YOLO and SSD have not reached the detection accuracy of two-stage (proposal and classification) detectors. However, one-stage object detectors have the potential to reach higher accuracy levels; the problem seems to lie in the extreme imbalance between image foreground and background during the training of dense detectors. [60] presented an alternative that redesigns the standard cross-entropy function to decrease the weight that the cost function assigns to well-classified samples; this was called focal loss. To test this methodology, a redesigned detector known as RetinaNet was applied to the COCO benchmark, matching the speed of existing one-stage detectors while exceeding their accuracy.
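The down-weighting of well-classified samples can be made concrete with a short sketch of the binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), compared with plain cross entropy. The defaults gamma = 2 and alpha = 0.25 are assumed example values, and the clamping constant `eps` is an implementation detail added here for numerical safety.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross entropy for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: cross entropy scaled by (1 - p_t) ** gamma.

    p_t is the probability assigned to the true class, so confident,
    correct predictions (p_t near 1) contribute almost nothing.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


easy = focal_loss(0.9, 1)   # well-classified positive
hard = focal_loss(0.1, 1)   # misclassified positive
ce_ratio = cross_entropy(0.1, 1) / cross_entropy(0.9, 1)
fl_ratio = hard / easy
print(ce_ratio, fl_ratio)
```

The ratio between the loss of a hard and an easy example is far larger under focal loss than under cross entropy, so the many easy background samples no longer dominate training.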
[61] proposed a system for counting and classifying flying insects, including bees, flies, mosquitos, moths, chafers and fruit flies, using YOLO for object detection and SVM for the count. Implemented with a camera and a Raspberry Pi, the system reached a counting accuracy of 92.5% and a detection precision of 90.18%, making it an exceptional choice for precision-agriculture applications.

Discussion
In this section, the results shown in Tables 5 to 8 are extended and represented as time series in Figures 2 to 5, where the vertical axis plots the number of scientific studies published in each year for each subcategory found for each research question.
Recent algorithms have advanced far enough to support supervision techniques that prevent academic plagiarism in real time. The bibliographical review revealed that most students accept that academic dishonesty is driven by easy access to internet data as a way of getting higher marks. Figure 2 shows the data gathered from Table 5, namely the number of studies on online evaluation issues throughout the years.
Although the number of studies on plagiarism peaked in 2018, it decreased significantly in the following two years. However, academic dishonesty continues to be a topic of interest in the educational field, with studies reaching their peak in 2019.
Further, the reviewed studies indicate that these techniques do solve real issues in adapting to online education. The online control techniques are shown in more detail in Figure 3, which shows an increasing number of investigations and suggests a growing interest in developing even more techniques. Similarly, evaluation methodologies captured researchers' interest in 2018. Although this tendency decreased in 2019, interest is likely to continue in the near future.
Likewise, the cutting-edge development of computer vision algorithms has captured researchers' attention in recent years. Artificial intelligence (AI) has brought huge technological advances to computer vision, particularly with the introduction of Boosting and CNN algorithms, which led to a very large number of algorithms and proposed techniques; in this study we therefore classified the algorithms into object tracking, region-based and grid-based algorithms. Figure 4 details the data gathered from Table 7.
The number of studies in the three computer vision algorithm categories addressed in this study has increased in recent years, driven by the scientific community's strong interest in grid-based algorithms like YOLO and SSD. These algorithms have been popular in scientific articles since 2016 because of their high detection speed, and they have recently achieved precision values similar to those of region-based techniques, so nowadays they are the most common in the state of the art.
Finally, regarding the last question in the study, the main detection applications of the algorithms are face, gesture and object detection, which have recently attracted researchers' interest and whose implementation has been enabled by significant advances (see Figure 5).
Face, gesture and object detection algorithms have extensively been developed in recent years indicating a growing tendency in the number of published articles. However, object detection articles take precedence.
For instance, the considerable progress of CNNs in classification tasks has attracted researchers' interest and led to extensive databases, conferences and open challenges, which have encouraged the development of a large variety of systems able to detect thousands of objects.

Conclusion and Future Projects
From this literature review it can be concluded that this kind of technological innovation must be applied and integrated into e-learning environments, since artificial intelligence and computer vision are being widely studied and have shown major advances that can contribute to improved security and reliability in online education. In this sense, every effort made to contribute to this field of study is relevant and must be supported by higher education institutions and researchers.
On the whole, the control of plagiarism and of online tests must be treated as fundamental at a worldwide level. According to the bibliographic analysis, such control would have a beneficial impact on e-learning platforms by enhancing the credibility of this study modality, which would allow it to reach acceptance levels similar to those of the classroom-based modality in the near future.
Nowadays, several techniques are applied to support the online supervision of students during evaluations, such as simultaneous comparison, tests with different questions for each user, time synchronization techniques, and the detection and blocking of windows when additional digital content is accessed. However, most authors suggest that in-person test control, as in traditional classroom evaluations, is preferred simply because of the presence of a human observer, although this is not suitable for the majority of online learning activities. Therefore, having addressed a large variety of computer vision algorithms and their applications, it can be concluded that the implementation of these techniques for online assessment control and supervision is possible and will bring significant improvements in the credibility and acceptance of this learning modality. Additionally, we conclude that the most suitable algorithms for these tasks are grid-based detectors like YOLO and SSD because of their high detection speed, accuracy and real-time capabilities. In the near future we therefore expect to see this kind of algorithm integrated into virtual learning environments, performing tasks such as face detection and recognition, object detection, gesture detection and pose estimation for a large variety of e-learning applications such as online assessment supervision, authentication and interest level estimation, among others.
In this sense, this study establishes a bibliographical base for the development of an online control and supervision system. Such a system, supported by CNNs and grid-based algorithms like YOLO and SSD, may well include state-of-the-art functions for detecting and identifying the user's face, as well as processes for detecting signs of plagiarism and objects such as notebooks, cellphones and the like.