A Review and Evaluation of Human Interactive Proof (HIP) Technique for Combating Malicious Automated Scripts

Advances in Information Technology (IT) have made information security an inseparable part of the field, and authentication plays an important role in achieving it. Computer scientists have developed Human Interactive Proofs (HIPs), commonly known as CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart), as challenge-response tests that distinguish a human requesting a service from a malicious automated script. A CAPTCHA is a security measure in which a computer program automatically generates and grades puzzles that most people can solve without difficulty but that current programs cannot. The purpose of such schemes is to ensure that the services offered are accessed only by legitimate human users and not by automated programs. This paper presents a brief overview of the literature on CAPTCHA authentication techniques in the online environment. Furthermore, it evaluates HIPs with the objective of providing insight into their lack of acceptance, and offers suggestions for further research in this field.


Introduction
Security is becoming an increasingly important issue for business organizations, web bloggers and site owners, and the need for authentication has therefore become more important than ever. A CAPTCHA is a challenge-response test, most often placed within web forms, that determines whether the user is human. The purpose of CAPTCHA [1,2,17] is to block form submissions by spam bots, which are automated scripts that post spam content wherever they can. The most common type of CAPTCHA consists of an image of seemingly random letters and numbers that are distorted to thwart optical character recognition. The use of HIP systems on web sites is a response to the growing need to eliminate annoying spam bots from many sites. It allows users to spend more time on critical assignments instead of dedicating precious hours to deleting spam comments. The first line of defense against a considerable percentage of spam is therefore simply to pose a challenge, in the form of a word or math question, that protects the site in question. It sounds elementary, but HIP developers have saved thousands of users and site administrators from troublesome bots and trolls.
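The "generate and grade" idea behind a math-question challenge can be illustrated with a minimal sketch. The function names `make_challenge` and `grade` are hypothetical and are not taken from any particular CAPTCHA library; a real deployment would also keep the expected answer on the server and would usually render the question in a form that is hard for scripts to parse.

```python
import random

def make_challenge(rng):
    """Return (question, expected_answer) for a simple addition puzzle."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", str(a + b)

def grade(expected, response):
    """Accept the response only if it matches the expected answer exactly
    (surrounding whitespace is ignored)."""
    return response.strip() == expected

# Illustrative use: the seed is fixed only to make the sketch repeatable.
rng = random.Random(0)
question, expected = make_challenge(rng)
print(question)
print(grade(expected, expected))        # a correct answer is accepted
print(grade(expected, "not a number"))  # anything else is rejected
```

The same two-step shape (generate a puzzle, then grade the response) applies to distorted-text challenges; only the puzzle generator changes.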

From CAPTCHA to ReCAPTCHA
About 60 million CAPTCHAs are solved by humans around the world every day, and each one takes roughly ten seconds of human time. Individually that is not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. It is therefore worthwhile to channel the effort spent solving HIPs into positive use. ReCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books. ReCAPTCHA, which draws on a third-party database of words scanned from books, is a system developed at Carnegie Mellon University [13]. It uses CAPTCHAs to help digitize words from old books that software cannot read, while protecting websites from bots attempting to access restricted areas. Words that cannot be read by Optical Character Recognition (OCR) are sent to the Web in the form of CAPTCHAs for humans to decipher. This is possible because most OCR programs flag a word that they cannot read correctly; such a word is placed on an image and used as a CAPTCHA. Craigslist began using reCAPTCHA in June 2008. In addition, the U.S. National Telecommunications and Information Administration uses reCAPTCHA on the website of its digital TV converter box coupon program, part of the US DTV transition. Some form of CAPTCHA or reCAPTCHA is valuable for any website that accepts user submissions. ReCAPTCHA also saves users from having to configure an image CAPTCHA, which requires uploading a font to the server. In Figure 1, the reCAPTCHA challenge contains the words "following finding." Users need to know that reCAPTCHA is user friendly and to understand its purpose, so that they will participate fully in helping to digitize text for the betterment of the world.
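The aggregate figure quoted at the start of this section follows directly from the two numbers given there. The short calculation below uses only those two inputs (60 million CAPTCHAs per day, roughly ten seconds each); the variable names are illustrative.

```python
CAPTCHAS_PER_DAY = 60_000_000   # from the text: about 60 million per day
SECONDS_PER_CAPTCHA = 10        # from the text: roughly ten seconds each

total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
total_hours = total_seconds / 3600  # 3600 seconds in an hour

print(f"{total_hours:,.0f} hours per day")  # 166,667 hours per day
```

The result, about 166,667 hours, is consistent with the statement that the puzzles consume more than 150,000 hours of work each day.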

Summarization Review
For lack of space, we summarize the HIP review in tabular form, as shown in Table 2.1. In addition, we have chosen a few examples of various HIPs to show pictorially (see Table 2.2).

Methodology and Data Analysis
The data for this study were collected by distributing test question materials to postgraduate and undergraduate students of the Computer Science department, since they are among the most frequent users of web sites and are likely to have come across the subject matter. In the first administration, one hundred test materials were distributed, of which the researcher was able to collect ninety-three. Later, seven more questionnaires were distributed to bring the total to one hundred, and all of these were collected for analysis. Each respondent's views, as expressed in the questionnaire, were summarized by the researcher and entered into the Statistical Package for the Social Sciences (SPSS), a software program for statistical analysis. Its graphical user interface has two views: Data View and Variable View. We are concerned with the former. The Data View shows a spreadsheet-like view of rows (cases) and columns (variables). Unlike in a spreadsheet, the data cells can contain only numbers or text; formulas cannot be stored in them. The data were organized in the worksheet so that unnecessary data (respondents' views) were discarded. Each item was placed in the column to which it relates, for more accurate results. The tables are not reproduced here for lack of space; however, their outcomes are discussed below.

Discussion
This section presents research findings on the CAPTCHA code and on users' views of the CAPTCHA system. As mentioned above, the data were collected by distributing one hundred (100) test question materials to postgraduate (69) and undergraduate (31) students of the Computer Science department at the University of Ibadan. What is supposed to be a test turns into an annoyance: 43% of the respondents were annoyed because they did not understand why they should answer the CAPTCHA test, while 54% were annoyed because of the HIP codes themselves (unclear words). 38% of the respondents were angered by the fact that the challenge was time consuming, and 70% claimed that they had even experienced denial of service. It also seems that words that failed OCR are intentionally made even harder to read by putting lines across them, so that a "P" might as well be an "R". This was observed in that most respondents wrote "24YGR2VY" instead of "24YGP2VY". Other pairs of letters that are likely to be confused when lines are used are "o" and "e", "o" and "q", "v" and "y", "x" and "y", and "K" and "L". On subsequent encounters with the HIP test, 31% of the respondents felt irritated and 41% felt threatened, while the others felt distracted.
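The letter confusions reported above can be checked mechanically. The sketch below is an illustration only, not part of the study's method: the helper `confused_positions` is a hypothetical name, and the set of pairs is limited to the ones listed in the paragraph above. It reports whether a wrong response differs from the challenge only at positions where a known confusable pair was swapped.

```python
# Confusable pairs reported in the discussion above, stored in both orders.
CONFUSABLE = {("P", "R"), ("o", "e"), ("o", "q"), ("v", "y"), ("x", "y"), ("K", "L")}
CONFUSABLE |= {(b, a) for a, b in CONFUSABLE}

def confused_positions(challenge, response):
    """Return the positions where the response swaps a known confusable pair.

    Returns None if the strings differ in length or differ anywhere in a way
    that is not a known confusable pair. An exact match returns [].
    """
    if len(challenge) != len(response):
        return None
    positions = []
    for i, (c, r) in enumerate(zip(challenge, response)):
        if c == r:
            continue
        if (c, r) not in CONFUSABLE:
            return None
        positions.append(i)
    return positions

# The example from the text: "P" was read as "R" at index 4.
print(confused_positions("24YGP2VY", "24YGR2VY"))  # [4]
```

A designer could use a check of this kind either to log which characters cause failures or, more simply, to leave such characters out of the challenge alphabet.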
It seems that some people like challenge-response systems such as the CAPTCHA code and others do not. Also, some users are acquainted with the functionality of these codes. Users' knowledge and liking of the purpose of the code affect their success or failure rate in a given test. Of the 42% of users who like the CAPTCHA code, 38% got the code right while 4% failed it. On the other hand, of the 58% of users who did not like the code, 40% passed the challenge while 18% failed (all percentages are of the full sample). Therefore, there is a need to increase users' awareness of the purpose of the code so that a substantial number of potential customers are not driven away from a site. From our observations, among the 58% of users who do not like the code, only 19% pass the challenge at the first attempt, 37% pass at the second attempt and 1% succeed at the third or a later attempt. Among the 42% who like the code, 12% succeed at the first attempt and 30% at the second attempt.

Table 2.1: Summary of HIP review

| HIP | Description | Author (year) [reference] |
|---|---|---|
| ... | ... | ... [17] |
| Human Interactive Proof (HIP) | A puzzle to verify that it is a human that is making a request to a service over the web | Hopper (2001) [12] |
| Secure Human Identification Protocol (HUMANOID) | A computer must verify a human's membership in a group without requiring a password, biometric data, electronic key, or any other physical evidence | Hopper & Blum (2001) [11] |
| Reading CAPTCHA | Distorted password for users to type | Blum et al (2000) [5] |
| ARTiFACIAL | Displays an image with a distorted face; the user is asked first to find the face and then to click on 6 points on the face | |
| Déjà vu | Requires users to recognize an object they have seen before | Perrig & Song (2002) [19] |
| Sound-based CAPTCHA | Presents a distorted sound clip and asks the user to enter its contents | Nancy Chan (2002) [7] |
| PhoneOIDs | Authenticates users over phones | Blum (2002) [6] |
| 3D CAPTCHA | Users are asked to type the alphanumeric character that overlies a particular feature | Zilina, Juray Rolko (2010) [22] |
| Map | Requires the subjects to navigate between two random points in a 3D world or network | Perrig & Song (2002) [19] |
| Asirra | Users are asked to identify cats out of a set of 12 images | Elson et al (2007) [10] |
| Word-Associative CAPTCHA | Users are given a set of words and must select the word that does not belong to the group | Chinmay Kulkani (2008) [23] |
| AQ-SOCHAMACAP | Mixes alphanumeric and special characters; the user is asked to look at the characters provided and then answer the challenge based on its equivalent question | Onwudebelu et al (2011) [16] |

As mentioned above, the 13% of the respondents who feel threatened believe that they are being set up to fail, while others were annoyed by the fact that they did not understand why the CAPTCHA was presented in the first place. Some students become alienated, discouraged and in some instances bored. 46% of the respondents declared that they have had to abandon a web site as a result of the complexity of the HIP codes. Concerning the HIP they were asked to answer, they complained that it was unfriendly: 55% said that it was confusing, 36% said that it was unhelpful and 8% described it as very ugly. When asked what they think the CAPTCHA code is for, 1% said that it was for fun and 29% said that it was to test visual capability. Finally, 70% of the respondents said that it was a validation code used as a security measure. But a security measure against what? They do not know.
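The like/dislike figures reported in this discussion are percentages of the whole sample (38 + 4 = 42 and 40 + 18 = 58), so the pass rate within each group has to be derived. The short calculation below does this using only the reported figures; the dictionary layout and the name `pass_rate` are illustrative choices.

```python
# Percentages of all 100 respondents, as reported in the discussion.
liked = {"passed": 38, "failed": 4}      # the 42% who like the code
disliked = {"passed": 40, "failed": 18}  # the 58% who do not like the code

def pass_rate(group):
    """Share of a group that passed the challenge."""
    return group["passed"] / (group["passed"] + group["failed"])

print(f"liked:    {pass_rate(liked):.1%}")     # 90.5%
print(f"disliked: {pass_rate(disliked):.1%}")  # 69.0%
```

Within-group pass rates of roughly 90% against 69% make the reported relationship between liking the code and succeeding at it easier to see than the raw sample percentages do.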

a) Users' knowledge affects both their performance and the number of attempts made in the HIP challenge.
b) Users need to be directed to sites or links that will enable them to learn more about HIP (more documentation on HIP as a spam protection mechanism). Thus, security needs to be a business enabler, not a source of pain.

Recommendations
a) CAPTCHA developers must strive to keep the inconvenience experienced by users to the barest minimum.
b) CAPTCHA developers should have users with disabilities in mind as they develop the code.
c) It is strongly recommended that smaller sites adopt spam filtering checks in place of CAPTCHA.
d) Users must be made to understand that the CAPTCHA test is the price they pay to protect their email accounts as well as others, such as financial accounts.
e) There is a need to increase users' awareness of the purpose of the code so that a substantial number of potential customers are not driven away from a site.

Conclusion
HIP authentication is one of the most recent technical mechanisms for combating malicious automated scripts. It is a security measure which uses computer programs that automatically generate and grade puzzles that most people can solve without difficulty, but that current programs cannot. The literature review has served to bring to light other forms of CAPTCHA that users are not familiar with. The point is not to supply the reader with deep knowledge of the various forms of HIP authentication (Implicit CAPTCHA, Quiz CAPTCHA, Spatial CAPTCHA, Speech CAPTCHA, BONGO HIP, Gimpy CAPTCHA and Sound-based CAPTCHA) but rather to show how these forms of HIP authentication are surprisingly alike in conception. They all function similarly and mainly make use of the same techniques. In this review, various forms of HIP authentication have been examined, and it has become clear that the inner workings of these systems are broadly similar and are geared toward combating malicious automated scripts. Evaluating HIPs has been one of the most interesting aspects of the research, and users' views and observations have been made very clear. HIPs have become an increasingly essential part of many web sites: Yahoo!, many free email service providers, web blogs, financial institutions and banks, along with many other organizations, are being forced to employ HIP techniques as a result of the enormous activity of bots. These techniques have helped in reducing cyber crime, stopping spam posts and preventing automated account creation. However, a more efficient means of protecting identities and transactions needs to be implemented, and the best method of providing such secure identification at this time is to employ biometric systems. Future work will combine biometric authentication with the CAPTCHA method for detecting malicious automated attacks, which should result in a more robust system.