Enhancement of Conference Organization Using Ontology Based Information Correlation

The world today has witnessed an important shift to the virtual world of the web. After years of interaction and exchange between people, the web has become saturated with an enormous quantity of data in various fields. (Semi-)automatic applications have therefore become a necessity for finding appropriate information in a short time. In this context, we propose a new approach to ontology-based information correlation across various web resources. To validate this approach, we introduce a conference organizer system intended to be useful when setting up a conference. It draws on semantic web technologies to extract, correlate, rank and store information, and consequently proposes ranked lists of experts and social events depending on user requests.


Introduction
The world today has become a small village since the emergence of the World Wide Web as a global space. Interaction and exchange between people have increased rapidly ever since, and the web has consequently become saturated with an enormous quantity of data in various fields. From the beginning, the scientific community has benefited from the web, and now uses it mainly to foster cooperation and the exchange of experience between researchers. As a result, vast quantities of scientific information are available about researchers, papers, projects and so on.
This is undoubtedly a sign of richness, but it has become difficult to find the appropriate information within this huge amount of data. Automatic or semi-automatic applications have therefore become a necessity, especially to save effort and time, and many such applications have emerged in different forms. Examples include search engines (e.g. Google, Bing, Yahoo…) and bibliographic services that organize scientific documents and papers (e.g. Google Scholar, DBLP, SpringerLink…).
More recently, these applications have no longer been sufficient given the massive influx of data. Efforts have therefore begun to focus on more effective approaches that enable the machine to move from storing data to understanding it. Applications across the Web have accordingly evolved by exploiting Semantic Web technologies such as RDF and ontologies. One of the most important efforts in this area is the Linked Data project, which exploits open data and semantic web technologies to connect related data (http://linkeddata.org/). As part of this project, we mention RKBExplorer [19], which has gathered information from a number of different types of resources in scientific fields and presents them in unified views.
Amid these developments, the scientific community has needed to keep abreast of all these updates. One of the community's main concerns during the last 10 years has been the expert finding task, which implicitly includes the profiling task. There was a need to find appropriate experts within organizations in order to encourage cooperation and advice between staff, and this need gave rise to many works [2,3]. With technological and interactive advances on the Web, drawing on external expertise became inevitable, and several larger-scale services have therefore emerged [12,14].
Despite this, the problem of acquiring researchers' information and then computing their expertise has not been fully solved, given the large amount of repeated, conflicting and outdated information. Research in this field is still ongoing. In this context, and in order to obtain more appropriate, accurate and recent results, it is worthwhile to correlate the information scattered across the Web, especially since the same subjects are described repeatedly by different sources. This is typically useful when setting up a conference, when we need to find information about relevant experts, such as their profiles or lists of experts ranked by expertise, as well as to find and propose social events and logistics for the conference. We aim to demonstrate our approach of extracting and correlating information from multiple web resources within a system whose objective is to find appropriate experts as reviewers or program organizers in a specific domain, and to propose social events for a conference at a specific location and date.
In this paper, we introduce the framework of our system, which provides this service by exploiting semantic Web technologies and correlated information. The rest of the paper is structured as follows. In Section 2 we review the related work. Section 3 evaluates the previous work and describes our new approach. Section 4 gives an overview of the proposed framework and details the techniques used. Sections 5 and 6 present the system ontologies and interpret the experimental results, respectively. In Section 7 we evaluate the experiments and propose future work. Finally, we conclude the paper in Section 8.

Related Work
In this section, we present previous work that addresses the issue of expert finding, which implicitly includes the tasks of extracting researcher information and profiling. Since the main objective of our work is to find appropriate researchers as conference reviewers by extracting and correlating their information scattered across the Web, it was necessary to review several studies carried out in this area in recent years.
Expert finding has been one of the major problems for researchers. In the past, most research was carried out by organizations (on a limited scale) as a solution for users who wished to get advice or find a specific expert to perform a certain task. Generally, it is easy to find information about employees within organizations by drawing on the organizations' own data. The problem lies in computing expertise, which is inferred using keywords extracted from web pages, shared documents, email or instant message transcripts. Some examples of these systems are given in [1,2,3,4]. Additionally, the social network of each employee, determined from the co-occurrence of names in publications or emails, was used to enhance the expertise computation.
Efforts to automate and enhance the profiling and expert finding tasks within organizations continue, especially with the availability of semantic web technologies. Systems have moved towards improving knowledge representation in their databases, using ontologies to create relations between properties and RDF to represent information according to these ontologies. These improvements have demonstrated a positive impact on the profiling task, as in the Semantic Scout project and the Semantic Web based approach to expertise finding at KPMG [5,6]. Furthermore, some problems cannot easily be resolved by the existing teams within an organization, and with technological and interactive advances on the Web, external expertise has become inevitable. Several works have therefore emerged on a larger scale outside organizations, exploiting the large quantity of data on the Web [7,12,14].
In this context, many types of sources available on the Web have been used for this task. The most commonly used source is publications, which several projects exploit in different ways. For instance, the VIKEF project [8] uses several collections of papers to construct the profiles of researchers participating in ISWC 2004. AEFS [9] uses the citations of publications as experts' profiles in order to rank the experts, and EFS [10] uses experts' publications as the material from which to build their expertise, and in addition uses the link structure of Wikipedia to improve it.
Other systems have gone further and draw on more than one source in addition to publications. For example, Flink [11] uses web pages, emails and FOAF profiles to extract the social networks of Semantic Web researchers. ArnetMiner [12,13] uses home pages to create semantic-based profiles for the researcher community around the world, and then uses these profiles together with publications from the DBLP library to compute expertise. Further expert finding systems are presented in the literature, such as INDURE (https://www.indure.org/), which searches for experts across many disciplines in four universities in the State of Indiana in the United States. It provides an expert search function and also allows users to browse expertise information with respect to the ontology of the National Research Council [14]. Microsoft Academic Search (http://academic.research.microsoft.com) covers many disciplines and also provides rankings of experts and organizations. Like ArnetMiner, it provides graph visualization of relationship information such as co-authors and citations. As more and more large-scale expert search systems emerge, efficiency is becoming an increasingly important research topic [14].

The Proposed Approach
In the work reviewed above, semantic web technologies have proved their effectiveness in representing researchers' information and consequently in enhancing the profiling and expert finding tasks. Systems that draw on more than one source also obtain more comprehensive and accurate results. These factors help, but they cannot fully solve the problem given the huge quantity of conflicting, outdated and repeated data.
Research in this field is therefore still ongoing, which motivates us to propose a new approach intended to improve the tasks mentioned above. In order to obtain more appropriate, accurate and recent results, it is worthwhile to correlate the information scattered across the Web, especially since the same subjects are described repeatedly by different sources. This is typically useful when setting up a conference. We therefore aim to demonstrate our approach of extracting and correlating information from multiple Web resources within a system whose objective is to find appropriate experts as reviewers or program organizers in a specific domain, and to propose social events and logistics for a conference at a specific location and date. As an initial phase, the correlation process can significantly improve the profiling task by providing confirmed information about researchers or social events.
The second stage aims to rank researchers and social events according to the user request, based on the information resulting from the correlation stage. The proposed social events are easy to rank according to the specified place and date. In contrast, computing expertise in order to rank researchers in a specific domain is more complex. In most systems, this process is based on the co-occurrence of keywords in the sources used. It is further improved by propagating expertise through experts' social networks [15,16,17], and so far only the co-author relationship has been used for this purpose. Using new information can improve the results, and that is what we intend to do in our work. Therefore, in addition to the correlated information, we aim to extract high-impact information as a basis for the expertise computation, including new kinds of relationships between researchers and other information indicating their expertise.
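To make the general idea of keyword-based scoring with co-author propagation concrete, the following minimal Python sketch may help. It is only an illustration under simplifying assumptions and is not the method of [15,16,17] or of our system: the function names, the single propagation step and the weight alpha = 0.5 are all hypothetical choices.

```python
def base_scores(publications, topic_keywords):
    """Count topic-keyword occurrences in the titles of each author's papers.

    publications is a list of (authors, title) pairs (illustrative format).
    """
    keywords = {k.lower() for k in topic_keywords}
    scores = {}
    for authors, title in publications:
        hits = sum(1 for word in title.lower().split() if word in keywords)
        for author in authors:
            scores[author] = scores.get(author, 0) + hits
    return scores


def coauthor_graph(publications):
    """Map each author to the set of people they have published with."""
    graph = {}
    for authors, _ in publications:
        for author in authors:
            graph.setdefault(author, set()).update(
                other for other in authors if other != author
            )
    return graph


def propagate(scores, graph, alpha=0.5):
    """One propagation step: add alpha times the mean co-author score.

    alpha is an assumed weight, not a value taken from the literature.
    """
    result = {}
    for author, score in scores.items():
        neighbours = graph.get(author, set())
        if neighbours:
            boost = sum(scores[n] for n in neighbours) / len(neighbours)
        else:
            boost = 0.0
        result[author] = score + alpha * boost
    return result


def rank_experts(publications, topic_keywords, alpha=0.5):
    """Return (author, score) pairs, highest score first, ties by name."""
    scores = base_scores(publications, topic_keywords)
    final = propagate(scores, coauthor_graph(publications), alpha)
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch an author with no co-authors keeps the raw keyword score, while a well-connected author is pulled towards the average of his or her collaborators. Other relationship types could be added as further graphs with their own weights.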
Recently, a new method for computing expertise using a skills ontology has emerged [18], in which the ontology represents the relations and hierarchy between research topics. The ontologies used in other works focus on representing researchers' information without considering their relation to research topics classified in a specific taxonomy. We aim to take this point into consideration in our ontology.

Scenario
In order to pursue our objectives in the two phases, we implement the proposed approach in a system. The service it provides would help conference organizers find appropriate experts in a specific domain and social events at a specific date and location.
The proposed scenario begins with the user entering a request through the graphical user interface, including the scientific domain, related researchers, location, date, number of participants and halls. The system receives the request and then begins extracting information from heterogeneous web sources. On the one hand, it constructs the researchers' profiles, which contain all relevant information to be used in the expertise computation. On the other hand, it constructs the cities' profiles, which contain all information concerning tourist sites, events, hotels and restaurants. All relevant profiles are used in the second phase to produce two ranked lists, one of experts and one of social events. Finally, the lists are sent to the user, who can then select the relevant entries.

Framework
The framework architecture through which the scenario is applied is shown in Figure 1. It presents all the principal components, from the generic user interface to the Web resources and everything between them. The system is composed of two phases:
- Static phase
- Dynamic phase
The static phase is shown on the left side of the figure, where its operations are marked by blue arrows numbered from 1 to 7. In this phase, our system performs the extraction, correlation, ranking and storage processes for all cities and researchers, independently of the user request. The seven steps performed in this phase are:
1. Candidate detection
2. Structured information extraction
3. Information correlation
4. Data storage
5. Semantic extraction from unstructured text
6. Profile storage
7. Expertise computing
The dynamic phase appears on the right side, where its operations are marked by green arrows numbered from 1 to 3. In this phase, the interaction between user and system starts when the user enters a request as described in the scenario. The system then searches for suitable results in the following three steps:
1. Receiving and analyzing the user query
2. Querying the system repository
3. Transmitting the appropriate information to the user
If the information is absent, the system starts another process to acquire it. This two-phase composition aims to save time by preparing information before any query arrives.

Static Phase
The system starts by extracting information in the static phase. The extraction process is driven by two distinct ontologies, the researchers' ontology and the social events' ontology, which define all the properties needed in our approach. The setup of the extraction process is completed by specifying the datasets containing all the researchers and cities whose information we aim to extract. For social events, we specify the principal cities of each country; for researchers, the dataset is created automatically by the candidate detector block, as shown by the first arrow in Figure 1. This block categorizes publications according to their topics, as defined by the researchers' ontology, and then extracts the authors of the publications as candidates for each category (topic).
Secondly, structured data extraction is performed for each candidate. LinkedIn, Foaf-search and Microsoft Academic Search are proposed for this task [22,23,24]. Using their APIs, the system acquires relevant information for each expert according to the ontology properties. The system performs the same operation to extract social events' information through the Facebook, Foursquare and Google Places APIs [25,26,27]. In the third step, the system stores the extracted information separately in two repositories. For this task, we use the Sesame triplestore, in which we store information in the form of RDF and query it using SPARQL.
We now have information about the same subjects from several sources. Moving to the next operation, the system correlates this information according to several rules. Since the majority of the required properties are textual, we use a string comparison algorithm to correlate textual properties. Furthermore, photos of researchers are very important, because users may need to gauge researchers' likely seniority or familiarity before contacting them [14]. An image comparison algorithm is therefore used to compare researchers' photos in order to confirm their validity and thus the validity of their sources. This correlation aims to establish the degree of confidence in any piece of information or source, and makes it possible to remove ambiguity about the desired information in the presence of conflicting and outdated data about the same subject.
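As an illustration of how textual properties reported by several sources might be correlated, the following minimal Python sketch compares values with the standard library's difflib.SequenceMatcher. This is an assumed stand-in, since no particular string comparison algorithm is fixed here. The function names, the 0.85 threshold and the definition of confidence as the fraction of agreeing sources are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] after lowercasing and collapsing spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def correlate(records, threshold=0.85):
    """Pick, per property, the value supported by the most sources.

    records maps a source name to the {property: value} pairs it reported.
    The threshold is an assumed cut-off for treating two strings as equal.
    """
    properties = sorted({p for values in records.values() for p in values})
    result = {}
    for prop in properties:
        reported = [
            (source, values[prop])
            for source, values in records.items()
            if prop in values
        ]
        best_value, best_support = None, []
        for _, candidate in reported:
            support = [
                source
                for source, value in reported
                if similarity(candidate, value) >= threshold
            ]
            if len(support) > len(best_support):
                best_value, best_support = candidate, support
        result[prop] = {
            "value": best_value,
            "sources": best_support,
            # Fraction of the reporting sources that agree with the value.
            "confidence": len(best_support) / len(reported),
        }
    return result
```

A value confirmed by two of three sources would thus receive a confidence of about 0.67, and the dissenting source could be flagged as possibly outdated. A real implementation would likely need property-specific rules, for example for names versus affiliations.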
The correlation step thus produces all the required information about social events and logistics, which is then stored in the system's repository. For researchers, however, only basic information is collected, except for the older researchers. In the fifth step, the system therefore completes the required information through semantic extraction from the researchers' homepages, which are inferred from the correlation process.
If information about a researcher is lacking, the system moves to an alternative step (marked by arrow 5') and searches for the relevant homepage using a web page classifier. We employ Support Vector Machines as the classification model, with the correlated information defined as features. Supplying features beyond the researcher's name aims to increase the classifier's accuracy. Once the homepage is detected, semantic extraction is performed to complete the required information. Storing the extracted information then completes the profiling task. In the last step, our system uses the stored profiles to compute the expertise of each researcher. Combining ontology-based profiling with information correlation can improve expert finding. On the one hand, we take into consideration the dependencies and relations between the ontology's properties that describe the researchers' products, activities and propagation. On the other hand, we benefit from the confirmed information resulting from the correlation process.

Dynamic Phase
The static phase produces lists of places and events with their related information, as well as lists of ranked experts for all the topics represented in the researchers' ontology. All lists and profiles are stored in the system repository and are ready to use. The dynamic process starts when the system is put into service. The system receives users' requests through the generic user interface. Users enter the desired location and date for the social events, plus keywords describing the desired topic or the name of a person active in a certain domain. The query analyzer then matches the keywords against all the topics represented in the ontology, or searches the repository for the person in order to find the topics in which that person works.
The matched topics are shown to the user, who chooses the suitable ones. Once the topics, location and date are determined, the query analyzer accesses the stored data through a SPARQL query and transmits the information to the user in the form of ranked results. If no topic is similar to the keywords, the keywords are sent to the static phase in order to repeat its operations and obtain new results.
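The keyword-to-topic matching performed by the query analyzer can be sketched as follows. This is a deliberately simple illustration based on word overlap. The function name and the topic labels in the usage note are hypothetical, and the actual analyzer may use a different similarity measure.

```python
def match_topics(keywords, topics):
    """Rank topic labels by how many query keywords they contain.

    Returns an empty list when nothing matches, which in the scenario
    above would trigger a new run of the static phase.
    """
    wanted = {keyword.lower() for keyword in keywords}
    scored = []
    for topic in topics:
        tokens = set(topic.lower().split())
        overlap = len(wanted & tokens)
        if overlap:
            scored.append((overlap, topic))
    # Highest overlap first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [topic for _, topic in scored]
```

For example, with the assumed topic labels "Semantic web", "Web mining" and "Machine learning", the keywords "semantic" and "web" would return "Semantic web" followed by "Web mining", and the user would then pick among them.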

System Ontologies
Having described the framework architecture in detail, we now turn to the practical stage of implementing the steps described. Our system begins with ontology-based information extraction, so its setup started with the construction of the ontologies. The system ontologies comprise two ontologies, the social events' ontology and the researchers' ontology, in which we represent the concepts and terms describing the two domains. These concepts are the key to the extraction process, and the extracted information in turn populates the system ontologies. The ontologies also represent relations and dependencies between concepts, which play a fundamental role in analyzing the information for use in subsequent processes.

Social Events' Ontology
The social events' ontology contains all the concepts needed for extracting places and sorting them by type, as well as the properties used to describe these concepts. We used Protégé, an open source ontology editor, to construct the system ontologies. As shown in Figure 2, all the ontology's entities, from classes to data properties, are specified according to our objective and the information available in the sources. It is a simple ontology in which the classes represent places and events as the principal information we aim to find, together with related concepts such as location, countries and cities. The last class is Distance Duration, which is used to represent the distance and duration between two places. These classes are associated with each other by object properties and are described by data properties that will be extracted from different sources. The important point in the class hierarchy lies in the subclasses of the place and event classes, where we indicate the information source.

Researchers' Ontology
The same procedure was followed to construct the researchers' ontology. As the expert finding task is more complicated, the ontology is naturally more complex. Many vocabularies have been created to facilitate semantic work. These vocabularies are widely accepted and describe ontologies' concepts and properties in different domains. We draw on two of them to construct our researchers' ontology.
The first is SWRC (Semantic Web for Research Communities) and the second is the ACM Computing Classification System (CCS). SWRC is an ontology for modeling entities of research communities such as persons, organizations, publications (bibliographic metadata) and their relationships [20]. The CCS is a taxonomy of computing research topics [21]. It is possible to establish relations between vocabularies and then integrate them into a single ontology. We therefore imported the two vocabularies, selected all the classes and properties that fit our objectives, and finally added the missing ones to complete our ontology. As for social events, Figure 3 shows all the entities of the researchers' ontology. On the one hand, it presents the essential entities for describing researchers' education, affiliation and activities (e.g. publications, projects and events). On the other hand, it associates each researcher (Person in the ontology) with the topics in which he or she is active. Finally, since the ontology will be populated from several sources, all entities are associated with their specific identities, represented by the information sources.

Experimental Results
Referring to the framework described in Figure 1 and using the ontologies described in the previous section, we implemented the initial steps of our conference organizer system. We first built the ontology-based information extraction process using the sources' APIs. We then used Sesame as the repository to store and query information in the form of RDF files, and provided a Web interface to receive users' requests and show the relevant results. We run this web service on Apache Tomcat, an open source web server.
As mentioned in the scenario, our system receives the user request through the generic user interface, and then extracts, stores and queries researchers' and social events' information separately. In the following subsections, we therefore present and evaluate some results in the two areas.

Social Events
As an initial step, we perform ontology-based information extraction from Facebook and Foursquare for several cities. The results are stored and made available in the social events' repository. The user can then query the system through the generic user interface by entering the country, the city and the location from which distances are calculated. The system in turn provides the appropriate information from its repository. If information is lacking, the system starts a new extraction process to obtain it. The results are presented in two categories, places and events, and all places are sorted by type: tourist sites, hotels and restaurants.
In order to evaluate how effectively information is extracted from the different sources, we also sort the results by source. Figure 4 shows a sample of the results obtained after querying the system with Barcelona, Spain and Gràcia (station) as city, country and location respectively. In addition to the place name, photo, page and source, we have extracted high-impact information, such as likes or were-here counts, for later use in the correlation and analysis processes.
Furthermore, we use the Google Maps API and the location of each place to extract its geolocation, and we use the reference location (the station) to calculate the distance and duration needed to reach each place on foot or by car. Finally, all places are shown on a Google Map (Figure 5).
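Our system obtains distances and durations from the Google Maps API. As a purely illustrative stand-in that is not part of the implemented system, the following Python sketch estimates them from geolocations alone, using the great-circle (haversine) distance and assumed average speeds. The speed constants and function names are hypothetical, and straight-line distance underestimates real walking or driving routes.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
WALK_KMH = 5.0            # assumed average walking speed
DRIVE_KMH = 30.0          # assumed average urban driving speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def travel_minutes(distance_km, speed_kmh):
    """Travel time in minutes at a constant speed."""
    return distance_km / speed_kmh * 60.0


def rank_by_distance(origin, places):
    """Sort (name, (lat, lon)) places by distance from origin (lat, lon).

    Returns a list of (name, distance_km) pairs, nearest first.
    """
    ranked = [
        (name, haversine_km(origin[0], origin[1], lat, lon))
        for name, (lat, lon) in places
    ]
    return sorted(ranked, key=lambda pair: pair[1])
```

Given the station's coordinates as the origin, rank_by_distance would order the extracted places from nearest to farthest, and travel_minutes with WALK_KMH or DRIVE_KMH would give a rough duration for each.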

Researchers
The same operations were performed to extract, store and show the researchers' information, but with different sources, properties and repository. In this case, LinkedIn and Foaf-search are used as information sources. We chose LinkedIn because of its popularity as a social network for people in professional occupations; its number of users has exceeded 225 million in more than 200 countries. Foaf-search, in contrast, is a Friend of a Friend search engine through which we can search 6 million interconnected persons, organizations and places in the semantic web. It collects information from multiple sources. In practice, the source most suited to our objective was Freebase, where we can find information about 3 million persons in different fields. Figure 6 shows a sample of the results obtained by our system about the same person from both sources. As we can see, several properties that constitute the basic information about each person are often available (e.g. name, photo, location, country, position…).

Evaluation and Future Work
In this section we evaluate the results obtained with our system. As shown in Figures 4 and 6, the principal properties are collected in both domains, which constitutes an important basis for the next stage of information analysis and correlation. The social events' results provide comprehensive information about places but not enough about events, especially since Foursquare does not provide information about events. Users may organize trips to visit places, so in future work it would be worthwhile to search for supplementary information (e.g. prices, transport and weather). Moreover, we aim to extract information from Google Places as an additional source to enrich the results, especially about events.
On the other hand, the basic researchers' information is often extracted, especially for the older researchers. This is consistent with our objective of finding appropriate experts as conference reviewers or organizers. As we can see, LinkedIn provides information about a large number of researchers.
In contrast, Freebase does not provide a sufficient number of researchers in the scientific domain. We therefore aim to extend the extraction process to Microsoft Academic Search as an additional source, in order to obtain more comprehensive information.

Conclusion
In this paper we have presented a conference organizer system. We introduced the proposed approach of ontology-based information correlation on which our system is based, and described in detail the various parts of the system framework. We then began the experimental setup by creating the ontologies. Finally, we carried out the initial experiments with our framework in order to evaluate the results obtained and propose future steps.