Semi-Supervised Learning approach for Case-Based Reasoning systems

Fundamentals of CBR

Strongly influenced by cognitive science, the initial concept of CBR evolved from the results of several studies of the human brain [23]. CBR appears under different definitions in the literature: it is seen as reasoning by remembering [25], as reasoning for reminding, and as an approach to problem solving and learning, while Bergmann et al. define it as a subfield of artificial intelligence. CBR was originally inspired by the work of Minsky and Schank in the late 1970s [1]. Schank [29] was the first to formulate the paradigm of Case-Based Reasoning, but it was not until the end of the 1980s that research in the field really began to take shape, notably with the DARPA conferences in the USA in 1988 [30]. The field then made its mark in Europe with the first European conference at Kaiserslautern in 1993 [31], followed by the first international conference in Lisbon in 1995.

Communities in CBR

Case-Based Reasoning is a methodology with roots in artificial intelligence (AI), cognitive science and knowledge engineering (Figure 1.1). CBR means solving problems based on previous experiences: past situations, or cases, are remembered to guide the solution of a new problem. It is the study of intelligent decisions based on past experiences, a field in which memory models are studied and categorized.

Figure 1.1 – Communities of CBR (knowledge engineering at the intersection of AI and cognitive science) [1].

1. Artificial intelligence (AI) is a research field of science and engineering for developing intelligent systems. AI searches for ways to endow computer programs with intellectual capacities comparable to those of human beings [33].
2. Cognitive science can be defined as the scientific study of minds and brains and their thought processes, be they real, animal or artificial, and more generally of any cognitive system [34].
3. Knowledge engineering is the field that studies the concepts, methods and techniques that allow knowledge-based systems to be developed in any application domain, in order to help humans carry out tasks with little or no prior formalization [35].

In brief, given a case to solve, case-based reasoning includes the following steps [36]:
• Retrieve relevant cases from the CB (an appropriate indexing of the CB is required);
• Select a set of best cases;
• Derive a solution;
• Evaluate the solution (to ensure that poor solutions are not repeated);
• Store the newly solved case in the CB.

CBR is used to complete a range of reasoning tasks, such as classification, planning and design. However, the key to developing a successful CBR system is to limit its scope to a single reasoning task. Known as a lazy learning method, a CBR system can be built without first learning data specifics or patterns, simply by taking the data from a database.

Yet CBR has proved very useful in many real-world application domains. Some of the reasons are [4]:
• CBR lies at the intersection of numerous disciplines, as mentioned earlier, which opens the door to its adoption in diverse applications;
• CBR methodology mimics the human reasoning process. When implementing a CBR system, we therefore use a human paradigm in a computational framework, while benefiting from the large memory and speed supplied by a computer;
• CBR does not require a complex formalization of the problem and is able to deal with informal questions.

The case base

The case base is one of the four sources of knowledge required in CBR (Figure 1.2) [4]. The four knowledge containers are:
• The vocabulary: a container dedicated to the description of problems and solutions in the domain;
• The similarity measure: knowledge about cases and how they are compared to each other;
• The case base (CB): the core of the CBR system, as it contains the set of previously solved problems;
• Adaptation knowledge: it defines how a retrieved solution is adapted to fit a new problem.

This combined knowledge is employed to complete the CBR process. In the next section we deal with the case base and the cases stored in it.
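
As a rough illustration of how the four knowledge containers cooperate in the retrieve, select, derive, evaluate and store steps listed earlier, the following Python sketch may help. It is not the method of any cited work: the class `CBRSystem`, the distance-based similarity, the majority-vote adaptation and the toy `fever`/`cough` cases are all assumptions introduced here for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Problem = Dict[str, float]
Case = Tuple[Problem, str]  # (problem description, solution)


def make_similarity(vocabulary: List[str]) -> Callable[[Problem, Problem], float]:
    """Similarity container (assumed form): 1 / (1 + Euclidean distance)."""
    def similarity(a: Problem, b: Problem) -> float:
        dist = math.sqrt(sum((a[f] - b[f]) ** 2 for f in vocabulary))
        return 1.0 / (1.0 + dist)
    return similarity


def majority_vote(query: Problem, neighbours: List[Case]) -> str:
    """Adaptation container (assumed form): reuse the most frequent solution."""
    return Counter(solution for _, solution in neighbours).most_common(1)[0][0]


@dataclass
class CBRSystem:
    vocabulary: List[str]                           # attributes describing problems
    similarity: Callable[[Problem, Problem], float]
    adapt: Callable[[Problem, List[Case]], str]
    case_base: List[Case] = field(default_factory=list)

    def solve(self, query: Problem, k: int = 3,
              evaluate: Optional[Callable[[Problem, str], bool]] = None) -> str:
        # Retrieve: rank stored cases by similarity to the query.
        ranked = sorted(self.case_base,
                        key=lambda case: self.similarity(query, case[0]),
                        reverse=True)
        # Select: keep the k best cases.
        best = ranked[:k]
        # Derive: build a solution from the selected cases.
        solution = self.adapt(query, best)
        # Evaluate, then store only solutions that pass the check.
        if evaluate is None or evaluate(query, solution):
            self.case_base.append((dict(query), solution))
        return solution


# Hypothetical toy data, for illustration only.
vocabulary = ["fever", "cough"]
system = CBRSystem(
    vocabulary=vocabulary,
    similarity=make_similarity(vocabulary),
    adapt=majority_vote,
    case_base=[
        ({"fever": 1.0, "cough": 1.0}, "infected"),
        ({"fever": 1.0, "cough": 0.0}, "infected"),
        ({"fever": 0.0, "cough": 0.0}, "healthy"),
        ({"fever": 0.0, "cough": 1.0}, "healthy"),
    ],
)
answer = system.solve({"fever": 1.0, "cough": 1.0}, k=3)
print(answer, len(system.case_base))
```

Passing an `evaluate` callback that rejects a solution leaves the case base unchanged, which is one simple way to avoid storing poor solutions.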

Case structuring

The idea of a "case", borrowed from cognitive science, is to capture information so that it can be reused for problem solving [37]. A case is an instance of a problem-solving process. In CBR it is generally composed of two disjoint parts that need to be distinguished: the problem description and the solution. The former embodies the goals, constraints, initial data and task description. The latter comprises the solution itself, the steps taken to reach it (the trace), the justification and annotation of the solution, along with alternative solutions and expectations (what is expected to happen once the solution is obtained) [3].

Bergmann, Kolodner and Plaza [37] described the following elements of case structure:
• A situation and its goal;
• The solution and, sometimes, the means to obtain it;
• The result of its execution;
• An explanation of the results;
• Lessons that can be learned from the experience.

Furthermore, the case representation can capture the outcome of a solution, that is, whether or not it achieved the desired result. Accordingly, a case can be represented by the tuple (p, s, o), where p is the problem, s the solution and o the outcome. This is not an exhaustive description of the possible case components; other components can be considered.

Two types of cases can be distinguished: the source case and the target case [1]. The source case is one in which both the "problem" and "solution" parts are available, so it can be used to solve new problems. The target case carries a problem whose solution is not yet available.

Depending on the nature of the problem dealt with, there are several case representations. Traditional approaches classify them into three categories:
1. Textual representation;
2. Semi-structured representation (component vector);
3. Structured representation.

However, structured case representation is the one used in the majority of works.
Thus, the case is often represented as a set of descriptors. A descriptor d is defined by a pair d = (a, v), where a is an attribute and v is its associated value. A source case is represented by the pair (srce, Sol(srce)) and the target case by the pair (target, Sol(target)), where Sol(target) is unknown and is the result we would like to provide. Since cases are represented by sets of descriptors [1]:
• ds_i (for i = 1, ..., n) are the descriptors of the problem part of the source case, srce;
• dc_i (for i = 1, ..., n) are the descriptors of the problem part of the target case, target;
• Ds_i (for i = 1, ..., n) are the descriptors of the solution part of the source case, Sol(srce);
• Dc_i (for i = 1, ..., n) are the descriptors of the solution part of the target case, Sol(target).
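
The descriptor-based structure above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the names `Descriptor`, `Case` and `is_source`, as well as the example attribute values, are hypothetical and do not come from the cited works. The optional `outcome` field corresponds to o in the tuple (p, s, o).

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Descriptor:
    """A descriptor d = (a, v): an attribute and its associated value."""
    attribute: str
    value: Any


@dataclass
class Case:
    """A case as a (problem, solution, outcome) tuple of descriptor sets."""
    problem: Tuple[Descriptor, ...]                  # ds_i or dc_i
    solution: Optional[Tuple[Descriptor, ...]] = None  # Ds_i or Dc_i (None if unknown)
    outcome: Optional[bool] = None                   # did the solution succeed?

    @property
    def is_source(self) -> bool:
        # A source case has a known solution; a target case does not.
        return self.solution is not None


# Hypothetical source case: problem and solution are both available.
srce = Case(
    problem=(Descriptor("fever", True), Descriptor("age", 61)),
    solution=(Descriptor("infected", True),),
    outcome=True,
)

# Hypothetical target case: Sol(target) is still unknown.
target = Case(problem=(Descriptor("fever", False), Descriptor("age", 35)))

print(srce.is_source, target.is_source)
```

Keeping the solution optional lets the same type hold both source and target cases, so a target case can later be completed and stored as a new source case.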

Problem description

CBR is problem-centered, since problem solving is the principal purpose of the methodology. The formulation of a problem depends on the context in which it is stated; hence, each problem formulation requires a different type of solution. Take the question: What is the price of this phone?
1. One answer could be: Too expensive for us;
2. Another answer could be: $1400.

To find the suitable answer, it is important to know the context in which the problem is stated. For an accurate statement, the context needs to be included in the problem formulation [4].

In the framework of CBR methodology, we refer to two types of problems. The first are the problems in the CB, recorded as experiences; these cases are candidates for reuse. The second is the new problem that triggers the CBR process and motivates the user to look for a problem-solving method. It is referred to as the query problem, or simply the problem. The commonly used terms are therefore query instead of problem, and answer instead of solution.

As discussed in the previous section, the attribute-value pair representation is the simplest and most commonly used. A problem is described by a sequence of features (f1, …, fn). Let us take coronavirus (Covid-19) prognosis as an example, and consider a CBR system in charge of identifying whether or not a person has Covid-19. For that, some information about the patient should be gathered as cases. A possible attribute representation for such a scenario is presented in the following example.

Attribute               Value
Age                     61
Sex                     Female
Height                  1.65 m
Weight                  60 kg
Fever                   98.115 °F
Body pain               Yes
Runny nose              Yes
Difficulty breathing    No
Infected                Yes

Another representation that could be used is the object representation: since it is very difficult to handle hundreds of features, grouping them by category can simplify the task. The object representation of the previous Covid-19 example is provided in Table 1.2.
This representation is not often used because, from a practical point of view, it can be reduced to a representation by attribute-value pairs. The third main type of case representation is relationship objects, commonly visualized as a tree or graph. For this type of representation, there is no homogeneous way to represent all cases, and attributes cannot be located by their position. To identify an attribute, its name must be used together with the path leading to it from the root of the graph. For this knowledge representation scheme we take breast cancer prognosis as an example, as shown in Figure 1.3.
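
The three representations can be compared with a short Python sketch. The attribute values are those of the Covid-19 example, but the grouping into `Patient`, `Symptoms` and `Diagnosis` is an assumption made here for illustration (the content of Table 1.2 is not reproduced), and the helper names `flatten` and `get_by_path` are hypothetical. The sketch shows how an object representation reduces to attribute-value pairs, and how an attribute in a tree is addressed by its name together with its path from the root.

```python
from typing import Any, Dict, Tuple

# Flat attribute-value representation of the Covid-19 example.
flat_case: Dict[str, Any] = {
    "Age": 61, "Sex": "Female", "Height": "1.65 m", "Weight": "60 kg",
    "Fever": "98.115 °F", "Body pain": "Yes", "Runny nose": "Yes",
    "Difficulty breathing": "No", "Infected": "Yes",
}

# Object representation: the same attributes grouped by category
# (the grouping itself is an assumption for illustration).
object_case: Dict[str, Dict[str, Any]] = {
    "Patient": {"Age": 61, "Sex": "Female",
                "Height": "1.65 m", "Weight": "60 kg"},
    "Symptoms": {"Fever": "98.115 °F", "Body pain": "Yes",
                 "Runny nose": "Yes", "Difficulty breathing": "No"},
    "Diagnosis": {"Infected": "Yes"},
}


def flatten(node: Dict[str, Any], path: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], Any]:
    """Reduce a nested (object or tree) case to path-indexed attribute-value pairs."""
    pairs: Dict[Tuple[str, ...], Any] = {}
    for name, value in node.items():
        if isinstance(value, dict):
            pairs.update(flatten(value, path + (name,)))
        else:
            pairs[path + (name,)] = value
    return pairs


def get_by_path(node: Dict[str, Any], path: Tuple[str, ...]) -> Any:
    """Locate an attribute by following its path from the root."""
    for name in path:
        node = node[name]
    return node


pairs = flatten(object_case)
# Dropping the path prefix recovers the flat attribute-value case.
recovered = {path[-1]: value for path, value in pairs.items()}
print(recovered == flat_case)
print(get_by_path(object_case, ("Symptoms", "Body pain")))
```

Dropping the path prefix only works here because attribute names are unique across groups; in a general tree or graph the full path has to be kept.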

Table of contents

0.1 Background
0.2 Problem statement
0.2.1 Thesis Statement
0.3 Purpose of the study and Contributions
0.4 Outline of the Thesis
1 Case-Based Reasoning (CBR)
1.1 Introduction
1.2 Fundamentals of CBR
1.2.1 Communities in CBR
1.3 The case base
1.3.1 Case structuring
1.3.2 Case indexing
1.3.3 Case base organization
1.4 CBR life cycle
1.4.1 Application phase
1.4.2 Maintenance phase
1.5 Application domains of CBR
1.5.1 When to use CBR technology?
1.5.2 Typologies of applications
1.5.3 CBR in Medicine
1.6 Conclusion
2 Maintenance of the CBR system
2.1 Introduction
2.2 Learning
2.2.1 CB container
2.3 Development and maintenance of CBR system
2.3.1 Development
2.3.2 Maintenance process
2.4 Case-Base Maintenance
2.4.1 Quality criteria for CB evaluation
2.4.2 CBM policies
2.5 Related works
2.6 Conclusion
3 Machine learning techniques
3.1 Introduction
3.2 Supervised Learning
3.3 Unsupervised Learning
3.4 Semi-Supervised Learning
3.4.1 Self-training
3.4.2 Co-training
3.4.3 Transductive SVM (TSVM)
3.4.4 Graph-Based
3.5 Active Learning
3.5.1 Definition
3.5.2 Active Learning Scenarios
3.5.3 Sampling criteria
3.6 Semi-Supervised Learning in medicine
3.7 Ensemble Learning
3.7.1 Diversity
3.7.2 Ensemble learning Algorithm
3.8 Conclusion
4 Maintenance at the development stage: Active Semi-Supervised Maintenance (ASSM) approach
4.1 Introduction
4.2 Proposed approach for Active Semi-Supervised Maintenance (ASSM) at the development stage of CBR
4.2.1 Sampling phase
4.2.2 Learning phase
4.2.3 Stopping criterion
4.3 Results and Discussion
4.3.1 Data sets
4.3.2 Experimental parameters (CB quality criteria)
4.3.3 Results analysis
4.4 Conclusion
5 Case Base Maintenance: Clustering Informative, Representative and Diverse cases (C_IRD)
5.1 Introduction
5.2 Proposed approach: Clustering Informative, Representative and Diverse cases (C_IRD)
5.2.1 Soft Clustering to target valuable cases to retain
5.2.2 Which cases should be retained and why?
5.3 Results and Discussion
5.4 Conclusion