Variable-length representation for EC-based feature selection in high-dimensional data
Cilia N. D.;
2019-01-01
Abstract
Feature selection is a challenging problem, especially when hundreds or thousands of features are involved. Evolutionary computation-based techniques, and genetic algorithms in particular, have proven effective on such problems because of their ability to explore large and complex search spaces. Although the binary strings of genetic algorithms provide a natural way to represent feature subsets, several different representation schemes have been proposed to improve performance, most of which require the number of features to be set a priori. In this paper, we propose a novel variable-length representation in which feature subsets are represented by lists of integers. We also devise a crossover operator to cope with the variable-length representation. The proposed approach has been tested on several datasets, and the results have been compared with those achieved by a standard genetic algorithm. The comparisons demonstrate that the proposed approach improves on the performance obtainable with a standard genetic algorithm when thousands of features are involved.
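To make the two encodings concrete, the following Python sketch contrasts a fixed-length binary mask with a variable-length list of feature indices. The abstract does not describe the crossover operator itself, so `cut_and_splice` below is a generic, hypothetical stand-in (independent cut points in each parent, followed by duplicate removal) and should not be read as the operator devised in the paper. All function names are illustrative assumptions.

```python
import random


def mask_to_list(mask):
    """Convert a binary mask (one bit per feature) to a list of selected indices."""
    return [i for i, bit in enumerate(mask) if bit]


def list_to_mask(features, n_features):
    """Convert a list of selected indices back to a fixed-length binary mask."""
    mask = [0] * n_features
    for f in features:
        mask[f] = 1
    return mask


def dedupe(features):
    """Drop repeated feature indices while keeping first-seen order."""
    return list(dict.fromkeys(features))


def cut_and_splice(parent_a, parent_b, rng):
    """Hypothetical variable-length crossover (NOT the paper's operator).

    Each parent is cut at its own random point and the pieces are swapped,
    so the children may be shorter or longer than either parent.
    """
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    # Fall back to a copy of a parent if a child would be an empty subset.
    child_1 = dedupe(parent_a[:cut_a] + parent_b[cut_b:]) or list(parent_a)
    child_2 = dedupe(parent_b[:cut_b] + parent_a[cut_a:]) or list(parent_b)
    return child_1, child_2


if __name__ == "__main__":
    rng = random.Random(0)
    a = [3, 17, 42, 256]   # a subset of 4 features out of, say, 1000
    b = [5, 17, 900]       # a subset of 3 features
    print(cut_and_splice(a, b, rng))
```

With thousands of features, a binary mask always has one gene per feature, whereas the integer list grows only with the number of selected features, which is the motivation the abstract gives for a variable-length encoding.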