Hello Matlabsolutions community,

I'm currently working with the TreeBagger class to generate classification tree ensembles, and I would like to know how it decides which features are used for splitting the data. Suppose I create an ensemble of 5000 tree stumps and use it to classify a dataset with two features (e.g. VRQL value and maximum frequency), and then check which feature was selected for the split in every single tree, like this:

    cellArray = {};
    for y = 1:length(Random_Forest_Model.Trees)
        cellArray{y} = Random_Forest_Model.Trees{y}.CutPredictor{1};
    end

In some cases only one feature is selected for all 5000 trees, and the other feature is never selected at all (i.e. cellArray looks like {'x2', 'x2', 'x2', ..., 'x2'}). The same thing can happen with more than two features: a single feature is selected and all the others are ignored.

Possibly relevant properties of the dataset:
- One feature takes values from 1 to 100, the other from about 200 to 1200.
- The classes are imbalanced (class 1: 52 entries, class 2: over 300 entries).
- Both features contain NaNs, but only the larger class contains them.

My question is: how can I get the TreeBagger to use all features for classification rather than only one, or, more generally, how can I achieve a more balanced selection of features?
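For reference, the per-tree inspection above can be condensed into a frequency table instead of eyeballing the cell array. This is a minimal sketch assuming the Random_Forest_Model variable from the snippet; the 'NumPredictorsToSample' name-value pair shown in the comment is a documented TreeBagger option that controls how many predictors are candidates at each split:

```matlab
% Preallocate and collect the root-split predictor of every tree,
% then tally how often each feature was chosen.
nTrees = numel(Random_Forest_Model.Trees);
rootSplits = cell(nTrees, 1);
for y = 1:nTrees
    rootSplits{y} = Random_Forest_Model.Trees{y}.CutPredictor{1};
end
tabulate(categorical(rootSplits))   % e.g. shows x1 vs x2 counts and percentages

% If one feature dominates every tree, sampling only 1 candidate predictor
% per split forces the trees to consider each feature at random:
% Random_Forest_Model = TreeBagger(5000, X, Y, 'NumPredictorsToSample', 1);
```

With only two predictors and the default candidate count, both features are typically in play at every split, so the stronger one can win in every tree; sampling a single candidate per split randomizes which feature each stump uses.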
John Williams answered.
2025-11-20