Data Mining Algorithms: Explained Using R

"This book narrows down the scope of data mining by adopting a heavily modeling-oriented perspective"--

Bibliographic Details
Main Author: Cichosz, Pawel
Format: Electronic eBook
Language: English
Published: Chichester, West Sussex; Malden, MA: John Wiley & Sons Inc., 2015.
Table of Contents:
  • pt. I Preliminaries
  • 1. Tasks
  • 1.1. Introduction
  • 1.1.1. Knowledge
  • 1.1.2. Inference
  • 1.2. Inductive learning tasks
  • 1.2.1. Domain
  • 1.2.2. Instances
  • 1.2.3. Attributes
  • 1.2.4. Target attribute
  • 1.2.5. Input attributes
  • 1.2.6. Training set
  • 1.2.7. Model
  • 1.2.8. Performance
  • 1.2.9. Generalization
  • 1.2.10. Overfitting
  • 1.2.11. Algorithms
  • 1.2.12. Inductive learning as search
  • 1.3. Classification
  • 1.3.1. Concept
  • 1.3.2. Training set
  • 1.3.3. Model
  • 1.3.4. Performance
  • 1.3.5. Generalization
  • 1.3.6. Overfitting
  • 1.3.7. Algorithms
  • 1.4. Regression
  • 1.4.1. Target function
  • 1.4.2. Training set
  • 1.4.3. Model
  • 1.4.4. Performance
  • 1.4.5. Generalization
  • 1.4.6. Overfitting
  • 1.4.7. Algorithms
  • 1.5. Clustering
  • 1.5.1. Motivation
  • 1.5.2. Training set
  • 1.5.3. Model
  • 1.5.4. Crisp vs. soft clustering
  • 1.5.5. Hierarchical clustering
  • 1.5.6. Performance
  • 1.5.7. Generalization
  • 1.5.8. Algorithms
  • 1.5.9. Descriptive vs. predictive clustering
  • 1.6. Practical issues
  • 1.6.1. Incomplete data
  • 1.6.2. Noisy data
  • 1.7. Conclusion
  • 1.8. Further readings
  • References
  • 2. Basic statistics
  • 2.1. Introduction
  • 2.2. Notational conventions
  • 2.3. Basic statistics as modeling
  • 2.4. Distribution description
  • 2.4.1. Continuous attributes
  • 2.4.2. Discrete attributes
  • 2.4.3. Confidence intervals
  • 2.4.4. m-Estimation
  • 2.5. Relationship detection
  • 2.5.1. Significance tests
  • 2.5.2. Continuous attributes
  • 2.5.3. Discrete attributes
  • 2.5.4. Mixed attributes
  • 2.5.5. Relationship detection caveats
  • 2.6. Visualization
  • 2.6.1. Boxplot
  • 2.6.2. Histogram
  • 2.6.3. Barplot
  • 2.7. Conclusion
  • 2.8. Further readings
  • References
  • pt. II Classification
  • 3. Decision trees
  • 3.1. Introduction
  • 3.2. Decision tree model
  • 3.2.1. Nodes and branches
  • 3.2.2. Leaves
  • 3.2.3. Split types
  • 3.3. Growing
  • 3.3.1. Algorithm outline
  • 3.3.2. Class distribution calculation
  • 3.3.3. Class label assignment
  • 3.3.4. Stop criteria
  • 3.3.5. Split selection
  • 3.3.6. Split application
  • 3.3.7. Complete process
  • 3.4. Pruning
  • 3.4.1. Pruning operators
  • 3.4.2. Pruning criterion
  • 3.4.3. Pruning control strategy
  • 3.4.4. Conversion to rule sets
  • 3.5. Prediction
  • 3.5.1. Class label prediction
  • 3.5.2. Class probability prediction
  • 3.6. Weighted instances
  • 3.7. Missing value handling
  • 3.7.1. Fractional instances
  • 3.7.2. Surrogate splits
  • 3.8. Conclusion
  • 3.9. Further readings
  • References
  • 4. Naive Bayes classifier
  • 4.1. Introduction
  • 4.2. Bayes rule
  • 4.3. Classification by Bayesian inference
  • 4.3.1. Conditional class probability
  • 4.3.2. Prior class probability
  • 4.3.3. Independence assumption
  • 4.3.4. Conditional attribute value probabilities
  • 4.3.5. Model construction
  • 4.3.6. Prediction
  • 4.4. Practical issues
  • 4.4.1. Zero and small probabilities
  • 4.4.2. Linear classification
  • 4.4.3. Continuous attributes
  • 4.4.4. Missing attribute values
  • 4.4.5. Reducing naivety
  • 4.5. Conclusion
  • 4.6. Further readings
  • References
  • 5. Linear classification
  • 5.1. Introduction
  • 5.2. Linear representation
  • 5.2.1. Inner representation function
  • 5.2.2. Outer representation function
  • 5.2.3. Threshold representation
  • 5.2.4. Logit representation
  • 5.3. Parameter estimation
  • 5.3.1. Delta rule
  • 5.3.2. Gradient descent
  • 5.3.3. Distance to decision boundary
  • 5.3.4. Least squares
  • 5.4. Discrete attributes
  • 5.5. Conclusion
  • 5.6. Further readings
  • References
  • 6. Misclassification costs
  • 6.1. Introduction
  • 6.2. Cost representation
  • 6.2.1. Cost matrix
  • 6.2.2. Per-class cost vector
  • 6.2.3. Instance-specific costs
  • 6.3. Incorporating misclassification costs
  • 6.3.1. Instance weighting
  • 6.3.2. Instance resampling
  • 6.3.3. Minimum-cost rule
  • 6.3.4. Instance relabeling
  • 6.4. Effects of cost incorporation
  • 6.5. Experimental procedure
  • 6.6. Conclusion
  • 6.7. Further readings
  • References
  • 7. Classification model evaluation
  • 7.1. Introduction
  • 7.1.1. Dataset performance
  • 7.1.2. Training performance
  • 7.1.3. True performance
  • 7.2. Performance measures
  • 7.2.1. Misclassification error
  • 7.2.2. Weighted misclassification error
  • 7.2.3. Mean misclassification cost
  • 7.2.4. Confusion matrix
  • 7.2.5. ROC analysis
  • 7.2.6. Probabilistic performance measures
  • 7.3. Evaluation procedures
  • 7.3.1. Model evaluation vs. modeling procedure evaluation
  • 7.3.2. Evaluation caveats
  • 7.3.3. Hold-out
  • 7.3.4. Cross-validation
  • 7.3.5. Leave-one-out
  • 7.3.6. Bootstrapping
  • 7.3.7. Choosing the right procedure
  • 7.3.8. Evaluation procedures for temporal data
  • 7.4. Conclusion
  • 7.5. Further readings
  • References
  • pt. III Regression
  • 8. Linear regression
  • 8.1. Introduction
  • 8.2. Linear representation
  • 8.2.1. Parametric representation
  • 8.2.2. Linear representation function
  • 8.2.3. Nonlinear representation functions
  • 8.3. Parameter estimation
  • 8.3.1. Mean square error minimization
  • 8.3.2. Delta rule
  • 8.3.3. Gradient descent
  • 8.3.4. Least squares
  • 8.4. Discrete attributes
  • 8.5. Advantages of linear models
  • 8.6. Beyond linearity
  • 8.6.1. Generalized linear representation
  • 8.6.2. Enhanced representation
  • 8.6.3. Polynomial regression
  • 8.6.4. Piecewise-linear regression
  • 8.7. Conclusion
  • 8.8. Further readings
  • References
  • 9. Regression trees
  • 9.1. Introduction
  • 9.2. Regression tree model
  • 9.2.1. Nodes and branches
  • 9.2.2. Leaves
  • 9.2.3. Split types
  • 9.2.4. Piecewise-constant regression
  • 9.3. Growing
  • 9.3.1. Algorithm outline
  • 9.3.2. Target function summary statistics
  • 9.3.3. Target value assignment
  • 9.3.4. Stop criteria
  • 9.3.5. Split selection
  • 9.3.6. Split application
  • 9.3.7. Complete process
  • 9.4. Pruning
  • 9.4.1. Pruning operators
  • 9.4.2. Pruning criterion
  • 9.4.3. Pruning control strategy
  • 9.5. Prediction
  • 9.6. Weighted instances
  • 9.7. Missing value handling
  • 9.7.1. Fractional instances
  • 9.7.2. Surrogate splits
  • 9.8. Piecewise linear regression
  • 9.8.1. Growing
  • 9.8.2. Pruning
  • 9.8.3. Prediction
  • 9.9. Conclusion
  • 9.10. Further readings
  • References
  • 10. Regression model evaluation
  • 10.1. Introduction
  • 10.1.1. Dataset performance
  • 10.1.2. Training performance
  • 10.1.3. True performance
  • 10.2. Performance measures
  • 10.2.1. Residuals
  • 10.2.2. Mean absolute error
  • 10.2.3. Mean square error
  • 10.2.4. Root mean square error
  • 10.2.5. Relative absolute error
  • 10.2.6. Coefficient of determination
  • 10.2.7. Correlation
  • 10.2.8. Weighted performance measures
  • 10.2.9. Loss functions
  • 10.3. Evaluation procedures
  • 10.3.1. Hold-out
  • 10.3.2. Cross-validation
  • 10.3.3. Leave-one-out
  • 10.3.4. Bootstrapping
  • 10.3.5. Choosing the right procedure
  • 10.4. Conclusion
  • 10.5. Further readings
  • References
  • pt. IV Clustering
  • 11. (Dis)similarity measures
  • 11.1. Introduction
  • 11.2. Measuring dissimilarity and similarity
  • 11.3. Difference-based dissimilarity
  • 11.3.1. Euclidean distance
  • 11.3.2. Minkowski distance
  • 11.3.3. Manhattan distance
  • 11.3.4. Canberra distance
  • 11.3.5. Chebyshev distance
  • 11.3.6. Hamming distance
  • 11.3.7. Gower's coefficient
  • 11.3.8. Attribute weighting
  • 11.3.9. Attribute transformation
  • 11.4. Correlation-based similarity
  • 11.4.1. Discrete attributes
  • 11.4.2. Pearson's correlation similarity
  • 11.4.3. Spearman's correlation similarity
  • 11.4.4. Cosine similarity
  • 11.5. Missing attribute values
  • 11.6. Conclusion
  • 11.7. Further readings
  • References
  • 12. K-Centers clustering
  • 12.1. Introduction
  • 12.1.1. Basic principle
  • 12.1.2. (Dis)similarity measures
  • 12.2. Algorithm scheme
  • 12.2.1. Initialization
  • 12.2.2. Stop criteria
  • 12.2.3. Cluster formation
  • 12.2.4. Implicit cluster modeling
  • 12.2.5. Instantiations
  • 12.3. k-Means
  • 12.3.1. Center adjustment
  • 12.3.2. Minimizing dissimilarity to centers
  • 12.4. Beyond means
  • 12.4.1. k-Medians
  • 12.4.2. k-Medoids
  • 12.5. Beyond (fixed) k
  • 12.5.1. Multiple runs
  • 12.5.2. Adaptive k-centers
  • 12.6. Explicit cluster modeling
  • 12.7. Conclusion
  • 12.8. Further readings
  • References
  • 13. Hierarchical clustering
  • 13.1. Introduction
  • 13.1.1. Basic approaches
  • 13.1.2. (Dis)similarity measures
  • 13.2. Cluster hierarchies
  • 13.2.1. Motivation
  • 13.2.2. Model representation
  • 13.3. Agglomerative clustering
  • 13.3.1. Algorithm scheme
  • 13.3.2. Cluster linkage
  • 13.4. Divisive clustering
  • 13.4.1. Algorithm scheme
  • 13.4.2. Wrapping a flat clustering algorithm
  • 13.4.3. Stop criteria
  • 13.5. Hierarchical clustering visualization
  • 13.6. Hierarchical clustering prediction
  • 13.6.1. Cutting cluster hierarchies
  • 13.6.2. Cluster membership assignment
  • 13.7. Conclusion
  • 13.8. Further readings
  • References
  • 14. Clustering model evaluation
  • 14.1. Introduction
  • 14.1.1. Dataset performance
  • 14.1.2. Training performance
  • 14.1.3. True performance
  • 14.2. Per-cluster quality measures
  • 14.2.1. Diameter
  • 14.2.2. Separation
  • 14.2.3. Isolation
  • 14.2.4. Silhouette width
  • 14.2.5. Davies-Bouldin index
  • 14.3. Overall quality measures
  • 14.3.1. Dunn index
  • 14.3.2. Average Davies-Bouldin index
  • 14.3.3. C index
  • 14.3.4. Average silhouette width
  • 14.3.5. Loglikelihood
  • 14.4. External quality measures
  • 14.4.1. Misclassification error
  • 14.4.2. Rand index
  • 14.4.3. General relationship detection measures
  • 14.5. Using quality measures
  • 14.6. Conclusion
  • 14.7. Further readings
  • References
  • pt. V Getting Better Models
  • 15. Model ensembles
  • 15.1. Introduction
  • 15.2. Model committees
  • 15.3. Base models
  • 15.3.1. Different training sets
  • 15.3.2. Different algorithms
  • 15.3.3. Different parameter setups
  • 15.3.4. Algorithm randomization
  • 15.3.5. Base model diversity
  • 15.4. Model aggregation
  • 15.4.1. Voting/Averaging
  • 15.4.2. Probability averaging
  • 15.4.3. Weighted voting/averaging
  • 15.4.4. Using as attributes
  • 15.5. Specific ensemble modeling algorithms
  • 15.5.1. Bagging
  • 15.5.2. Stacking
  • 15.5.3. Boosting
  • 15.5.4. Random forest
  • 15.5.5. Random Naive Bayes
  • 15.6. Quality of ensemble predictions
  • 15.7. Conclusion
  • 15.8. Further readings
  • References
  • 16. Kernel methods
  • 16.1. Introduction
  • 16.2. Support vector machines
  • 16.2.1. Classification margin
  • 16.2.2. Maximum-margin hyperplane
  • 16.2.3. Primal form
  • 16.2.4. Dual form
  • 16.2.5. Soft margin
  • 16.3. Support vector regression
  • 16.3.1. Regression tube
  • 16.3.2. Primal form
  • 16.3.3. Dual form
  • 16.4. Kernel trick
  • 16.5. Kernel functions
  • 16.5.1. Linear kernel
  • 16.5.2. Polynomial kernel
  • 16.5.3. Radial kernel
  • 16.5.4. Sigmoid kernel
  • 16.6. Kernel prediction
  • 16.7. Kernel-based algorithms
  • 16.7.1. Kernel-based SVM
  • 16.7.2. Kernel-based SVR
  • 16.8. Conclusion
  • 16.9. Further readings
  • References
  • 17. Attribute transformation
  • 17.1. Introduction
  • 17.2. Attribute transformation task
  • 17.2.1. Target task
  • 17.2.2. Target attribute
  • 17.2.3. Transformed attribute
  • 17.2.4. Training set
  • 17.2.5. Modeling transformations
  • 17.2.6. Nonmodeling transformations
  • 17.3. Simple transformations
  • 17.3.1. Standardization
  • 17.3.2. Normalization
  • 17.3.3. Aggregation
  • 17.3.4. Imputation
  • 17.3.5. Binary encoding
  • 17.4. Multiclass encoding
  • 17.4.1. Encoding and decoding functions
  • 17.4.2. 1-of-k encoding
  • 17.4.3. Error-correcting encoding
  • 17.4.4. Effects of multiclass encoding
  • 17.5. Conclusion
  • 17.6. Further readings
  • References
  • 18. Discretization
  • 18.1. Introduction
  • 18.2. Discretization task
  • 18.2.1. Motivation
  • 18.2.2. Task definition
  • 18.2.3. Discretization as modeling
  • 18.2.4. Discretization quality
  • 18.3. Unsupervised discretization
  • 18.3.1. Equal-width intervals
  • 18.3.2. Equal-frequency intervals
  • 18.3.3. Nonmodeling discretization
  • 18.4. Supervised discretization
  • 18.4.1. Pure-class discretization
  • 18.4.2. Bottom-up discretization
  • 18.4.3. Top-down discretization
  • 18.5. Effects of discretization
  • 18.6. Conclusion
  • 18.7. Further readings
  • References
  • 19. Attribute selection
  • 19.1. Introduction
  • 19.2. Attribute selection task
  • 19.2.1. Motivation
  • 19.2.2. Task definition
  • 19.2.3. Algorithms
  • 19.3. Attribute subset search
  • 19.3.1. Search task
  • 19.3.2. Initial state
  • 19.3.3. Search operators
  • 19.3.4. State selection
  • 19.3.5. Stop criteria
  • 19.4. Attribute selection filters
  • 19.4.1. Simple statistical filters
  • 19.4.2. Correlation-based filters
  • 19.4.3. Consistency-based filters
  • 19.4.4. Relief
  • 19.4.5. Random forest
  • 19.4.6. Cutoff criteria
  • 19.4.7. Filter-driven search
  • 19.5. Attribute selection wrappers
  • 19.5.1. Subset evaluation
  • 19.5.2. Wrapper attribute selection
  • 19.6. Effects of attribute selection
  • 19.7. Conclusion
  • 19.8. Further readings
  • References
  • 20. Case studies
  • 20.1. Introduction
  • 20.1.1. Datasets
  • 20.1.2. Packages
  • 20.1.3. Auxiliary functions
  • 20.2. Census income
  • 20.2.1. Data loading and preprocessing
  • 20.2.2. Default model
  • 20.2.3. Incorporating misclassification costs
  • 20.2.4. Pruning
  • 20.2.5. Attribute selection
  • 20.2.6. Final models
  • 20.3. Communities and crime
  • 20.3.1. Data loading
  • 20.3.2. Data quality
  • 20.3.3. Regression trees
  • 20.3.4. Linear models
  • 20.3.5. Attribute selection
  • 20.3.6. Piecewise-linear models
  • 20.4. Cover type
  • 20.4.1. Data loading and preprocessing
  • 20.4.2. Class imbalance
  • 20.4.3. Decision trees
  • 20.4.4. Class rebalancing
  • 20.4.5. Multiclass encoding
  • 20.4.6. Final classification models
  • 20.4.7. Clustering
  • 20.5. Conclusion
  • 20.6. Further readings
  • References
  • Closing
  • A. Notation
  • A.1. Attribute values
  • A.2. Data subsets
  • A.3. Probabilities
  • B. R packages
  • B.1. CRAN packages
  • B.2. DMR packages
  • B.3. Installing packages
  • References
  • C. Datasets