Scikit-learn Cookbook : over 50 recipes to incorporate scikit-learn into every step of the data science pipeline, from feature extraction to model building and model evaluation

If you're a data scientist already familiar with Python but not scikit-learn, or you know other programming languages like R and want to take the plunge with the gold standard of Python machine learning libraries, then this is the book for you.

Bibliographic Details
Main Author: Hauck, Trent (Author)
Format: eBook
Published: Birmingham, U.K. : Packt Publishing, 2014.
Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewers; Table of Contents; Preface
  • Chapter 1: Premodel Workflow; Introduction; Getting sample data from external sources; Creating sample data for toy analysis; Scaling data to the standard normal; Creating binary features through thresholding; Working with categorical variables; Binarizing label features; Imputing missing values through various strategies; Using Pipelines for multiple preprocessing steps; Reducing dimensionality with PCA; Using factor analysis for decomposition; Kernel PCA for nonlinear dimensionality reduction; Using truncated SVD to reduce dimensionality; Decomposition to classify with DictionaryLearning; Putting it all together with Pipelines; Using Gaussian processes for regression; Defining the Gaussian process object directly; Using stochastic gradient descent for regression
  • Chapter 2: Working with Linear Models; Introduction; Fitting a line through data; Evaluating the linear regression model; Using ridge regression to overcome linear regression's shortfalls; Optimizing the ridge regression parameter; Using sparsity to regularize models; Taking a more fundamental approach to regularization with LARS; Using linear methods for classification - logistic regression; Directly applying Bayesian ridge regression; Using boosting to learn from errors
  • Chapter 3: Building Models with Distance Metrics; Introduction; Using KMeans to cluster data; Optimizing the number of centroids; Assessing cluster correctness; Using MiniBatch KMeans to handle more data; Quantizing an image with KMeans clustering; Finding the closest objects in the feature space; Probabilistic clustering with Gaussian Mixture Models; Using KMeans for outlier detection; Using k-NN for regression
  • Chapter 4: Classifying Data with scikit-learn; Introduction; Doing basic classifications with Decision Trees; Tuning a Decision Tree model; Using many Decision Trees - random forests; Tuning a random forest model; Classifying data with Support Vector Machines; Generalizing with multiclass classification; Using LDA for classification; Working with QDA - a nonlinear LDA; Using Stochastic Gradient Descent for classification; Classifying documents with Naïve Bayes; Label propagation with semi-supervised learning
  • Chapter 5: Post-model Workflow; Introduction; K-fold cross validation; Automatic cross validation; Cross validation with ShuffleSplit; Stratified k-fold; Poor man's grid search; Brute force grid search; Using dummy estimators to compare results; Regression model evaluation; Feature selection; Feature selection on L1 norms; Persisting models with joblib
  • Index