Code repository

Modern Statistics: A Computer Based Approach with Python
by Ron Kenett, Shelemyahu Zacks, Peter Gedeck

Publisher: Springer International Publishing; 1st edition (September 15, 2022)
ISBN-13: 978-3-031-07565-0 (hardcover)
ISBN-13: 978-3-031-07568-1 (softcover)
ISBN-13: 978-3-031-28482-3 (eBook)
Buy at Amazon, Springer

Errata: See known errata here

Modern Statistics: A Computer Based Approach with Python is a companion volume to the book Industrial Statistics: A Computer Based Approach with Python.

This part of the repository contains:

notebooks: Python code of individual chapters in Jupyter notebooks - download notebooks and data as notebooks.zip
code: Python code for solutions as plain Python files - download all as code.zip
solutions manual: Solutions_Modernstatistics.pdf: solutions of exercises
solutions: Python code for solutions in Jupyter notebooks - download all as solutions.zip
all: zip file with all files combined - download all as all.zip
datafiles: zip file with all data files - download all as data_files.zip. The mistat package already gives you access to all data files; you only need to download this file if you want to use the data with different software.

All the Python applications referred to in this book are contained in a package called mistat, available for installation from the Python Package Index at https://pypi.org/project/mistat/. The mistat package is maintained in a GitHub repository at https://github.com/gedeck/mistat.

Teaching material

Material is available for a Biomed Data Analyst Training Program.

Try the code

You can explore the code on

Google Colab
Binder

Installation instructions

Instructions on installing Python and required packages are here.

These Python packages are used in the code examples of Modern Statistics:

mistat (for access to data sets and additional functionality)
numpy
scipy
scikit-learn
statsmodels
pingouin
xgboost
KDEpy
networkx
scikit-fda
pgmpy
dtreeviz
svglib
pydotplus

The notebook InstallPackages.ipynb contains the pip command to install the required packages. Note that some of the packages may need to be pinned to specific versions.
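To see which of the packages listed above are already present in your environment, something like the following may help. It is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of this repository, and the names in `REQUIRED` are assumed to match the distribution names on PyPI.

```python
from importlib import metadata

# Package names as listed above (assumed to match their PyPI distribution names)
REQUIRED = [
    "mistat", "numpy", "scipy", "scikit-learn", "statsmodels", "pingouin",
    "xgboost", "KDEpy", "networkx", "scikit-fda", "pgmpy", "dtreeviz",
    "svglib", "pydotplus",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

The printed versions can then be compared with any version pins given in InstallPackages.ipynb.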

If you have a problem with visualizing the decision tree or creating a network graph, follow the installation instructions for graphviz on the dtreeviz GitHub site. On Windows, the problem is usually resolved by adding the path to the graphviz binaries to the PATH system variable.
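A quick way to check whether the graphviz `dot` executable can be found from Python is sketched here. The helper `find_graphviz_dot` is hypothetical and not part of the repository; the optional `extra_dir` argument stands for wherever graphviz is installed on your machine.

```python
import os
import shutil


def find_graphviz_dot(extra_dir=None):
    """Return the path to the Graphviz `dot` executable, or None if not found.

    If extra_dir is given, it is prepended to PATH for this process only.
    """
    if extra_dir:
        os.environ["PATH"] = extra_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which("dot")


if __name__ == "__main__":
    path = find_graphviz_dot()
    print(path or "dot not found on PATH")
```

Changing `os.environ["PATH"]` this way only affects the running Python process, for example the current notebook kernel. A permanent fix still requires editing the PATH system variable.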

Chapter 1: Analyzing Variability: Descriptive Statistics
Chapter 2: Probability Models and Distribution Functions
Chapter 3: Statistical Inference and Bootstrapping
Chapter 4: Variability in Several Dimensions and Regression Models
Chapter 5: Sampling for Estimation of Finite Population Quantities
Chapter 6: Time Series Analysis and Prediction
Chapter 7: Modern analytic methods: Part I
Chapter 8: Modern analytic methods: Part II

Introductory videos

Chapter 1: Analyzing Variability: Descriptive Statistics

The chapter focuses on statistical variability and on various methods of analyzing random data. Random results of experiments are illustrated with distinction between deterministic and random components of variability. The difference between accuracy and precision is explained. Frequency distributions are defined to represent random phenomena. Various characteristics of location and dispersion of frequency distributions are defined. The elements of exploratory data analysis are presented.
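As a small illustration of location and dispersion measures, the following sketch uses only Python's standard `statistics` module. The numbers are made up for illustration and are not one of the book's data sets.

```python
import statistics

# Made-up measurements for illustration only (not a dataset from the book)
x = [2.1, 2.5, 2.2, 3.0, 2.8, 2.4, 9.5, 2.6]

# Measures of location
mean = statistics.mean(x)
median = statistics.median(x)

# Measures of dispersion
stdev = statistics.stdev(x)                 # sample standard deviation (n - 1 divisor)
q1, q2, q3 = statistics.quantiles(x, n=4)   # quartiles
iqr = q3 - q1                               # interquartile range

print(f"mean={mean:.3f} median={median:.3f}")
print(f"stdev={stdev:.3f} IQR={iqr:.3f}")
```

The single large value 9.5 pulls the mean above the median, which is one reason exploratory data analysis looks at several summaries rather than one.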

Chapter 2: Probability Models and Distribution Functions

The chapter provides the basics of probability theory and of the theory of distribution functions. The probability model for random sampling is discussed. This is fundamental for statistical inference, discussed in Chapter 3, and for sampling procedures, discussed in Chapter 5. Bayes' theorem, also presented here, has important ramifications in statistical inference, including Bayesian process monitoring and Bayesian reliability, presented in Chapter 3 and Chapter 9, respectively (in the Industrial Statistics book).
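Bayes' theorem can be illustrated with a few lines of Python. This is a minimal sketch with hypothetical numbers; the function name `posterior` and the inspection scenario are assumptions made for the example, not taken from the book.

```python
def posterior(prior, likelihood_given_h, likelihood_given_not_h):
    """P(H | E) from Bayes' theorem for a hypothesis H and evidence E."""
    evidence = likelihood_given_h * prior + likelihood_given_not_h * (1 - prior)
    return likelihood_given_h * prior / evidence


# Hypothetical numbers: 2% of items are defective; an inspection flags
# 95% of defective items and 10% of good items.
p = posterior(prior=0.02, likelihood_given_h=0.95, likelihood_given_not_h=0.10)
print(f"P(defective | flagged) = {p:.3f}")
```

With these numbers the posterior probability is about 0.16, so most flagged items are still good because defective items are rare to begin with.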

Chapter 3: Statistical Inference and Bootstrapping

In this chapter we introduce basic concepts and methods of statistical inference. The focus is on estimating the parameters of statistical distributions and on testing hypotheses about them. Problems of testing whether certain distributions fit observed data are also considered.
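The bootstrapping idea named in the chapter title can be sketched with the standard library alone. The following is one simple variant, a percentile bootstrap confidence interval for the mean, and is not necessarily the procedure used in the book; the function name `bootstrap_ci` and the sample values are made up for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Made-up sample for illustration only
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 4.4]
low, high = bootstrap_ci(sample)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

Fixing the seed makes the interval reproducible between runs.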

Chapter 4: Variability in Several Dimensions and Regression Models

When surveys or experiments are performed, measurements are usually taken on several characteristics of the observation elements in the sample. In such cases we have multivariate observations, and the statistical methods used to analyze the relationships between the values observed on different variables are called multivariate methods. In this chapter we introduce some of these methods. In particular, we focus on graphical methods, linear regression methods, and the analysis of contingency tables. Linear regression methods explore the linear relationship between a variable of interest and a set of variables by which we try to predict the values of the variable of interest. Contingency table analysis studies the association between qualitative (categorical) variables, to which the usual regression methods cannot be applied.
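For the simplest case of a single predictor, the least-squares line can be computed directly. This is a minimal sketch with made-up data; the helper `fit_line` is illustrative, and the notebooks use statistical packages such as statsmodels rather than hand-written formulas.

```python
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up data scattered around y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(x, y)
print(f"y = {intercept:.3f} + {slope:.3f} * x")
```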

Chapter 5: Sampling for Estimation of Finite Population Quantities

Techniques for sampling finite populations and estimating population parameters are presented. Formulas are given for the expected value and variance of the sample mean and sample variance of simple random samples with and without replacement. Stratification is studied as a method to increase the precision of estimators. Formulas for proportional and optimal allocation are provided and demonstrated with case studies. The chapter is concluded with a section on prediction models with known covariates.
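The two allocation rules can be sketched as follows. The sketch assumes that "optimal allocation" means Neyman allocation with equal sampling cost per unit, which may differ from the book's exact formulation; the function names and the strata numbers are hypothetical.

```python
def proportional_allocation(n, sizes):
    """Stratum sample sizes n_h = n * N_h / N (not rounded)."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def optimal_allocation(n, sizes, stdevs):
    """Neyman allocation: n_h proportional to N_h * sigma_h (not rounded)."""
    weights = [size * sd for size, sd in zip(sizes, stdevs)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: sizes N_h and within-stratum standard deviations sigma_h
sizes = [500, 300, 200]
stdevs = [10, 20, 40]

print(proportional_allocation(100, sizes))
print(optimal_allocation(100, sizes, stdevs))
```

Compared with proportional allocation, the Neyman rule shifts sample units toward the smaller but more variable third stratum.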

Chapter 6: Time Series Analysis and Prediction

In this chapter, we present essential parts of time series analysis, with the objective of predicting or forecasting the future development of a series. Predicting future behavior is generally more successful for stationary series, which do not change their stochastic characteristics as time proceeds. We develop and illustrate time series of both types: covariance stationary and non-stationary.
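The difference between the two types of series can be illustrated with a first-order autoregressive recursion. This is a minimal sketch under the assumption of standard normal noise; the function `simulate_ar1` is illustrative and not part of mistat.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        series.append(x)
    return series


# |phi| < 1: covariance stationary (after a short burn-in from x_0 = 0)
stationary = simulate_ar1(phi=0.5, n=500)
# phi = 1: random walk, whose variance grows with time (non-stationary)
random_walk = simulate_ar1(phi=1.0, n=500)

print(f"range of stationary series: {min(stationary):.2f} to {max(stationary):.2f}")
print(f"range of random walk:       {min(random_walk):.2f} to {max(random_walk):.2f}")
```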

Chapters 7 and 8: Modern analytic methods: Part I and II

Chapter 7 is a door opener to computer age statistics. It covers a range of supervised and unsupervised learning methods and demonstrates their use in various applications.

Chapter 8 includes tip-of-the-iceberg examples with what we thought were interesting insights that are not always available in standard texts. The chapter covers functional data analysis, text analytics, reinforcement learning, Bayesian networks, and causality models.