High-dimensional changepoint estimation via sparse projection [by Richard Samworth]

Changepoints are a very common feature of Big Data that arrive in the form of a data stream. In this paper, we study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates. The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series. We then apply an existing univariate changepoint estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms. We call our algorithm inspect, short for informative sparse projection for estimation of changepoints; it is implemented in the R package InspectChangepoint.

Left: visualisation of the data matrix. Right: its CUSUM transformation.

Left: overlay of the projected CUSUM statistics for the three changepoints detected. Right: visualisation of thresholding; the three detected changepoints are above the threshold (dotted red line) whereas the remaining numbers are the test statistics obtained if we run the wild binary segmentation to completion without applying a termination criterion.

A brief illustration of the inspect algorithm in action is given in the Figure above. Here, we simulated a 2000 x 1000 data matrix having independent normal columns with identity covariance and with three changepoints in the mean structure at locations 500, 1000 and 1500. Changes occur in 40 coordinates, where consecutive changepoints overlap in half of their coordinates, and the squared ${\ell_2}$ norms of the vectors of mean changes were 0.4, 0.9 and 1.6 respectively.

Author: xi'an

I am a professor of Statistics at Université Paris Dauphine, France, and University of Warwick, United Kingdom, with a definitely unhealthy (but so far not fatal) fascination for mountains and (easy) climbing, in particular for Scotland in Winter, an almost-daily run, and a reading list mainly centred at fantasy books… Plus a blog that often seems to take most of my time. Not that anyone forces me to edit it...!