CDRC Supporting Development of Sktime

Markus Löning is a PhD student at UCL with the CDRC, and is one of the lead developers of sktime – a Python library for time series machine learning. Time series analysis is a challenging area and many existing tools do not work well with time series data. 

Solving data science problems with time series data in Python is challenging.

Why? Existing tools are not well-suited to time series tasks and do not easily integrate with one another. Methods in the scikit-learn package assume that data is structured in a tabular format and that each row is independent and identically distributed (i.i.d.), assumptions that do not hold for time series data, where observations are ordered in time and usually depend on earlier ones. Packages containing time series learning modules, such as statsmodels (https://www.statsmodels.org/stable/user-guide.html#time-series-analysis), do not integrate well with each other. Further, many essential time series operations, such as splitting data into train and test sets across time, are not readily available in existing Python packages.
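To illustrate what splitting "across time" means, here is a minimal sketch in plain Python. The function name split_by_time and the sample values are hypothetical and are not taken from sktime or any other library. The point is only that the test set must come strictly after the training set, so the series is cut at a time point rather than shuffled.

```python
def split_by_time(series, test_size):
    """Split an ordered series into (train, test) without shuffling.

    The last `test_size` observations form the test set, so every
    test observation lies in the future relative to the training set.
    Hypothetical helper for illustration only.
    """
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    cut = len(series) - test_size
    return series[:cut], series[cut:]


# Made-up monthly values, oldest first.
observations = [10, 12, 13, 15, 18, 21]
train, test = split_by_time(observations, test_size=2)
print(train)  # [10, 12, 13, 15]
print(test)   # [18, 21]
```

A random row-wise split, as commonly used for tabular data, would mix future observations into the training set and make the evaluation overly optimistic.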

To address these challenges, sktime was created.

Logo of the sktime library (GitHub: https://github.com/alan-turing-institute/sktime)

sktime is an open-source Python toolbox for machine learning with time series. It is a community-driven project funded by the UK Economic and Social Research Council (https://esrc.ukri.org/), the Consumer Data Research Centre (https://www.cdrc.ac.uk/), and The Alan Turing Institute (https://turing.ac.uk/).

sktime extends the scikit-learn API to time series tasks. It provides the necessary algorithms and transformation tools to efficiently solve time series regression, forecasting, and classification tasks. The library includes dedicated time series learning algorithms and transformation methods not readily available in other common libraries.

sktime was designed to interoperate with scikit-learn, to make it easy to adapt algorithms across interrelated time series tasks, and to support building composite models. How? Many time series tasks are related, so an algorithm that solves one task can often be re-used to help solve a related one. This idea is called reduction. For example, a model for time series regression (which uses a series to predict an output value) can be re-used for a time series forecasting task (where the predicted output value is a future value of the series).
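The reduction idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not sktime's implementation: the names sliding_windows, NearestNeighbourRegressor and forecast_by_reduction are hypothetical, and the toy regressor only imitates the fit/predict convention used by scikit-learn. The series is cut into fixed-length windows, each window becomes one row of a regression problem whose target is the value that follows it, and forecasts are produced by feeding each prediction back in as the newest observation.

```python
def sliding_windows(series, window_length):
    """Turn a series into a tabular regression problem.

    Each row of X is a window of consecutive values; the matching
    entry of y is the value that immediately follows that window.
    """
    X, y = [], []
    for start in range(len(series) - window_length):
        X.append(list(series[start:start + window_length]))
        y.append(series[start + window_length])
    return X, y


class NearestNeighbourRegressor:
    """Toy 1-nearest-neighbour regressor with a fit/predict interface.

    Hypothetical stand-in for any tabular regressor.
    """

    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Squared Euclidean distance to every training window.
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, train_row))
                for train_row in self.X_
            ]
            predictions.append(self.y_[distances.index(min(distances))])
        return predictions


def forecast_by_reduction(series, regressor, window_length, horizon):
    """Forecast `horizon` steps ahead using a tabular regressor."""
    X, y = sliding_windows(series, window_length)
    regressor.fit(X, y)
    window = list(series[-window_length:])
    forecasts = []
    for _ in range(horizon):
        next_value = regressor.predict([window])[0]
        forecasts.append(next_value)
        # Slide the window forward, treating the prediction as observed.
        window = window[1:] + [next_value]
    return forecasts


# A made-up series that repeats 0, 1, 2, 3.
history = [0, 1, 2, 3] * 5
print(forecast_by_reduction(history, NearestNeighbourRegressor(), 4, 3))
# [0, 1, 2]
```

Because forecast_by_reduction only relies on fit and predict, any regressor that follows the same convention could be swapped in without changing the forecasting logic, which is the sense in which one algorithm is re-used for a related task.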

Mission statement: “sktime enables understandable and composable machine learning with time series. It provides scikit-learn (https://scikit-learn.org/stable/) compatible algorithms and model composition tools, supported by a clear taxonomy of learning tasks, with instructive documentation and a friendly community.”

sktime is a great example of the user community coming together to produce an understandable, compatible, standards-based, open-source tool to solve a specific problem. CDRC is proud to support the project through Markus’s involvement and aims to provide similar support to many other projects in the future.

For more details, please check out this blog post at https://towardsdatascience.com/sktime-a-unified-python-library-for-time-series-machine-learning-3c103c139a55 by Alexandra Amidon (https://alexandra-amidon.medium.com/).