{"slug": "building-time-series-machine-learning-models-with-sktime-in-python", "title": "Building Time-Series Machine Learning Models with sktime in Python", "summary": "sktime, a Python library for time-series machine learning, provides a scikit-learn-style API for forecasting, classification, and regression. This tutorial shows developers how to build temperature forecasting models for HVAC sensors using sktime's data structures and preprocessing pipelines.", "body_md": "# Building Time-Series Machine Learning Models with sktime in Python\n\nIn this article, we’ll build time-series machine learning models in Python using sktime and explore its core data structures for forecasting workflows.\n\n## Introduction\n\nIf you work with sensor readings, server metrics, or any data that arrives over time, you already know that standard **scikit-learn** pipelines don't quite fit. Time series data has structure that tabular models ignore: seasonality, trend, temporal ordering, and the fact that future values depend on past ones.\n\n**sktime** is a Python library built specifically for this. It gives you a scikit-learn-style API — fit, predict, transform — but designed from the ground up for time series. You can do forecasting, classification, regression, and clustering on time series, all with a consistent interface.\n\nHere, you'll work through an example problem: forecasting hourly temperature readings from an industrial HVAC sensor. You'll learn how sktime handles time series data, how to build preprocessing pipelines, how to fit forecasters, and how to evaluate them.\n\n**You can get the code on GitHub.**\n\n## Prerequisites\n\nYou'll need [Python 3.10](https://www.python.org/downloads/release/python-3100/) or higher and a basic familiarity with pandas. 
Install everything you need with:\n\n```\npip install sktime pmdarima statsmodels\n```\n\nIf you'd rather install all optional dependencies in one shot, `pip install \"sktime[all_extras]\"` covers them. The quotes stop shells such as zsh from expanding the brackets.\n\n## What Makes sktime Useful\n\nFirst, it helps to understand the problem sktime solves. In scikit-learn, your data is a 2D table — rows are samples, columns are features. Time series data breaks this assumption because each \"row\" is actually a sequence of values over time, and the order of those values matters.\n\nThe main data containers you'll use are:\n\n| Data Type | Representation | Description |\n|---|---|---|\n| Series | `pd.Series` or `pd.DataFrame` | A single time series, used in vanilla forecasting. |\n| Panel | `pd.DataFrame` with a 2-level `MultiIndex` | A collection of multiple independent time series. |\n| Hierarchical | `pd.DataFrame` with a 3+ level `MultiIndex` | A structured set of time series with aggregation levels across multiple dimensions. |\n\nFor the time index itself, sktime supports `DatetimeIndex`, `PeriodIndex`, integer `Index` (called `Int64Index` in pandas versions before 2.0), and `RangeIndex`. The index must be monotonic. If you're using a `DatetimeIndex`, its `freq` attribute should be set.\n\n## Setting Up the Dataset\n\nLet's create a synthetic but realistic dataset. Imagine an HVAC sensor in a factory that records temperature every hour. 
The readings have a daily seasonal pattern (warmest mid-morning, coolest late evening), a slight upward trend as winter turns to spring, and some noise.\n\n``` python\nimport numpy as np\nimport pandas as pd\n\nnp.random.seed(42)\n\n# 90 days of hourly readings starting Jan 1, 2026\nn_hours = 90 * 24\ntimestamps = pd.date_range(start=\"2026-01-01\", periods=n_hours, freq=\"h\")\n\n# Trend: gradual 5-degree rise over 90 days\ntrend = np.linspace(0, 5, n_hours)\n\n# Daily seasonality: a 24-hour sine wave that peaks at 10am and dips at 10pm\nhour_of_day = np.arange(n_hours) % 24\ndaily_cycle = 4 * np.sin(2 * np.pi * (hour_of_day - 4) / 24)\n\n# Noise\nnoise = np.random.normal(0, 0.8, n_hours)\n\n# Base temperature around 20°C\ntemperature = 20 + trend + daily_cycle + noise\n\n# Introduce a few missing values (sensor dropout)\ndropout_indices = [300, 301, 302, 1440, 1441]\ntemperature[dropout_indices] = np.nan\n\ny = pd.Series(temperature, index=timestamps, name=\"temp_celsius\")\n# Set the hourly frequency explicitly, as recommended for a DatetimeIndex\ny.index.freq = pd.tseries.frequencies.to_offset(\"h\")\n\nprint(y.head())\nprint(f\"\\nShape: {y.shape}\")\nprint(f\"Missing values: {y.isna().sum()}\")\nprint(f\"Index type: {type(y.index)}\")\n```\n\nOutput:\n\n```\n2026-01-01 00:00:00    16.933270\n2026-01-01 01:00:00    17.063277\n2026-01-01 02:00:00    18.522783\n2026-01-01 03:00:00    20.190095\n2026-01-01 04:00:00    19.821941\nFreq: h, Name: temp_celsius, dtype: float64\n\nShape: (2160,)\nMissing values: 5\nIndex type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>\n```\n\n## Splitting Time Series Data for Training and Testing\n\nSplitting time series data is different from tabular data — you can't shuffle rows. 
You must always split chronologically: train on earlier data, test on later data.\n\nsktime provides [`temporal_train_test_split`](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.split.temporal_train_test_split.html) for this purpose:\n\n``` python\nfrom sktime.split import temporal_train_test_split\n\n# Hold out the last 7 days (168 hours) as the test set\ny_train, y_test = temporal_train_test_split(y, test_size=168)\n\nprint(f\"Train: {y_train.index[0]} → {y_train.index[-1]}\")\nprint(f\"Test:  {y_test.index[0]} → {y_test.index[-1]}\")\nprint(f\"Train size: {len(y_train)}, Test size: {len(y_test)}\")\n```\n\nOutput:\n\n```\nTrain: 2026-01-01 00:00:00 → 2026-03-24 23:00:00\nTest:  2026-03-25 00:00:00 → 2026-03-31 23:00:00\nTrain size: 1992, Test size: 168\n```\n\nThe function ensures the split is clean and chronological — no data leakage from the future into the training set.\n\n## Defining the Forecasting Horizon\n\nBefore fitting any model, you need to tell sktime which time steps you want to predict. You do this with a [`ForecastingHorizon`](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.base.ForecastingHorizon.html).\n\n``` python\nfrom sktime.forecasting.base import ForecastingHorizon\n\n# Predict 168 steps ahead (7 days of hourly data)\n# is_relative=False means we're using absolute timestamps\nfh = ForecastingHorizon(y_test.index, is_relative=False)\n\nprint(f\"Horizon length: {len(fh)}\")\nprint(f\"First forecast point: {fh[0]}\")\nprint(f\"Last forecast point:  {fh[-1]}\")\n```\n\nThis gives:\n\n```\nHorizon length: 168\nFirst forecast point: 2026-03-25 00:00:00\nLast forecast point:  2026-03-31 23:00:00\n```\n\nYou can also use [relative horizons](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.base.ForecastingHorizon.html#sktime.forecasting.base.ForecastingHorizon.is_relative) like `fh = [1, 2, 3, ..., 168]`, which means \"1 step ahead, 2 steps ahead, ...\". 
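\n\nTo make the relative-to-absolute mapping concrete, here is a minimal sketch in plain pandas. It assumes the hourly frequency and the training cutoff from the split above (2026-03-24 23:00). It illustrates the idea only and is not sktime's implementation; sktime performs this conversion itself (for example via `ForecastingHorizon.to_absolute`).\n\n``` python\nimport numpy as np\nimport pandas as pd\n\n# Assumption: the cutoff is the last training timestamp from the split above\ncutoff = pd.Timestamp('2026-03-24 23:00:00')\n\n# Relative horizon: 1 step ahead, 2 steps ahead, ..., 168 steps ahead\nrelative_fh = np.arange(1, 169)\n\n# Absolute horizon: step k lands k hours after the cutoff\nabsolute_fh = cutoff + pd.to_timedelta(relative_fh, unit='h')\n\nprint(absolute_fh[0])   # first forecast timestamp\nprint(absolute_fh[-1])  # last forecast timestamp\nprint(len(absolute_fh))\n```\n\nThe first and last timestamps line up with the test window from the split. The absolute horizon built from `y_test.index` and the relative horizon `1..168` therefore describe the same 168 hours.\n\n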
Absolute horizons are cleaner when you have actual timestamps you want predictions for.\n\n## Building a Preprocessing and Forecasting Pipeline\n\nReal sensor data has missing values, seasonal patterns, and trend — you need to handle all of these before or during forecasting. sktime's [`TransformedTargetForecaster`](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.compose.TransformedTargetForecaster.html) lets you chain transformations with a forecaster into a single estimator. The transformations are applied in order to the target series `y` before fitting, and automatically reversed on the way out during prediction.\n\n``` python\nfrom sktime.forecasting.exp_smoothing import ExponentialSmoothing\nfrom sktime.forecasting.compose import TransformedTargetForecaster\nfrom sktime.transformations.series.impute import Imputer\nfrom sktime.transformations.series.detrend import Deseasonalizer, Detrender\n\npipeline = TransformedTargetForecaster(\n    steps=[\n        # Step 1: Fill missing sensor readings using linear interpolation\n        (\"imputer\", Imputer(method=\"linear\")),\n        # Step 2: Remove the linear trend so the forecaster sees a stationary series\n        (\"detrender\", Detrender()),\n        # Step 3: Remove the daily seasonality (sp=24 for hourly data with 24-hour cycles)\n        (\"deseasonalizer\", Deseasonalizer(model=\"additive\", sp=24)),\n        # Step 4: Forecast the cleaned, stationary residuals\n        (\"forecaster\", ExponentialSmoothing(trend=None, seasonal=None)),\n    ]\n)\n\npipeline.fit(y_train, fh=fh)\ny_pred = pipeline.predict()\n\nprint(y_pred.head())\n```\n\nOutput:\n\n```\n2026-03-25 00:00:00    21.210066\n2026-03-25 01:00:00    21.788986\n2026-03-25 02:00:00    22.615184\n2026-03-25 03:00:00    23.688449\n2026-03-25 04:00:00    24.621127\nFreq: h, Name: temp_celsius, dtype: float64\n```\n\nHere's what each step does:\n\n- `Imputer(method=\"linear\")` fills missing values by linearly interpolating between the surrounding readings, which works well for short sensor dropouts.\n- `Detrender()` fits a linear trend to the training series and subtracts it; on prediction it adds the trend back.\n- `Deseasonalizer(sp=24)` removes the 24-hour cycle from the residuals; `sp` stands for seasonal period.\n- Finally, `ExponentialSmoothing` forecasts the detrended, deseasonalized residuals.\n- When `predict()` is called, all inverse transformations are applied in reverse order automatically, and you get back predictions in the original temperature scale.\n\n## Evaluating the Forecast\n\nsktime ships with [standard evaluation metrics](https://www.sktime.net/en/latest/api_reference/performance_metrics.html) in `sktime.performance_metrics.forecasting`. For forecasting, mean absolute error (MAE) and mean absolute percentage error (MAPE) are common choices. `mean_absolute_percentage_error` returns a fraction, so the code multiplies it by 100 to print a percentage.\n\n``` python\nfrom sktime.performance_metrics.forecasting import (\n    mean_absolute_error,\n    mean_absolute_percentage_error,\n)\n\nmae = mean_absolute_error(y_test, y_pred)\nmape = mean_absolute_percentage_error(y_test, y_pred)\n\nprint(f\"MAE:  {mae:.3f} °C\")\nprint(f\"MAPE: {mape*100:.2f}%\")\n```\n\nOutput:\n\n```\nMAE:  0.584 °C\nMAPE: 2.40%\n```\n\n## Swapping in a Different Forecaster\n\nOne of the biggest advantages of the sktime interface is that swapping the underlying algorithm requires changing just one line. 
Let's try an ARIMA model in place of exponential smoothing and compare.\n\n``` python\nfrom sktime.forecasting.arima import ARIMA\n\npipeline_arima = TransformedTargetForecaster(\n    steps=[\n        (\"imputer\", Imputer(method=\"linear\")),\n        (\"detrender\", Detrender()),\n        (\"deseasonalizer\", Deseasonalizer(model=\"additive\", sp=24)),\n        # ARIMA(1,1,1) on the cleaned residuals\n        (\"forecaster\", ARIMA(order=(1, 1, 1), suppress_warnings=True)),\n    ]\n)\n\npipeline_arima.fit(y_train, fh=fh)\ny_pred_arima = pipeline_arima.predict()\n\nmae_arima = mean_absolute_error(y_test, y_pred_arima)\nmape_arima = mean_absolute_percentage_error(y_test, y_pred_arima)\n\nprint(f\"ARIMA MAE:  {mae_arima:.3f} °C\")\nprint(f\"ARIMA MAPE: {mape_arima*100:.2f}%\")\n```\n\nOutput:\n\n```\nARIMA MAE:  0.586 °C\nARIMA MAPE: 2.41%\n```\n\nThe key point is that the preprocessing steps — imputation, detrending, deseasonalization — stayed identical. You only changed the final forecaster, and everything else composed cleanly around it. On this data the two pipelines score almost the same (MAE 0.584 vs. 0.586 °C). This is most likely because the detrender and deseasonalizer already remove most of the structure, leaving little for the final forecaster to model beyond noise.\n\n## Cross-Validating Across Time\n\nHolding out a single test window can be misleading. 
sktime provides time series cross-validation through splitters that respect temporal ordering:\n\n- [`SlidingWindowSplitter`](https://www.sktime.net/en/v0.20.0/api_reference/auto_generated/sktime.forecasting.model_selection.SlidingWindowSplitter.html) uses a rolling window: the training window slides forward in time, always staying the same length.\n- [`ExpandingWindowSplitter`](https://www.sktime.net/en/v0.21.0/api_reference/auto_generated/sktime.forecasting.model_selection.ExpandingWindowSplitter.html) grows the training set cumulatively as you move forward, which is more appropriate when you want to use all available history.\n\nHere, an `ExpandingWindowSplitter` starts from an 1,800-hour training window and scores the pipeline on successive 168-hour test windows:\n\n``` python\nfrom sktime.split import ExpandingWindowSplitter\nfrom sktime.forecasting.model_evaluation import evaluate\n\n# Expanding window: start with 1800-hour train set, evaluate on 168-hour windows\ncv = ExpandingWindowSplitter(\n    initial_window=1800,\n    fh=list(range(1, 169)),\n    step_length=168,\n)\n\n# With the default strategy=\"refit\", the pipeline is refit on each fold's training window\nresults = evaluate(\n    forecaster=pipeline,\n    y=y,\n    cv=cv,\n    scoring=mean_absolute_error,\n    return_data=False,\n)\n\nprint(results[[\"test__DynamicForecastingErrorMetric\", \"fit_time\"]].round(3))\nprint(f\"\\nMean CV MAE: {results['test__DynamicForecastingErrorMetric'].mean():.3f} °C\")\n```\n\nOutput:\n\n```\n   test__DynamicForecastingErrorMetric  fit_time\n0                                0.627     0.274\n1                                0.585     0.100\n\nMean CV MAE: 0.606 °C\n```\n\n[`evaluate`](https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.model_evaluation.evaluate.html) returns a DataFrame with per-fold metrics and timing. 
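\n\nTo see where those two folds come from, here is a hand-rolled sketch of expanding-window splitting with NumPy. It assumes the same settings as above: 2,160 hourly points, `initial_window=1800`, `step_length=168`, and a horizon of 1 to 168 steps. It also assumes a simple rule that a fold is kept only if its whole test window fits inside the series. It illustrates the idea and is not sktime's own splitting code.\n\n``` python\nimport numpy as np\n\n# Assumed settings, copied from the ExpandingWindowSplitter call above\nn_obs = 2160\ninitial_window = 1800\nstep_length = 168\nfh = np.arange(1, 169)\n\nfolds = []\ncutoff = initial_window - 1  # position of the last training point in the first fold\n# Keep a fold only if every step of its test window exists in the series\nwhile cutoff + fh.max() < n_obs:\n    train_idx = np.arange(0, cutoff + 1)  # expanding: training always starts at 0\n    test_idx = cutoff + fh  # the 168 positions after the cutoff\n    folds.append((train_idx, test_idx))\n    cutoff += step_length\n\nfor i, (train_idx, test_idx) in enumerate(folds):\n    print(f'fold {i}: train 0..{train_idx[-1]} ({len(train_idx)} points), test {test_idx[0]}..{test_idx[-1]}')\n```\n\nUnder this rule you get exactly two folds, matching the two rows in the `evaluate` output. A third test window would need positions up to 2303, but the series ends at position 2159, so the final 24 hours are never used for testing.\n\n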
Across the two folds, the MAE (0.627 and 0.585 °C) stays close to the 0.584 °C from the single hold-out split. This suggests the pipeline performs consistently across time windows, though with only two folds you should treat it as a sanity check rather than a strong guarantee.\n\n## Next Steps\n\nThis article covered the core forecasting workflow in sktime, but the library extends far beyond basic prediction tasks.\n\nIt also supports [time-series classification](https://www.sktime.net/en/v0.20.0/examples/02_classification.html), [probabilistic forecasting](https://www.sktime.net/en/stable/examples/01b_forecasting_proba.html) with uncertainty estimates, training shared models across multiple related time series, adapting traditional machine learning algorithms for sequential forecasting, and [automating model selection and tuning workflows](https://www.sktime.net/en/latest/examples/03b_forecasting_transformers_pipelines_tuning.html).\n\nOne of sktime's biggest strengths is its consistent API and integration with the broader Python machine learning ecosystem, which makes experimentation easier for beginners and experienced practitioners alike. The [sktime docs](https://www.sktime.net/en/latest/users.html) and [example notebooks](https://www.sktime.net/en/stable/examples.html) are well written and worth bookmarking if you regularly work with forecasting or temporal data problems.\n\nBala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. 
Bala also creates engaging resource overviews and coding tutorials.\n\n**Bala Priya C**", "url": "https://wpnews.pro/news/building-time-series-machine-learning-models-with-sktime-in-python", "canonical_source": "https://www.kdnuggets.com/building-time-series-machine-learning-models-with-sktime-in-python", "published_at": "2026-06-15 14:00:02+00:00", "updated_at": "2026-06-15 14:39:23.223601+00:00", "lang": "en", "topics": ["machine-learning", "developer-tools"], "entities": ["sktime", "scikit-learn", "Python", "GitHub", "pandas", "pmdarima", "statsmodels"], "alternates": {"html": "https://wpnews.pro/news/building-time-series-machine-learning-models-with-sktime-in-python", "markdown": "https://wpnews.pro/news/building-time-series-machine-learning-models-with-sktime-in-python.md", "text": "https://wpnews.pro/news/building-time-series-machine-learning-models-with-sktime-in-python.txt", "jsonld": "https://wpnews.pro/news/building-time-series-machine-learning-models-with-sktime-in-python.jsonld"}}