
NAN Values in Matrix Profiles for NAN-free Time Series #134

@KaamilKaka

Describe the bug

The bug occurs when pyscamp.abjoin_matrix is called with the same time series as both inputs. For certain series, the resulting matrix contains NaN values even though the input series contains none.

mplot = pyscamp.abjoin_matrix(series, series, subsequenceLength, mheight, mwidth)
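
One plausible explanation, offered as a hypothesis rather than a confirmed diagnosis: matrix-profile joins compare z-normalized subsequences, and the Pearson correlation between two windows is undefined when either window is perfectly flat (zero standard deviation). The brute-force NumPy sketch below computes the full subsequence correlation matrix for a short synthetic series. It shows that a NaN-free input containing one flat stretch still yields NaN entries. The function brute_force_corr_matrix is hypothetical and is not part of pyscamp.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def brute_force_corr_matrix(a, b, m):
    """Pearson correlation between every length-m subsequence of a and of b.

    Entry [i, j] compares a[i:i+m] with b[j:j+m]. A window with zero standard
    deviation (a flat stretch) has no defined correlation, so its row or
    column comes out as NaN (0 / 0).
    """
    wa = sliding_window_view(np.asarray(a, dtype=np.float64), m)
    wb = sliding_window_view(np.asarray(b, dtype=np.float64), m)
    # Center each window and compute its population standard deviation.
    ca = wa - wa.mean(axis=1, keepdims=True)
    cb = wb - wb.mean(axis=1, keepdims=True)
    sa = wa.std(axis=1)
    sb = wb.std(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (ca @ cb.T) / (m * np.outer(sa, sb))


# Synthetic example: random noise with one flat stretch longer than m.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
x[50:120] = 5.0
m = 30
corr = brute_force_corr_matrix(x, x, m)  # self-join, as in the report
print(np.isnan(x).any())     # False
print(np.isnan(corr).any())  # True
```

Every window that starts between index 50 and 90 lies entirely inside the flat stretch, so those rows and columns of corr are NaN.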

To Reproduce

Here are my imports:

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import pyscamp

This behavior is not specific to one dataset, but the example below uses the Wind Turbine SCADA Data For Early Fault Detection dataset from Kaggle.

URL: https://www.kaggle.com/datasets/azizkasimov/wind-turbine-scada-data-for-early-fault-detection

First, download the dataset with kagglehub:

import kagglehub
path = kagglehub.dataset_download("azizkasimov/wind-turbine-scada-data-for-early-fault-detection")
print("Path to dataset files:", path)

Next, load the CSV file of interest into a DataFrame:

farm = "Wind Farm A"
subfolder = "datasets"
file_name = "comma_0.csv"
new_path = os.path.join(path, farm, subfolder, file_name)
df = pd.read_csv(new_path)
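
Flat stretches (runs of identical consecutive values) are one plausible source of NaN in matrix-profile output, because a window with zero standard deviation cannot be z-normalized. This is a hypothesis, not a confirmed cause. A quick per-column audit of the loaded DataFrame could look like the sketch below. The helpers longest_constant_run and audit_columns are hypothetical names, and the defaults n_rows=1000 and m=280 are assumed to mirror the slice and subsequence length used in the reproduction loop. A run of at least m identical values means at least one completely flat subsequence.

```python
import numpy as np
import pandas as pd


def longest_constant_run(values):
    """Length of the longest run of identical consecutive values."""
    v = np.asarray(values)
    if v.size == 0:
        return 0
    # Indices where the value differs from the previous sample.
    change = np.flatnonzero(v[1:] != v[:-1]) + 1
    bounds = np.concatenate(([0], change, [v.size]))
    return int(np.diff(bounds).max())


def audit_columns(df, n_rows=1000, m=280):
    """Report NaNs and the longest flat run in the first n_rows of each numeric column."""
    rows = []
    for col in df.select_dtypes(include="number").columns:
        # Same preprocessing as the reproduction loop: slice, then cast to float32.
        x = df[col].to_numpy()[:n_rows].astype(np.float32)
        run = longest_constant_run(x)
        rows.append({
            "column": col,
            "has_nan": bool(np.isnan(x).any()),
            "longest_flat_run": run,
            "has_flat_window": run >= m,
        })
    return pd.DataFrame(rows)
```

For example, print(audit_columns(df)) lists each numeric column with its longest flat run, which can then be compared against the columns whose matrices contain NaN.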

The following helper function plots a matrix returned by pyscamp, showing the raw values:

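def plotMatrixRaw(matrix, name):
    """
    Plot a matrix without any thresholding, showing the raw values.

    NaN cells are drawn in the colormap's "bad" color (transparent by
    default), and the range and mean shown in the title ignore them.
    """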
    fig = plt.figure(figsize=(12,10))
    
    if matrix.shape[1] == 2: 
        gs = GridSpec(10, 6, figure=fig) 
        ax2 = fig.add_subplot(gs[1:10, 1:4]) 
    elif matrix.shape[0] == 2:   
        gs = GridSpec(6, 12, figure=fig)  
        ax2 = fig.add_subplot(gs[1:4, 2:10])  
    else: 
        gs = GridSpec(10, 12, figure=fig) 
        ax2 = fig.add_subplot(gs[1:10, 2:10])
    
    if matrix.shape[0] <= 10 or matrix.shape[1] <= 10:
        im = ax2.imshow(matrix, cmap="viridis", aspect="auto",
                       extent=[-0.5, matrix.shape[1]-0.5, matrix.shape[0]-0.5, -0.5],
                       interpolation='nearest')
        
        ax2.set_xticks(np.arange(-0.5, matrix.shape[1], 1), minor=True)
        ax2.grid(which='minor', color='white', linestyle='-', linewidth=1)
        
    else:
        im = ax2.imshow(matrix, cmap="viridis", aspect="auto")
    
    plt.colorbar(im, ax=ax2, label="Matrix Values")
    
    if matrix.shape[1] <= 10:
        ax2.set_xticks(range(matrix.shape[1]))
    else:
        step = max(1, matrix.shape[1] // 10)
        ticks = list(range(0, matrix.shape[1], step))
        if ticks[-1] != matrix.shape[1] - 1:
            ticks.append(matrix.shape[1] - 1)
        ax2.set_xticks(ticks)
    
    if matrix.shape[0] <= 10:
        ax2.set_yticks(range(matrix.shape[0]))
    else:
        step = max(1, matrix.shape[0] // 10)
        ticks = list(range(0, matrix.shape[0], step))
        if ticks[-1] != matrix.shape[0] - 1:
            ticks.append(matrix.shape[0] - 1)
        ax2.set_yticks(ticks)
    
    if matrix.shape[1] > 50:
        ax2.tick_params(axis='x', rotation=45)
    
    matrix_min = np.nanmin(matrix)
    matrix_max = np.nanmax(matrix)
    matrix_mean = np.nanmean(matrix)
    
    ax2.set_title(f"Raw Matrix: {name} ({matrix.shape[0]}×{matrix.shape[1]})\n" + 
                  f"Range: [{matrix_min:.3f}, {matrix_max:.3f}], Mean: {matrix_mean:.3f}", 
                  fontsize=14, pad=40)
    
    plt.subplots_adjust(top=0.8, bottom=0.25)
    plt.show()
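
Because np.nanmin, np.nanmax and np.nanmean in the title skip NaN cells, and imshow leaves NaN cells transparent by default, the plot alone does not show how many cells are affected. A small companion helper could report that directly. nan_summary is a hypothetical name and is not part of the original report:

```python
import numpy as np


def nan_summary(matrix):
    """Count NaN cells and list the row/column indices that contain at least one."""
    mask = np.isnan(np.asarray(matrix, dtype=np.float64))
    return {
        "nan_cells": int(mask.sum()),
        "nan_fraction": float(mask.mean()) if mask.size else 0.0,
        "rows_with_nan": np.flatnonzero(mask.any(axis=1)).tolist(),
        "cols_with_nan": np.flatnonzero(mask.any(axis=0)).tolist(),
    }
```

Calling print(nan_summary(mplot)) right after each join shows whether the NaN cells are scattered or confined to a few rows and columns, which helps narrow down the affected region of the series.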

The following loop computes and plots the AB-join matrix of each selected sensor column with itself:

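subsequenceLength = 280
mheight = 258  # matrix height used in the reported run
mwidth = 258   # matrix width used in the reported run

# col_list (the sensor column names to test) is not defined in the original
# report; it must be set beforehand to a list of numeric columns of df.
print(col_list)
for count, sensor_name in enumerate(col_list, start=1):
    print("Series", count, ":", sensor_name)

    print("===SERIES===")
    # First 1000 samples of the column, cast to float32
    series = df[sensor_name].to_numpy()[:1000].astype(np.float32)
    print("Series Shape:", series.shape)
    print("NANs in Series:", np.isnan(series).any())
    print("Plot of Series: ")
    plt.plot(series)
    plt.show()

    print("===MPLOT===")
    # AB-join of the series with itself, reduced to an mheight x mwidth matrix
    mplot = pyscamp.abjoin_matrix(series, series, subsequenceLength,
                                  mheight=mheight, mwidth=mwidth, verbose=True)
    print("NANs in Mplot:", np.isnan(mplot).any())
    print("Plot of Mplot: ")

    plotMatrixRaw(mplot, sensor_name + f", {subsequenceLength}")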

Results
For some of the sensor columns, the returned matrices contained NaN values.


Expected behavior
Because the input series contains no NaN values, I expected the resulting matrix to contain no NaN values either.
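
If flat windows turn out to be the cause, one workaround sometimes used with z-normalized matrix profiles is to add a very small amount of noise so that no window is exactly flat. This is a hedged sketch, not a pyscamp feature: add_jitter is a hypothetical helper, and the relative noise scale of 1e-6 is an arbitrary assumption. It perturbs the data slightly, and in float32 such small noise can be rounded away for large-magnitude values, so the sketch returns float64.

```python
import numpy as np


def add_jitter(series, scale=1e-6, seed=0):
    """Return a float64 copy of series with small Gaussian noise added.

    The noise standard deviation is scale times the series' standard deviation
    (or scale * 1.0 for a completely constant series), so exactly flat windows
    no longer occur. This is a workaround, not a fix for the underlying issue.
    """
    x = np.asarray(series, dtype=np.float64)
    sd = x.std()
    base = sd if sd > 0 else 1.0
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, scale * base, size=x.shape)
```

Comparing nan counts of the join on series and on add_jitter(series) would indicate whether flat windows account for the NaN cells.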

System Information

  • OS: Windows 11 Home
  • System Architecture: x86_64
  • Package and Version: pyscamp 4.0.0


Labels: bug
