Describe the bug
The bug occurs when pyscamp.abjoin_matrix is called with the same time series as both inputs. For certain time series, the resulting matrix profile contains NaN values even though the input series contains none.
mplot = pyscamp.abjoin_matrix(series, series, subsequenceLength, mheight, mwidth)
To Reproduce
Here are my imports:
import sys
import os
import pandas as pd
from scipy.io import loadmat
import numpy as np
import pyscamp
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.gridspec import GridSpec
Although this phenomenon is not dataset-specific, I will be using the Wind Turbine SCADA Data For Early Fault Detection dataset on Kaggle.
Here is the URL: https://www.kaggle.com/datasets/azizkasimov/wind-turbine-scada-data-for-early-fault-detection
We import the dataset.
import kagglehub
path = kagglehub.dataset_download("azizkasimov/wind-turbine-scada-data-for-early-fault-detection")
print("Path to dataset files:", path)
Now we look at the particular dataframe.
farm = "Wind Farm A"
subfolder = "datasets"
file_name = "comma_0.csv"
new_path = os.path.join(path, farm, subfolder, file_name)
df = pd.read_csv(new_path)
The following function is useful to display the matrix profiles.
def plotMatrixRaw(matrix, name):
    """
    Plot matrix without any thresholding - shows raw values
    """
    fig = plt.figure(figsize=(12,10))
    if matrix.shape[1] == 2:
        gs = GridSpec(10, 6, figure=fig)
        ax2 = fig.add_subplot(gs[1:10, 1:4])
    elif matrix.shape[0] == 2:
        gs = GridSpec(6, 12, figure=fig)
        ax2 = fig.add_subplot(gs[1:4, 2:10])
    else:
        gs = GridSpec(10, 12, figure=fig)
        ax2 = fig.add_subplot(gs[1:10, 2:10])
    if matrix.shape[0] <= 10 or matrix.shape[1] <= 10:
        im = ax2.imshow(matrix, cmap="viridis", aspect="auto",
                        extent=[-0.5, matrix.shape[1]-0.5, matrix.shape[0]-0.5, -0.5],
                        interpolation='nearest')
        ax2.set_xticks(np.arange(-0.5, matrix.shape[1], 1), minor=True)
        ax2.grid(which='minor', color='white', linestyle='-', linewidth=1)
    else:
        im = ax2.imshow(matrix, cmap="viridis", aspect="auto")
    plt.colorbar(im, ax=ax2, label="Matrix Values")
    if matrix.shape[1] <= 10:
        ax2.set_xticks(range(matrix.shape[1]))
    else:
        step = max(1, matrix.shape[1] // 10)
        ticks = list(range(0, matrix.shape[1], step))
        if ticks[-1] != matrix.shape[1] - 1:
            ticks.append(matrix.shape[1] - 1)
        ax2.set_xticks(ticks)
    if matrix.shape[0] <= 10:
        ax2.set_yticks(range(matrix.shape[0]))
    else:
        step = max(1, matrix.shape[0] // 10)
        ticks = list(range(0, matrix.shape[0], step))
        if ticks[-1] != matrix.shape[0] - 1:
            ticks.append(matrix.shape[0] - 1)
        ax2.set_yticks(ticks)
    if matrix.shape[1] > 50:
        ax2.tick_params(axis='x', rotation=45)
    matrix_min = np.nanmin(matrix)
    matrix_max = np.nanmax(matrix)
    matrix_mean = np.nanmean(matrix)
    ax2.set_title(f"Raw Matrix: {name} ({matrix.shape[0]}×{matrix.shape[1]})\n" +
                  f"Range: [{matrix_min:.3f}, {matrix_max:.3f}], Mean: {matrix_mean:.3f}",
                  fontsize=14, pad=40)
    plt.subplots_adjust(top=0.8, bottom=0.25)
    plt.show()
Now here is the code where we create the matrix profiles.
subsequenceLength = 280
# Note: the call below passes mheight=258 and mwidth=258, not these two values.
Mheight = 250
Mwidth = 250
# col_list (not shown here) holds the names of the sensor columns in df.
print(col_list)
count = 1
for sensor_name in col_list:
    print("Series", count, ":", sensor_name)
    print("===SERIES===")
    # Take the first 1000 samples of this sensor as float32.
    series = df[sensor_name].to_numpy()[:1000].astype(np.float32)
    print("Series Shape:", series.shape)
    print("NANs in Series:", np.isnan(series).any())
    print("Plot of Series: ")
    plt.plot(series)
    plt.show()
    print("===MPLOT===")
    # Self-join: the same series is passed as both inputs.
    mplot = pyscamp.abjoin_matrix(series, series, subsequenceLength, mheight = 258, mwidth = 258, verbose=True)
    print("NANs in Mplot:", np.isnan(mplot).any())
    print("Plot of Mplot: ")
    plotMatrixRaw(mplot, sensor_name + f", {subsequenceLength}")
    count += 1
Results
The matrix profiles had NaN values.
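To make the result easier to triage, the NaN cells can be counted and located rather than only detected. The following is a minimal sketch: the helper name summarize_nans and the small demo array are hypothetical stand-ins, and the demo array is not actual pyscamp output.

```python
import numpy as np

def summarize_nans(matrix):
    """Return (count, fraction, positions) of NaN cells in a 2-D array."""
    mask = np.isnan(matrix)
    count = int(mask.sum())
    fraction = count / mask.size
    positions = np.argwhere(mask)  # one (row, col) pair per NaN cell
    return count, fraction, positions

# Hypothetical stand-in for mplot; in the reproduction, pass mplot instead.
demo = np.array([[0.1, np.nan, 0.3],
                 [0.4, 0.5, np.nan]], dtype=np.float32)
count, fraction, positions = summarize_nans(demo)
print(count, fraction, positions.tolist())
```

Calling summarize_nans(mplot) inside the loop would show whether the NaNs cover the whole matrix or only particular rows and columns.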
Expected behavior
Since the input series contains no NaN values, I expected the matrix profile to contain no NaN values either.
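One thing that may be worth ruling out (an assumption on my part, not something I have confirmed in pyscamp's code): matrix profile distances are typically z-normalized, and a subsequence with zero variance, i.e. a flat window, has no defined z-normalization. The sketch below finds the windows of a given length that are constant. The helper name constant_window_starts is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def constant_window_starts(series, m):
    """Return start indices of length-m windows whose values are all equal."""
    windows = sliding_window_view(series, m)
    # A window is flat when its max equals its min (exact comparison,
    # so no rounding error from computing a standard deviation).
    flat = windows.max(axis=1) == windows.min(axis=1)
    return np.flatnonzero(flat)

# Small illustrative series with a flat stretch of four 3s.
demo = np.array([1, 2, 3, 3, 3, 3, 4], dtype=np.float32)
print(constant_window_starts(demo, 3))
```

In the reproduction loop, calling constant_window_starts(series, subsequenceLength) for each sensor would show whether the sensors that produce NaNs are the ones with flat stretches at least subsequenceLength samples long.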
System Information
- OS: Windows 11 Home
- System Architecture: x86_64
- Package and Version: pyscamp 4.0.0