
Fixing Event Log file cleanup issue #36

Merged
khushbr merged 9 commits into main from khushbr-writer-purge-fix on Jul 13, 2021


khushbr commented on Jul 13, 2021

Is your feature request related to a problem?
Issue: #33
[performance-analyzer-rca] PR: opensearch-project/performance-analyzer-rca#27
Previous PR (now closed): #34

Describe the solution you are proposing

  1. The solution removes the 'MetricsPurgeActivity' collector and moves responsibility for event log file cleanup to 'EventLogQueueProcessor'. purgeQueueAndPersist() invokes deleteFiles() every filesCleanupPeriod (defaults to 60s) to clean up older event log files, and then writes the latest event log files with writeAndRotate().
  2. Removes the MetricsPurgeActivity class instantiation.
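A minimal sketch of the single-thread flow described above, assuming an in-memory map stands in for the event log files and a hypothetical retention window decides which files count as "older". Only the method names purgeQueueAndPersist(), deleteFiles() and writeAndRotate() come from this PR; the class name, fields, and method bodies are illustrative, not the plugin's EventLogQueueProcessor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model (not the plugin's code): one thread calls purgeQueueAndPersist()
// on every tick; at most once per filesCleanupPeriod it deletes old "files", then writes.
class EventLogQueueProcessorSketch {
    private final long filesCleanupPeriodMillis; // the PR's default is 60 s
    private final long retentionMillis;          // hypothetical: how much history to keep
    private final NavigableMap<Long, String> files = new TreeMap<>(); // time bucket -> contents
    private final List<Long> cleanupRuns = new ArrayList<>();
    private boolean cleanedBefore = false;
    private long lastCleanupMillis;

    EventLogQueueProcessorSketch(long filesCleanupPeriodMillis, long retentionMillis) {
        this.filesCleanupPeriodMillis = filesCleanupPeriodMillis;
        this.retentionMillis = retentionMillis;
    }

    /** One tick of the writer thread: periodic cleanup first, then the write. */
    void purgeQueueAndPersist(long nowMillis, String batch) {
        if (!cleanedBefore || nowMillis - lastCleanupMillis >= filesCleanupPeriodMillis) {
            deleteFiles(nowMillis);
            cleanedBefore = true;
            lastCleanupMillis = nowMillis;
        }
        writeAndRotate(nowMillis, batch);
    }

    private void deleteFiles(long nowMillis) {
        // headMap(key, false) is a live view of strictly older buckets; clear() removes them.
        files.headMap(nowMillis - retentionMillis, false).clear();
        cleanupRuns.add(nowMillis);
    }

    private void writeAndRotate(long nowMillis, String batch) {
        files.put(nowMillis, batch);
    }

    NavigableMap<Long, String> files() { return files; }

    List<Long> cleanupRuns() { return cleanupRuns; }
}
```

With a 60 s period and a 10 s retention, ticks at 0 s, 5 s and 60 s trigger cleanup only at 0 s and 60 s, and after the last tick only the 60 s file remains. Keeping cleanup and write on the same thread means neither can silently stop while the other keeps running.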

Describe alternatives you've considered
Another approach was to launch a new thread and invoke 'MetricsPurgeActivity' within it. However, if that thread died we would run into the same issue again, so keeping the cleanup and the write within the same thread was the better choice.
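The PR does not spell out how the purge thread died in #33. One well-documented way a periodic Java task stops silently is that ScheduledExecutorService.scheduleAtFixedRate suppresses all subsequent executions once a run throws. The sketch below, with hypothetical class and method names, contrasts an unguarded periodic task with one that catches its own failures; catching failures around the cleanup step is what keeps such a combined thread alive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not plugin code.
class PeriodicTaskSurvival {
    // Hypothetical stand-in for a cleanup step that fails (e.g. an I/O error).
    static void failingCleanup() {
        throw new IllegalStateException("simulated cleanup failure");
    }

    /** Unguarded: the first exception cancels the periodic task. Returns how often it ran. */
    static int runUnguarded() throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        ScheduledFuture<?> future = ses.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            failingCleanup();
        }, 0, 10, TimeUnit.MILLISECONDS);
        try {
            future.get(); // a periodic task's future only completes when it stops; here, exceptionally
        } catch (ExecutionException expected) {
            // No further runs will be scheduled.
        } finally {
            ses.shutdownNow();
        }
        return runs.get();
    }

    /** Guarded: catching inside the task keeps the schedule alive. True if it ran `times` times. */
    static boolean runGuarded(int times) throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(times);
        ses.scheduleAtFixedRate(() -> {
            latch.countDown();
            try {
                failingCleanup();
            } catch (RuntimeException e) {
                // A real implementation would log here and emit an error metric.
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } finally {
            ses.shutdownNow();
        }
    }
}
```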

Testing
Tested by spinning up a Docker container and manually copying 100 dummy files to /dev/shm/performanceanalyzer/.
DEBUG logs were enabled to verify that the cleanup works as expected:

[2021-07-01T20:30:52,950][DEBUG][c.a.o.e.p.r.EventLogFileHandler] Starting to delete old writer files
[2021-07-01T20:30:52,950][DEBUG][c.a.o.e.p.r.EventLogFileHandler] Files discovered 169
[2021-07-01T20:30:52,977][DEBUG][c.a.o.e.p.r.EventLogFileHandler] '153' Old writer files cleaned up.

Metrics:

Metrics=EventLogFilesDeletionTime=27.0 millis aggr|MEAN,EventLogFilesDeletionTime=27 millis aggr|MAX,EventLogFilesDeletionTime=27 millis aggr|SUM,EventLogFilesDeleted=153 count aggr|SUM,EventLogFilesDeleted=153 count aggr|MAX
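The stats line above reports each metric under one or more aggregations (MEAN, MAX, SUM). A minimal sketch of how such a line could be produced from raw samples with the JDK's LongSummaryStatistics; the class name, the Aggr enum, and the exact rendering are assumptions modeled on the sample output, not the plugin's actual stats code.

```java
import java.util.LongSummaryStatistics;
import java.util.StringJoiner;

// Hypothetical sketch: aggregates raw samples and renders them in the
// "name=value unit aggr|TYPE" shape seen in the stats line above.
class StatAggregatorSketch {
    enum Aggr { MEAN, MAX, SUM }

    private final String name;
    private final String unit;
    private final LongSummaryStatistics stats = new LongSummaryStatistics();

    StatAggregatorSketch(String name, String unit) {
        this.name = name;
        this.unit = unit;
    }

    /** Records one sample, e.g. the duration of one deleteFiles() run in millis. */
    void record(long value) {
        stats.accept(value);
    }

    /** Renders the requested aggregations, comma-separated, in the given order. */
    String render(Aggr... aggrs) {
        StringJoiner out = new StringJoiner(",");
        for (Aggr a : aggrs) {
            String value;
            switch (a) {
                case MEAN: value = String.valueOf(stats.getAverage()); break; // double, e.g. "27.0"
                case MAX:  value = String.valueOf(stats.getMax()); break;     // long; Long.MIN_VALUE if empty
                default:   value = String.valueOf(stats.getSum()); break;
            }
            out.add(name + "=" + value + " " + unit + " aggr|" + a);
        }
        return out.toString();
    }
}
```

With a single cleanup run in the reporting window, MEAN, MAX and SUM all equal that run's value, which is consistent with the 27 ms and 153 files reported above.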

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sruti1312 and others added 9 commits April 15, 2021 11:30
* Add latency and failure metrics for Publish Cluster State Metrics from master's perspective

* Update Javadoc

* Add new line at the end

* Empty Queue before running each test

* Addressed comments

* Metrics for disabled collector

* Fixing bug in MasterClusterStateUpdateStatsCollector

Co-authored-by: Arpita Mathur <arpitamt@amazon.com>
// If file deletion takes too long or fails, we accept the eventQueue reaching
// its max size (100000); after that, {@link PerformanceAnalyzerMetrics#emitMetric()}
// emits the {@link WriterMetrics#METRICS_WRITE_ERROR} metric and returns.
cleanup();
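The code comment above relies on the event queue being bounded and on emitMetric() failing fast instead of blocking while the writer thread is busy deleting files. A minimal sketch of that behavior using the JDK's LinkedBlockingQueue, whose offer() returns false when the queue is full; the class, the drain() method, and the AtomicLong standing in for METRICS_WRITE_ERROR are hypothetical, and the real emitMetric() may differ.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not PerformanceAnalyzerMetrics.
class BoundedEventQueueSketch {
    private final BlockingQueue<String> eventQueue;
    private final AtomicLong writeErrors = new AtomicLong(); // stand-in for METRICS_WRITE_ERROR

    BoundedEventQueueSketch(int capacity) {
        this.eventQueue = new LinkedBlockingQueue<>(capacity); // e.g. 100000 in the comment above
    }

    /** Non-blocking: if the queue is full, count a write error and drop the event. */
    boolean emitMetric(String event) {
        if (!eventQueue.offer(event)) {
            writeErrors.incrementAndGet();
            return false;
        }
        return true;
    }

    /** What the writer thread does on a tick: move everything queued so far into `sink`. */
    int drain(List<String> sink) {
        return eventQueue.drainTo(sink);
    }

    long writeErrors() {
        return writeErrors.get();
    }
}
```

Producers never block on a slow cleanup; they lose events instead and the loss is visible as an error count, which is the trade-off the comment accepts.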

Because cleanup is called after the files are written, the first time this thread runs it will delete even the most recently written files: the first call to cleanup does not yet have a lastCleanupTimeBucket, so deleteAllFiles is called, deleting everything except the .tmp files.

Because the full cleanup is a one-time operation that happens at startup, should we do it in the constructor itself?
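A minimal sketch of the reviewer's suggestion, under stated assumptions: writer files are named by a numeric time bucket, in-progress files end in .tmp, and WriterFileCleaner, its constructor parameters, and cleanup(long) are hypothetical names, not the plugin's EventLogFileHandler API. The full purge runs once in the constructor, so the periodic cleanup only removes files from buckets older than the previous cleanup and never wipes freshly written files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not EventLogFileHandler. Assumes writer files are named by
// their numeric time bucket and in-progress files end in ".tmp".
class WriterFileCleaner {
    private final Path dir;
    private long lastCleanupTimeBucket;

    WriterFileCleaner(Path dir, long startTimeBucket) throws IOException {
        this.dir = dir;
        deleteAllFiles();                          // one-time purge of lingering files at startup
        this.lastCleanupTimeBucket = startTimeBucket;
    }

    /** Deletes every regular file except in-progress ".tmp" files. */
    int deleteAllFiles() throws IOException {
        return deleteMatching(name -> !name.endsWith(".tmp"));
    }

    /** Periodic cleanup: deletes only files from buckets older than the previous cleanup. */
    int cleanup(long currentTimeBucket) throws IOException {
        long cutoff = lastCleanupTimeBucket;
        int deleted = deleteMatching(name -> isBucketOlderThan(name, cutoff));
        lastCleanupTimeBucket = currentTimeBucket;
        return deleted;
    }

    private static boolean isBucketOlderThan(String name, long cutoff) {
        try {
            return Long.parseLong(name) < cutoff;
        } catch (NumberFormatException e) {
            return false; // ".tmp" or unrelated files are never deleted here
        }
    }

    private int deleteMatching(Predicate<String> shouldDelete) throws IOException {
        List<Path> entries;
        try (Stream<Path> listing = Files.list(dir)) {
            entries = listing.collect(Collectors.toList()); // snapshot before deleting
        }
        int deleted = 0;
        for (Path p : entries) {
            if (Files.isRegularFile(p) && shouldDelete.test(p.getFileName().toString())) {
                Files.delete(p);
                deleted++;
            }
        }
        return deleted;
    }
}
```

For example, after construction at bucket 300, writing files 300 and 360 and then calling cleanup(360) deletes nothing; a first-call fallback to deleteAllFiles would have removed both.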

Collaborator Author


Agreed. This would cause the first time bucket's metrics to be lost. I will address this in a separate PR for the StatCollector refactoring.

khushbr merged commit 14b300b into main on Jul 13, 2021
sruti1312 pushed a commit that referenced this pull request Aug 25, 2021
sruti1312 added a commit that referenced this pull request Aug 25, 2021
* Create writer file if metrics are available (#31)

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>

* Add tests to check for writer file only if metrics are present (#35)

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>

* Merge pull request #36 from opensearch-project/khushbr-writer-purge-fix

Fixing Event Log file cleanup issue

* Moving deleteAllFiles() to inside scheduleExecutor()

* Fixing the Link Checker errors, updating the official documentation

* nit: Fixing spotlessJava indentation issue

* Merge pull request #37 from khushbr/feature/purge-fix

Handling purging of lingering files before scheduleExecutor start.

* Fix failing file handler test (#38)

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>

* Remove dependency on main branch when running spotless. (#47)

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Updates to gradle build file (#48)

* Updates to gradle build file

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>

* Add ability to specify RCA branch

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>

* Fix build when opensearch_version flag is provided. (#52)

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Update the version to 1.0.1

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>

Co-authored-by: Khushboo Rajput <59671881+khushbr@users.noreply.github.com>
Co-authored-by: Khushboo Rajput <khushbr@amazon.com>
Co-authored-by: Marc Handalian <handalm@amazon.com>