Skip to content

HDDS-14577. Handle missing metadata dir when updating container state#10565

Open
priyeshkaratha wants to merge 2 commits into
apache:masterfrom
priyeshkaratha:HDDS-14577
Open

HDDS-14577. Handle missing metadata dir when updating container state#10565
priyeshkaratha wants to merge 2 commits into
apache:masterfrom
priyeshkaratha:HDDS-14577

Conversation

@priyeshkaratha

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

When the container metadata scanner detects a MISSING_METADATA_DIR corruption, it attempts to persist the UNHEALTHY state by writing the .container file. However, writeToContainerFile creates a temp file inside the metadata directory itself — the very directory that is missing — causing a secondary StorageContainerException: Failed to create tmp file. This produces a spurious WARN and leaves the UNHEALTHY state un-persisted on disk; the state is lost on the next datanode restart.

The fix adds a single guard in KeyValueContainer.writeToContainerFile: if the metadata directory does not exist, recreate it with mkdirs() before attempting to create the temp file. This allows the .container file to be written and the UNHEALTHY state to survive a restart.

What is the link to the Apache JIRA

HDDS-14577

How was this patch tested?

Added new testcases

@sreejasahithi sreejasahithi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @priyeshkaratha for this patch ,overall LGTM
just a minor nit

// Simulate MISSING_METADATA_DIR: delete the entire metadata directory.
File metadataDir = new File(keyValueContainerData.getMetadataPath());
assertTrue(metadataDir.exists(), "Metadata dir should exist before corruption");
FileUtils.deleteDirectory(metadataDir);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could use MISSING_METADATA_DIR.applyTo(keyValueContainer) instead because it is already used in scanner tests (TestKeyValueContainerCheck). Reusing it keeps corruption simulation consistent and avoids duplicating delete logic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@errose28

Copy link
Copy Markdown
Contributor

Hi @priyeshkaratha, the container scanner is not designed to repair containers it finds issues with, so I don't think we want to implement this. The .container file has other metadata, not just the container state which will be missing in this newly created file, and I think having partial .container files floating around the system is more problematic than the original issue.

Note that the metadata scanner runs more frequently than the data scanner, so both SCM (which gets the unhealthy ICR and sends the delete) and DN would need to be restarted every time the metadata scanner runs for this container to remain indefinitely.

If it doesn't make the code too messy we could clean up the logging by not attempting to update the file if the directory does not exist, but note that this is not a common case so we should not do anything too disruptive to the main code flow to accommodate this.

@priyeshkaratha

Copy link
Copy Markdown
Contributor Author

If it doesn't make the code too messy we could clean up the logging by not attempting to update the file if the directory does not exist, but note that this is not a common case so we should not do anything too disruptive to the main code flow to accommodate this.

Thanks @errose28 for pointing out this. It makes sense. I am justing skipping the creation of temp file in this scenario and instead adding a debug log.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants