HDDS-15456. Add SCM DB lookup and identify orphan(wrt SCM) and deleted-but-present containers#10547
HDDS-15456. Add SCM DB lookup and identify orphan(wrt SCM) and deleted-but-present containers#10547sreejasahithi wants to merge 3 commits into
Conversation
|
@devmadhuu could you please review this PR. |
|
cc @rnblough |
devmadhuu
left a comment
There was a problem hiding this comment.
Thanks @sreejasahithi for the patch. Largely looks good. Some review comments, Pls check.
| " MISSING_METADATA: metadata/{containerId}.container does not exist.", | ||
| " INVALID_METADATA: metadata file exists but cannot be parsed, or the container ID in the metadata", | ||
| " does not match the directory name.", | ||
| " VALID: metadata file is present and consistent with the directory." |
There was a problem hiding this comment.
Why we want to return VALID ones ? These may run into millions. In #10414 , are we returning VALID ones as well ? And as per #10414 PR, we are already detecting duplicate container dirs as well as above discrepancies mentioned in above command description.
Also this command description is not complete. As per this PR we are identifying orphan containers only right ?
There was a problem hiding this comment.
Thanks for the review, I think there may be a misunderstanding about when we compute status/size and what VALID means here.
How the command works:
This is one single command ozone debug datanode container analyze as per we discussed that reports all three inconsistency types - duplicate container directories, orphan containers, and deleted-but-present containers.
Part 1 - DN scan (done in #10414)
We scan each volume under hdds.datanode.dir and find all container directories on that DataNode. For each container we only store:
- container ID
- container dir path(s)
We build two maps:
singles: container ID -> one path (seen on only one volume/path)
duplicates: container ID -> list of paths (same ID seen in more than one place)
A container is in either singles or duplicates, not both.
At this stage we do not compute status or size for every container on the DN.
We only enrich duplicates, for each path in the duplicates map we compute:
- on-disk metadata status (VALID / MISSING_METADATA / INVALID_METADATA)
- directory size
Part 2 - SCM lookup (only when --scm-db is provided) - current patch
We take the union of container IDs from singles and duplicates, and for each ID check scm.db:
- not present in SCM -> orphan
- present but state is DELETED -> deleted-but-present
- present with any other state -> ignore (not reported)
For orphan / deleted-but-present containers:
- if the container is already in duplicates, we reuse the enriched occurrences from Part 1 (no second enrichment)
- if it is only in singles, we enrich(i.e compute status and size) that one path on demand
So status and size are computed only for:
duplicate containers, and orphan / deleted-but-present
We do not compute or print status/size for every container found on the DN. Normal containers that exist in SCM with a non-DELETED state are never enriched and never printed - so we will not return millions of VALID containers.
here each status is:
- MISSING_METADATA if metadata/{containerId}.container does not exist.
- INVALID_METADATA if the metadata file exists but cannot be parsed, or if the container ID stored in the metadata does not match the directory name container ID.
- VALID otherwise
I will update the command description so it does not sound confusing.
| if (count < 1) { | ||
| throw new IOException("Count must be an integer greater than 0."); | ||
| } | ||
| listOptions.getLimit(); //This triggers ListLimitOptions validation |
There was a problem hiding this comment.
I think this validation is already being called at printContainerOccurrenceReport . May be wrap it in an explicit validateOptions() method with a comment that says why it must run before scanning.
|
|
||
| printContainerOccurrenceReport("Number of orphan containers(wrt SCM) on this DataNode: %d%n", | ||
| enrichedOrphanContainers); | ||
| printContainerOccurrenceReport("Number of deleted but present containers on this DataNode: %d%n", |
There was a problem hiding this comment.
| printContainerOccurrenceReport("Number of deleted but present containers on this DataNode: %d%n", | |
| printContainerOccurrenceReport("Containers marked DELETED in SCM but present on disk on this DataNode: %d%n", |
| .sorted(Map.Entry.comparingByKey()) | ||
| .limit(limit) | ||
| .forEach(entry -> printContainerEntry(entry.getKey(), entry.getValue())); | ||
| } else { |
There was a problem hiding this comment.
This else block code and above stream logic seems duplicate. Pls check and refactor.
| File scmDbDir = resolveScmDbDirectory(scmDbPath); | ||
| try { | ||
| this.dbStore = DBStoreBuilder.newBuilder(conf, SCMDBDefinition.get(), scmDbDir.getName(), | ||
| scmDbDir.getParentFile().toPath()) |
There was a problem hiding this comment.
This might throw NPE if user passes just scm.db without any parent. Convert to absolute path ?
| try { | ||
| this.dbStore = DBStoreBuilder.newBuilder(conf, SCMDBDefinition.get(), scmDbDir.getName(), | ||
| scmDbDir.getParentFile().toPath()) | ||
| .setOpenReadOnly(true) |
There was a problem hiding this comment.
What if someone points to online live DB, it will fail, if another process holds the write lock.
There was a problem hiding this comment.
Agreed. it is better if --scm-db point at an offline copy of scm.db, opening read-only does not avoid RocksDB lock contention if SCM is still running. Same I think can also be seen in ozone debug ldb scan right.
What changes were proposed in this pull request?
This PR extends
ozone debug datanode container analyzewith optional SCM metadata lookup via --scm-db option.When a scm.db path is provided, the command classifies each container ID found during the on-disk DataNode scan HDDS-15455 against SCM metadata and reports:
Containers in SCM with any other lifecycle state are skipped. Without --scm-db, the command prints a hint that SCM lookup is required for orphan and deleted-but-present detection and displays only the duplicate container information.
What is the link to the Apache JIRA
HDDS-15456
How was this patch tested?
Tested in docker cluster by manually creating the inconsistencies
Sample outputs: