NM-341: Schema Migration for Serverconf, Generated and Enrollment Keys Table#4065
NM-341: Schema Migration for Serverconf, Generated and Enrollment Keys Table#4065VishalDalwadi wants to merge 23 commits into
Conversation
|
Review Complete Files Reviewed: 44 By Severity:
Major KV-to-GORM migration PR with 1 critical startup-blocking bug (internal table never created), 2 high-severity auth/telemetry breaks, and several medium data integrity issues across enrollment key, JWT, and license subsystems. Files Reviewed (44 files) |
There was a problem hiding this comment.
Risk: 🔴 Critical (85/100) — 1 high finding, 8 medium · 2621 LOC across 44 files
Critical
schema/models.go: TheInternalmodel is missing fromListModels(), so__internal__table is never auto-created. Application fails to start becauselogic/serverconf.goimmediately queries it.
High Severity
main.go:SetJWTSecret()moved insideIsMasterPod()guard — worker pods in HA mode have nil JWT signing key, breaking all authenticated requests.cli/functions/enrollment_keys.go: CLI enrollment key functions broken byschema.EnrollmentKeyresponse format changes (Groups→Tags, new ID type).
Medium Severity
logic/telemetry.go:time.Parselayout/value arguments swapped — telemetry never reports after first run sinceSetLastReportedAtalways returns an error silently ignored.main.go: NaCl key pair written non-atomically to DB; partial write can silently rotate keys and invalidate all peer traffic.migrate/migrate_v1_7_0.go: Legacy NaCl private key material persists in old DB tables after migration, leaving plaintext keys accessible.controllers/enrollmentkeys.go: Duplicate-network check increateEnrollmentKeyonly validatesTags, missesNetworksfield.pro/license/license.go:base64decodehelper silently swallows decode errors during license key retrieval.controllers/network.go:CreateDefaultNetworkEnrollmentKeyreturn values ignored; network created without enrollment key on failure.schema/enrollment_keys.go:EnrollmentKey.Valuelacks database-level unique constraint after KV→GORM migration.logic/auth.go: OAuth secret TOCTOU race via GORMSaveupsert inSetOAuthSecret.
Low Severity
controllers/server.go: Stale trial-related fields remain in server status API response.controllers/hosts.go: PendingHost enrollment key JSON format mismatch during upgrade.controllers/node_test.go: RemovedNODES_TABLE_NAMEconstant still referenced in test.pro/orchestrator/node_test.go: Test does not compile due toUseKeyparameter type change.
| if err == nil { | ||
| telServerRecord = serverTelData | ||
|
|
||
| telemetryLastReportedAtValue, err := time.Parse(telemetryLastReportedAt.Value, time.RFC3339) |
There was a problem hiding this comment.
🟡 time.Parse arguments swapped — telemetry permanently disabled after first run (bug)
In logic/telemetry.go line 123, getTelemetryLastReportedAt() calls time.Parse(telemetryLastReportedAt.Value, time.RFC3339) with arguments reversed. time.Parse expects (layout, value), but the stored timestamp is passed as the layout and the RFC3339 constant as the value to parse. This always fails when a record exists. On first startup (no record), gorm.ErrRecordNotFound returns zero time gracefully. After the first telemetry write stores an RFC3339 timestamp, every subsequent call fails, and TimerCheckpoint() returns early without calling runHooks(), permanently disabling telemetry and daily hooks.
💡 Suggestion: Swap the arguments to match the time.Parse signature: time.Parse(time.RFC3339, telemetryLastReportedAt.Value).
| telemetryLastReportedAtValue, err := time.Parse(telemetryLastReportedAt.Value, time.RFC3339) | |
| telemetryLastReportedAtValue, err := time.Parse(time.RFC3339, telemetryLastReportedAt.Value) |
📋 Prompt for AI Agents
In logic/telemetry.go at line 123, swap the arguments of time.Parse. Change time.Parse(telemetryLastReportedAt.Value, time.RFC3339) to time.Parse(time.RFC3339, telemetryLastReportedAt.Value). The first argument must be the layout format string, the second the value to parse.
| logger.Log(0, "error setting server id: ", err.Error()) | ||
| } | ||
|
|
||
| logic.SetJWTSecret() |
There was a problem hiding this comment.
🟠 SetJWTSecret() only called on master pods — worker pods have nil JWT signing key in HA mode (bug)
In main.go, SetJWTSecret() was moved from an unconditional position to inside the if servercfg.IsMasterPod() block at line 148. Worker pods in HA mode will not call SetJWTSecret(), leaving the package-level jwtSecretKey variable (logic/jwts.go:22) as nil. All JWT operations (CreateJWT, CreateUserJWT, VerifyUserToken, VerifyHostToken, etc.) depend on jwtSecretKey. With a nil key, HMAC signing produces tokens inconsistent with the master pod, and verification fails. Worker pods that serve REST API or MQ endpoints will have broken authentication. The secret is stored in a shared database and all pods need to load it into the in-memory variable.
💡 Suggestion: Move logic.SetJWTSecret() back outside the IsMasterPod() block so it runs on all pods.
📋 Prompt for AI Agents
In main.go, move logic.SetJWTSecret() from inside the if servercfg.IsMasterPod() block (currently at line 148) to after the closing brace of that block (after line 155, before line 157). This ensures all pods in HA mode load the JWT secret from the shared database into the in-memory jwtSecretKey variable.
| err = mqPrivateKey.Set(db.WithContext(context.TODO())) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| return database.Insert(database.SERVER_UUID_RECORD_KEY, string(telJSON), database.SERVER_UUID_TABLE_NAME) | ||
| return mqPublicKey.Set(db.WithContext(context.TODO())) |
There was a problem hiding this comment.
🟡 setMqKeys writes NaCl key pair non-atomically — partial write can cause silent key rotation and MQ communication loss (bug)
In main.go setMqKeys(), the NaCl box key pair is written to the internal table as two independent Set() calls (lines 344 and 349) without a transaction boundary. If the first Set() succeeds but the second fails, only the private key is persisted. On next restart, the mismatch guard at line 322 detects only one key exists and regenerates a brand-new key pair, silently overwriting the valid private key via GORM Save (upsert). All existing nodes hold the old public key and can no longer decrypt server MQ messages.
💡 Suggestion: Wrap both Set() calls in a single GORM transaction so the key pair is written atomically, or generate both key values before persisting either and use a single combined upsert.
📋 Prompt for AI Agents
In main.go setMqKeys(), wrap the two Set() calls (lines 344-349) in a transaction using db.BeginTx / tx.Commit / tx.Rollback. If the public key write fails, roll back the transaction so the private key write is also undone. Alternatively, store both keys in a single record to ensure atomicity.
| record, ok = records["netmaker-id-key-pair"] | ||
| if ok { | ||
| recordData := make(map[string][]byte) | ||
| err = json.Unmarshal([]byte(record), &recordData) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| privateKeyValue, ok := recordData["private_key"] | ||
| if ok { | ||
| privateKey := &schema.Internal{ | ||
| Key: schema.InternalKey_LicenseValidationPrivateKey, | ||
| Value: base64.StdEncoding.EncodeToString(privateKeyValue), | ||
| } | ||
| err = privateKey.Set(ctx) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| } | ||
|
|
||
| publicKeyValue, ok := recordData["public_key"] | ||
| if ok { | ||
| publicKey := &schema.Internal{ | ||
| Key: schema.InternalKey_LicenseValidationPublicKey, | ||
| Value: base64.StdEncoding.EncodeToString(publicKeyValue), | ||
| } | ||
| err = publicKey.Set(ctx) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
🟡 Legacy NaCl private key material persists in old DB tables after migration (security)
migrate_v1_7_0.go copies traffic private keys and license validation key pairs from serverconf and serveruuid tables to the new internal table, but does not delete or overwrite the original records. Private NaCl key bytes remain in the old key-value tables after migration completes, creating a security risk if any future code path or maintenance script reads from those tables.
💡 Suggestion: After successfully migrating each key record, delete the original key from the legacy table. Alternatively, overwrite the value with a placeholder or zeroes before removing it.
📋 Prompt for AI Agents
In migrate/migrate_v1_7_0.go, after each successful key migration (after line 88 for license private key, after line 100 for license public key, and for the corresponding netmaker-id keys in migrateGenerated), issue a DeleteRecord call on the legacy table for the corresponding record key. Ensure the deletion is within the same transaction context so a rollback preserves the original data.
| for _, t := range req.Tags { | ||
| if _, ok := existingNetworks[t]; ok { | ||
| logic.ReturnErrorResponse(w, r, logic.FormatError(fmt.Errorf("key names must be unique"), "badrequest")) | ||
| return | ||
| } | ||
| } |
There was a problem hiding this comment.
🟡 Duplicate-network check in createEnrollmentKey only validates Tags, misses Networks (bug)
In controllers/enrollmentkeys.go lines 321-333, the createEnrollmentKey handler builds a set of existing networks from all keys, then only iterates over req.Tags checking for duplicates. Since req.Networks entries are also merged into the enrollment key's Networks field inside logic.CreateEnrollmentKey (lines 64-75 of logic/enrollmentkey.go), a duplicate in req.Networks bypasses the validation. This allows two enrollment keys to end up sharing the same network entry, undermining the uniqueness guarantee implied by the error message 'key names must be unique'.
💡 Suggestion: Add a loop over req.Networks with the same duplicate check, or build the uniqueness validation set from both Tags and Networks before iterating.
📋 Prompt for AI Agents
In controllers/enrollmentkeys.go, after the closing brace of the req.Tags loop (line 333), add an equivalent loop over req.Networks: for _, n := range req.Networks { if _, ok := existingNetworks[n]; ok { logic.ReturnErrorResponse(w, r, logic.FormatError(fmt.Errorf("key names must be unique"), "badrequest")); return } }.
| logic.ReturnErrorResponse(w, r, logic.FormatError(err, "badrequest")) | ||
| return | ||
| } | ||
| logic.CreateDefaultNetworkEnrollmentKey(network.Name) |
There was a problem hiding this comment.
🟡 CreateDefaultNetworkEnrollmentKey return values silently ignored — network created without default key on failure (bug)
The CreateDefaultNetworkEnrollmentKey function returns (*schema.EnrollmentKey, error) but both callers discard the return values: controllers/network.go:385 and migrate/migrate.go:228. If this function fails (DB unavailable, duplicate key value collision), the network is created without a default enrollment key. In a WireGuard mesh, the default enrollment key is the mechanism for hosts to auto-join a network. Without it, hosts that register receive no network memberships and auto-registration breaks. Previously the error propagated through CreateNetwork; now it's silently lost.
💡 Suggestion: Check the error return from CreateDefaultNetworkEnrollmentKey and either return an error response to the API caller or log prominently. For the migration caller, at minimum log the error.
📋 Prompt for AI Agents
In controllers/network.go at line 385, capture the error from CreateDefaultNetworkEnrollmentKey: if _, err := logic.CreateDefaultNetworkEnrollmentKey(network.Name); err != nil { logger.Log(0, ...) }. Consider returning an error to the client if the default key is mandatory. In migrate/migrate.go at line 228, log the error with slog.Error.
| type EnrollmentKey struct { | ||
| ID string `gorm:"primaryKey" json:"id"` | ||
| Name string `json:"name"` | ||
| Value string `json:"value"` | ||
| Token string `json:"token"` | ||
| Default bool `json:"default"` | ||
| Unlimited bool `json:"unlimited"` | ||
| UsesRemaining int `json:"uses_remaining"` | ||
| Expiration time.Time `json:"expiration"` | ||
| Networks datatypes.JSONSlice[string] `json:"networks"` | ||
| Tags datatypes.JSONSlice[string] `json:"tags"` | ||
| GatewayID *string `json:"gateway_id"` | ||
| AutoEgress bool `json:"auto_egress"` | ||
| AutoAssignGateway bool `json:"auto_assign_gateway"` | ||
| Type EnrollmentKeyType `json:"type"` | ||
| CreatedBy string `json:"created_by"` | ||
| CreatedAt time.Time `json:"created_at"` | ||
| UpdatedAt time.Time `json:"updated_at"` |
There was a problem hiding this comment.
🟡 EnrollmentKey.Value lacks database-level unique constraint after KV→GORM migration (bug)
The old KV model used the enrollment key Value as the table key, guaranteeing uniqueness. The new GORM schema at schema/enrollment_keys.go:21-38 uses ID (UUID) as primary key but leaves the Value column without a unique index. GetByValue() uses gorm.First() which silently returns the first matching row if duplicates exist. If the migration were re-executed or a race between concurrent CreateEnrollmentKey calls occurred, duplicate Values with different IDs could be created, making some keys invisible to lookup-by-value operations.
💡 Suggestion: Add a UNIQUE constraint on the Value column: add gorm:"uniqueIndex" to the Value field tag and create a corresponding database migration.
📋 Prompt for AI Agents
In schema/enrollment_keys.go, change the Value field tag on line 24 from json:"value" to gorm:"uniqueIndex" json:"value". Also add a database migration to create a UNIQUE index on the value column of the enrollment_keys_v1 table for already-migrated databases.
| func SetOAuthSecret(secret string) error { | ||
| oauthSecret := &schema.Internal{ | ||
| Key: schema.InternalKey_OAuthSecret, | ||
| } | ||
| record, err := FetchAuthSecret() | ||
| if err == nil { | ||
| v := valueHolder{} | ||
| json.Unmarshal([]byte(record), &v) | ||
| if v.Value != "" { | ||
| return nil | ||
| } | ||
| err := oauthSecret.Get(db.WithContext(context.TODO())) | ||
| if err != nil && !errors.Is(err, gorm.ErrRecordNotFound) { | ||
| return err | ||
| } | ||
| var b64NewValue = base64.StdEncoding.EncodeToString([]byte(secret)) | ||
| newValueHolder := valueHolder{ | ||
| Value: b64NewValue, | ||
|
|
||
| if oauthSecret.Value != "" { | ||
| return nil | ||
| } | ||
| d, _ := json.Marshal(newValueHolder) | ||
| return database.Insert(auth_key, string(d), database.GENERATED_TABLE_NAME) | ||
|
|
||
| oauthSecret.Value = base64.StdEncoding.EncodeToString([]byte(secret)) | ||
| return oauthSecret.Set(db.WithContext(context.TODO())) | ||
| } |
There was a problem hiding this comment.
🟡 OAuth secret TOCTOU race via GORM Save upsert in SetOAuthSecret (security)
SetOAuthSecret (logic/auth.go:459-474) fetches the secret to check existence, then calls Set (GORM Save upsert) if not found. Two concurrent callers can both observe ErrRecordNotFound and both upsert different random values; the last one wins. This silently invalidates OAuth user credentials hashed against the first secret. In HA (multi-pod) deployments, simultaneous startup triggers this race. GORM Save always upserts, making silent overwrite probable.
💡 Suggestion: Use a database-level atomic insert-if-not-exists primitive (e.g., INSERT ... ON CONFLICT DO NOTHING for PostgreSQL, INSERT OR IGNORE for SQLite) instead of check-then-upsert. Or wrap in a serializable transaction.
📋 Prompt for AI Agents
In logic/auth.go SetOAuthSecret, replace the Get-then-Set pattern with an atomic upsert that only inserts if no record exists. For GORM, use FirstOrCreate with Attrs to set Value only on creation, preventing overwrite of an existing secret.
| key := schema.EnrollmentKey{} | ||
| json.Unmarshal(p.EnrollmentKey, &key) |
There was a problem hiding this comment.
🟡 PendingHost enrollment key JSON format mismatch during upgrade — old format silent deserialization failure (bug)
controllers/hosts.go approvePendingHost (line 1719-1720) unmarshals PendingHost.EnrollmentKey (a raw JSON blob) into schema.EnrollmentKey. The stored JSON may be in the old models.EnrollmentKey format (with 'relay', 'groups', and differently-semantic 'tags' fields). The new schema uses 'gateway_id', no 'groups', and 'tags' for device group IDs (not network names). During an upgrade, any pending hosts with old-format JSON silently mis-deserialize, affecting subsequent keyTags processing at line 1740-1743.
💡 Suggestion: Add backward-compatible unmarshaling that handles both old and new JSON formats, or ensure pending hosts are cleared during migration.
📋 Prompt for AI Agents
In controllers/hosts.go approvePendingHost (around line 1719-1720), add migration logic to detect the old JSON format (checking for 'relay' key presence) and convert it to the new schema.EnrollmentKey format before use. Alternatively, in the v1.7.0 migration, iterate over all PendingHost records and rewrite their EnrollmentKey JSON to the new format.
Describe your changes
Provide Issue ticket number if applicable/not in title
Provide testing steps
Checklist before requesting a review