call-logd syncdb tenant delete regression

Description

A syncdb implementation meant to clean up deleted tenants and their associated data in call-logd introduced a regression that deleted call-logd data from all tenants.


This was caused by a data type mismatch when comparing UUID values in a set difference operation (“comparing apples to oranges”). One set contained UUIDs as strings, while the other contained them as Python `uuid.UUID` objects. Even though some values were semantically equivalent, the difference in representation (data type) meant they never compared equal, which misclassified some UUIDs as belonging to deleted tenants. Those tenants and their associated data (through database foreign key constraints) were then promptly deleted.
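The mismatch can be reproduced in a few lines of Python. This is a minimal sketch, not the call-logd code: the variable names and the UUID value are made up for illustration, and which side held which representation is an assumption.

```python
import uuid

# Hypothetical value: one tenant, seen through two representations.
tenant = uuid.UUID("11111111-2222-3333-4444-555555555555")

local_tenants = {tenant}        # uuid.UUID objects
remote_tenants = {str(tenant)}  # the same UUID, as a string

# A uuid.UUID never compares equal to its string form, so the set
# difference treats the local tenant as absent from the remote side.
deleted = local_tenants - remote_tenants
print(deleted == local_tenants)
```

The tenant still exists on the remote side, yet it ends up in `deleted`.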


Some of that data (e.g. call logs) could be regenerated, but some could not (e.g. recordings).

Causes & factors

  • limited automated and manual testing during development and review (covering only the intended behavior and not considering unintended consequences)

  • missing type-level feedback during development (part of the code involved relies on opaque interfaces that leave the developer guessing at the data types involved, unless time is taken to investigate beyond abstraction boundaries, in separate libraries) and missing automated type analysis (this bug could have been caught by static type analysis if some of the interfaces involved were properly typed)

  • limited regression testing during the release phase (was the syncdb code run in test environments during release testing, and was the impact on call log data investigated?)

  • the database design does not support any recoverable “soft delete”, which would limit the severity of such regressions

  • the syncdb implementation design relies on comparing against data from another service, wazo-auth, as the source of truth. If wazo-auth fails to expose all existing tenants as expected (because of an API bug, a permission issue, or a problem in the formulation of the request), this can result in deleting data that shouldn’t be deleted
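To illustrate the point about type-level feedback, here is a hedged sketch of what a typed interface could look like. The function name `find_removed_tenants` and its parameters are hypothetical and do not come from call-logd; the idea is only that once both sides are annotated as `set[str]`, a static type checker such as mypy can flag a caller that passes `uuid.UUID` objects.

```python
from __future__ import annotations

import uuid


def find_removed_tenants(known: set[str], existing: set[str]) -> set[str]:
    """Return tenant UUIDs (as strings) that are known locally but no longer exist."""
    return known - existing


known = {"11111111-2222-3333-4444-555555555555"}
existing_as_uuid = {uuid.UUID("11111111-2222-3333-4444-555555555555")}

# A static type checker should reject the call below, because
# set[uuid.UUID] is not compatible with set[str]:
#     find_removed_tenants(known, existing_as_uuid)

# Converting explicitly satisfies the annotation and compares like with like.
removed = find_removed_tenants(known, {str(u) for u in existing_as_uuid})
```

With the explicit conversion, `removed` is empty, since the only known tenant still exists.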

Resolution

The direct fix for the superficial cause is trivial: convert data from one data type (the Python `uuid.UUID` type) to the other (`str`) so that the comparison is consistently “apples to apples”.
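A minimal sketch of that conversion, assuming a hypothetical helper `as_strings` and made-up UUID values (the actual patch in call-logd may look different):

```python
import uuid


def as_strings(tenant_uuids):
    """Return the given UUIDs as a set of strings, whatever their input type."""
    return {str(u) for u in tenant_uuids}


kept = uuid.UUID("11111111-2222-3333-4444-555555555555")
gone = uuid.UUID("99999999-8888-7777-6666-555555555555")

local_tenants = {kept, gone}   # uuid.UUID objects
remote_tenants = {str(kept)}   # strings

# Both sides are normalized to str before the set difference.
deleted = as_strings(local_tenants) - as_strings(remote_tenants)
```

Here `deleted` contains only the string form of `gone`, the one tenant that is really missing from the remote side.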

A proper resolution must include automated test coverage that exercises the regression scenario and verifies that running syncdb affects no tenant's data other than the tenants expected to be affected.
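Such a test could be sketched as follows. Everything here is an assumption made for illustration: `remove_deleted_tenants` is a hypothetical stand-in for the syncdb step, and a plain dict stands in for the database. A real test would run against the actual call-logd storage layer.

```python
import uuid


def remove_deleted_tenants(storage, remote_tenants):
    """Hypothetical stand-in for syncdb: drop tenants that are absent remotely."""
    remote = {str(u) for u in remote_tenants}
    for tenant in [t for t in storage if str(t) not in remote]:
        del storage[tenant]


def test_syncdb_only_removes_deleted_tenants():
    kept, gone = uuid.uuid4(), uuid.uuid4()
    storage = {kept: ["call log 1"], gone: ["call log 2"]}

    # The remote side reports UUIDs as strings while storage keys are
    # uuid.UUID objects, mirroring the mismatch behind the regression.
    remove_deleted_tenants(storage, remote_tenants=[str(kept)])

    # The deleted tenant is gone and the surviving tenant's data is intact.
    assert gone not in storage
    assert storage == {kept: ["call log 1"]}
```

The second assertion is the important one: it checks that data belonging to tenants that still exist is left untouched, which is exactly what the original tests did not cover.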

Zendesk Ticket IDs

None

Done

Details

Priority

Assignee

Reporter

Pair

Sébastien Duthil

Created November 21, 2023 at 3:04 PM
Updated December 13, 2023 at 7:49 PM
Resolved December 8, 2023 at 10:18 AM