Figure 6 illustrated an algorithm, called MinGen, which, given a table PT, a quasi-identier QI 2 PTQI, and a k-
anonymity constraint, produces a table MGT which is a k-minimal generalization of PT[QI]. It assumes that k < jPTj,
which is a necessary and su cient condition for the existence of a minimal generalized table (see Theorem 3.1).
The algorithm makes use of two basic functions: preferred, which selects a vector that correspond to a preferred
generalization according to the given specications; and generalize, which enforces generalization. The generalize
function requires as parameters a table Ti and two distance vectors v and h, and returns the table Tj, generalization
of Ti, with DVi;j = s. The third parameter h, constructed by the algorithm, represent the distance of Ti from the
private table PT provided as input, and is used to ensure that the generalization is absolute with respect to the
domains in PT. The algorithm uses also function VectorCover discussed in the previous subsection, and functions
Levelsets and binsearch, reported in the appendix, which we explain here. Given a set of vectors ordered in increasing
distance, Levelsets denes an assignment of levels to vectors as follows. Vectors are considered in the order in which
they appear in the list. Each new vector V considered is assigned to the lowest level at which there does not exists
any vector V 0 < V . Once all the vectors have been considered, vector Vu representing the lowest appear bound of
the vectors in the considered set is determined. If Vu does not belong to the set a new higher level is created to
which Vu is assigned. Moreover, the levels are examined and possibly updated as follows. Each vector V appearing
at a level l is replicated at all the levels between the lowest the highest level (this excluded) at which there exists a
vector dominated by V and the lowest level (this excluded) at which there is a vector dominating V . This \padding"
process ensures that, if we represent the order between vectors as a graph, all paths between the zero vector and all
the vectors at a given level have the same distance.
For the sake of space we omit the complete explanation of the algorithm and refer the reader to Figure 6 for
details. Here, we outline only the major steps. Step 3 generalizes each single attribute to satisfy k-anonymity
since k-anonymity of each attribute is a necessary condition for the k-anonymity of the quasi-identier. Step 6 is
the core of the algorithm. Substep 6.1 calls procedure VectorCover to retrieve the distance vectors corresponding
to generalization strategies to be evaluated. Distance vectors are returned in a list sorted according to increasing
distance. Step 6.2 calls Levelsets to retrieve set Strategies representing the assignment of levels to vectors returned
by VectorCover. Step 6.3 calls binsearch which performs a binary search on the levels and returns the lowest level
at which there exists a vector corresponding to a minimal generalization.
The correctness of the algorithm relies on the correctness of function VectorCover and on the particular way levels
are constructed in Levelsets. In particular, it can be proved that, for how levels are dened, if a vector corresponding
to a minimal generalization appears at level l than no other vectors corresponding to a minimal generalization can
exists at level lower than l. This property allows us to e ciently retrieve the interested level by performing the
binary search.
With respect to the complexity we make the following observations. Let N be the number of tuples in PT. The
loop to achieve k-anonymity at each attribute (step 3) is at most O(N jQIj). VectorCover has a best case of N
and a worst case of O(N2). The construction of strategies can be done in logarithmic time. The binary search is
performed in O(log(