Fingerprinting Codes Meet Geometry: Improved Lower Bounds for Private Query Release and Adaptive Data Analysis
Fingerprinting codes are a crucial tool for proving lower bounds in differential privacy. They have been used to prove tight lower bounds for several fundamental questions, especially in the “low accuracy” regime. Unlike reconstruction/discrepancy approaches, however, they are best suited to proving worst-case lower bounds, for query sets that arise naturally from the fingerprinting-code construction. In this work, we propose a general framework for proving fingerprinting-type lower bounds that allows us to tailor the technique to the geometry of the query set.
Our approach allows us to prove several new results.

First, we show that any (sample- and population-)accurate algorithm for answering $Q$ arbitrary adaptive counting queries over a universe $\mathcal{X}$ to accuracy $\alpha$ needs $\Omega\left(\frac{\sqrt{\log |\mathcal{X}|}\cdot \log Q}{\alpha^3}\right)$ samples. This shows that the approaches based on differential privacy are optimal for this question, and improves significantly on the previously known lower bounds of $\frac{\log Q}{\alpha^2}$ and $\min(\sqrt{Q}, \sqrt{\log |\mathcal{X}|})/\alpha^2$.
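For readers comparing the bounds at a glance, the first result can be typeset as a display equation (this merely restates the formulas above; $n$ denotes the sample complexity, and $Q$, $\mathcal{X}$, $\alpha$ are as in the abstract):

```latex
% New lower bound for Q adaptive counting queries over universe X at accuracy alpha:
n = \Omega\!\left(\frac{\sqrt{\log |\mathcal{X}|}\cdot \log Q}{\alpha^{3}}\right),
% improving on the previously known lower bounds
n = \Omega\!\left(\frac{\log Q}{\alpha^{2}}\right)
\quad\text{and}\quad
n = \Omega\!\left(\frac{\min\bigl(\sqrt{Q},\,\sqrt{\log |\mathcal{X}|}\bigr)}{\alpha^{2}}\right).
```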
Secondly, we show that any $(\varepsilon,\delta)$-DP algorithm for answering $Q$ counting queries to accuracy $\alpha$ needs $\Omega\left( \frac{\sqrt{d \log(1/\delta)}\, \log Q}{\varepsilon \alpha^2} \right)$ samples. Our framework allows for directly proving this bound, and improves by a $\sqrt{\log(1/\delta)}$ factor on the bound proved by Bun, Ullman and Vadhan (2013) using composition. Thirdly, we characterize the sample complexity of answering a set of random 0-1 queries under approximate differential privacy. To achieve this, we give new upper and lower bounds that, combined with existing bounds, allow us to complete the picture.

Figure 1: Behavior of the sample complexity vs. error trade-off for $d$ random linear queries (left) and worst-case queries (right) over a universe $\mathcal{X}$ (log-log scale). The sample complexity for random queries is discontinuous at $\alpha \approx \frac{\sqrt{\log |\mathcal{X}|}}{\sqrt{d}}$. The dependence on the privacy parameters and $\log d$ terms is suppressed for clarity.

