{"title": "Privacy-preserving logistic regression", "book": "Advances in Neural Information Processing Systems", "page_first": 289, "page_last": 296, "abstract": "This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. First we apply an idea of Dwork et al. to design a specific privacy-preserving machine learning algorithm, logistic regression. This involves bounding the sensitivity of logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. Noting that the approach of Dwork et al. has limitations when applied to other machine learning algorithms, we then present another privacy-preserving logistic regression algorithm. The algorithm is based on solving a perturbed objective, and does not depend on the sensitivity. We prove that our algorithm preserves privacy in the model due to Dwork et al., and we provide a learning performance guarantee. Our work also reveals an interesting connection between regularization and privacy.", "full_text": "Privacy-preserving logistic regression\n\nKamalika Chaudhuri\n\nInformation Theory and Applications\nUniversity of California, San Diego\n\nkamalika@soe.ucsd.edu\n\nClaire Monteleoni\u2217\n\nCenter for Computational Learning Systems\n\nColumbia University\n\ncmontel@ccls.columbia.edu\n\nAbstract\n\nThis paper addresses the important tradeoff between privacy and learnability,\nwhen designing algorithms for learning from private databases. We focus on\nprivacy-preserving logistic regression. First we apply an idea of Dwork et al. [6]\nto design a privacy-preserving logistic regression algorithm. 
This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity.\nWe then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [6]. We provide learning guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. Experiments demonstrate improved learning performance of our method, versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy.\n\n1 Introduction\n\nPrivacy-preserving machine learning is an emerging problem, due in part to the increased reliance on the internet for day-to-day tasks such as banking, shopping, and social networking. Moreover, private data such as medical and financial records are increasingly being digitized, stored, and managed by independent companies. In the literature on cryptography and information security, data privacy definitions have been proposed; however, designing machine learning algorithms that adhere to them has not been well-explored. On the other hand, data-mining algorithms have been introduced that aim to respect other notions of privacy that may be less formally justified.\nOur goal is to bridge the gap between approaches in the cryptography and information security community, and those in the data-mining community. This is necessary, as there is a tradeoff between the privacy of a protocol, and the learnability of functions that respect the protocol. 
In addition to the specific contributions of our paper, we hope to encourage the machine learning community to embrace the goals of privacy-preserving machine learning, as it is still a fledgling endeavor.\nIn this work, we provide algorithms for learning in a privacy model introduced by Dwork et al. [6]. The \u03b5-differential privacy model limits how much information an adversary can gain about a particular private value, by observing a function learned from a database containing that value, even if she knows every other value in the database. An initial positive result [6] in this setting depends on the sensitivity of the function to be learned, which is the maximum amount the function value can change due to an arbitrary change in one input. Using this method requires bounding the sensitivity of the function class to be learned, and then adding noise proportional to the sensitivity. This might be difficult for some functions that are important for machine learning.\n\n\u2217The majority of this work was done while at UC San Diego.\n\nThe contributions of this paper are as follows. First we apply the sensitivity-based method of designing privacy-preserving algorithms [6] to a specific machine learning algorithm, logistic regression. Then we present a second privacy-preserving logistic regression algorithm. The second algorithm is based on solving a perturbed objective function, and does not depend on the sensitivity. We prove that the new method is private in the \u03b5-differential privacy model. We provide learning performance guarantees for both algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. Finally, we provide experiments demonstrating superior learning performance of our new method, with respect to the algorithm based on [6]. 
Our technique may have broader applications, and we show that it can be applied to certain classes of optimization problems.\n\n1.1 Overview and related work\n\nAt first glance, it may seem that anonymizing a data-set \u2013 namely, stripping it of identifying information about individuals, such as names, addresses, etc. \u2013 is sufficient to preserve privacy. However, this is problematic, because an adversary may have some auxiliary information, which may even be publicly available, and which can be used to breach privacy. For more details on such attacks, see [12].\nTo formally address this issue, we need a definition of privacy which works in the presence of auxiliary knowledge by the adversary. The definition we use is due to Dwork et al. [6], and has been used in several applications [4, 11, 2]. We explain this definition and privacy model in more detail in Section 2.\nPrivacy and learning. The work most related to ours is [8] and [3]. [8] shows how to find classifiers that preserve \u03b5-differential privacy; however, their algorithm takes time exponential in d for inputs in Rd. [3] provides a general method for publishing data-sets while preserving \u03b5-differential privacy such that the answers to all queries of a certain type with low VC-dimension are approximately correct. However, their algorithm can also be computationally inefficient.\nAdditional related work. There has been a substantial amount of work on privacy in the literature, spanning several communities. Much work on privacy has been done in the data-mining community [1, 7], [14, 10]; however, the privacy definitions used in these papers are different, and some are susceptible to attacks when the adversary has some prior information. 
In contrast, the privacy definition we use avoids these attacks, and is very strong.\n\n2 Sensitivity and the \u03b5-differential privacy model\n\nBefore we define the privacy model that we study, we will note a few preliminary points. Both in that model, and for our algorithm and analyses, we assume that each value in the database is a real vector with norm at most one. That is, a database contains values x1, . . . , xn, where xi \u2208 Rd, and ||xi|| \u2264 1 for all i \u2208 {1, . . . , n}. This assumption is used in the privacy model. In addition, we assume that when learning linear separators, the best separator passes through the origin. Note that this is not an assumption that the data is separable, but instead an assumption that a vector\u2019s classification is based on its angle, regardless of its norm.\nIn both privacy-preserving logistic regression algorithms that we state, the output is a parameter vector, w, which makes the prediction SGN(w \u00b7 x) on a point x. For a vector x, we use ||x|| to denote its Euclidean norm. For a function G(x) defined on Rd, we use \u2207G to denote its gradient and \u2207^2 G to denote its Hessian.\n\nPrivacy Definition. The privacy definition we use is due to Dwork et al. [6, 5]. In this model, users have access to private data about individuals through a sanitization mechanism, usually denoted by M. The interaction between the sanitization mechanism and an adversary is modelled as a sequence of queries, made by the adversary, and responses, made by the sanitizer. 
The sanitizer, which is typically a randomized algorithm, is said to preserve \u03b5-differential privacy if the private value of any one individual in the data set does not affect the likelihood of a specific answer by the sanitizer by more than \u03b5.\nMore formally, \u03b5-differential privacy can be defined as follows.\n\nDefinition 1 A randomized mechanism M provides \u03b5-differential privacy if, for all databases D1 and D2 which differ by at most one element, and for any t,\n\nPr[M(D1) = t] / Pr[M(D2) = t] \u2264 e^\u03b5\n\nIt was shown in [6] that if a mechanism satisfies \u03b5-differential privacy, then an adversary who knows the private value of all the individuals in the data-set, except for one single individual, cannot figure out the private value of the unknown individual, with sufficient confidence, from the responses of the sanitizer. \u03b5-differential privacy is therefore a very strong notion of privacy.\n[6] also provides a general method for computing an approximation to any function f while preserving \u03b5-differential privacy. Before we can describe their method, we need a definition.\n\nDefinition 2 For any function f with n inputs, we define the sensitivity S(f) as the maximum, over all inputs, of the difference in the value of f when one input of f is changed. That is,\n\nS(f) = max_(a,a') |f(x1, . . . , xn\u22121, xn = a) \u2212 f(x1, . . . , xn\u22121, xn = a')|\n\n[6] shows that for any input x1, . . . , xn, releasing f(x1, . . . , xn) + \u03b7, where \u03b7 is a random variable drawn from a Laplace distribution with mean 0 and standard deviation S(f)/\u03b5, preserves \u03b5-differential privacy.\nIn [13], Nissim et al. showed that given any input x to a function, and a function f, it is sufficient to draw \u03b7 from a Laplace distribution with standard deviation SS(f)/\u03b5, where SS(f) is the smoothed sensitivity of f around x. 
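The basic mechanism of [6] just described is simple to implement for a scalar function whose sensitivity is known. The sketch below is an illustrative reconstruction, not code from the paper; the function name and arguments are ours, and the analyst must supply the sensitivity bound S(f):

```python
import numpy as np

def laplace_release(f_value, sensitivity, eps, rng):
    """Release f(x1, ..., xn) + eta, where eta is Laplace noise with
    scale S(f)/eps, following the method of Dwork et al. [6]."""
    eta = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return f_value + eta
```

For example, releasing the mean of n values in [0, 1] (sensitivity 1/n) at privacy level eps = 0.1 would be `laplace_release(mean, 1.0 / n, 0.1, rng)`.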
Although this method sometimes requires adding a smaller amount of noise to preserve privacy, in general, the smoothed sensitivity of a function can be hard to compute.\n\n3 A Simple Algorithm\n\nBased on [6], one can come up with a simple algorithm for privacy-preserving logistic regression, which adds noise to the classifier obtained by logistic regression, proportional to its sensitivity. From Corollary 2, the sensitivity of logistic regression is at most 2/(n\u03bb). This leads to Algorithm 1, which obeys the privacy guarantees in Theorem 1.\n\nAlgorithm 1:\n\n1. Compute w\u2217, the classifier obtained by regularized logistic regression on the labelled examples (x1, y1), . . . , (xn, yn).\n2. Pick a noise vector \u03b7 according to the following density function: h(\u03b7) \u221d e^(\u2212(n\u03b5\u03bb/2)||\u03b7||). To pick such a vector, we choose the norm of \u03b7 from the \u0393(d, 2/(n\u03b5\u03bb)) distribution, and the direction of \u03b7 uniformly at random.\n3. Output w\u2217 + \u03b7.\n\nTheorem 1 Let (x1, y1), . . . , (xn, yn) be a set of labelled points over Rd such that ||xi|| \u2264 1 for all i. Then, Algorithm 1 preserves \u03b5-differential privacy.\n\nPROOF: The proof follows by a combination of [6], and Corollary 2, which states that the sensitivity of logistic regression is at most 2/(n\u03bb). \u25a1\n\nLemma 1 Let G(w) and g(w) be two convex functions, which are continuous and differentiable at all points. If w1 = argmin_w G(w) and w2 = argmin_w G(w) + g(w), then ||w1 \u2212 w2|| \u2264 g1/G2. Here, g1 = max_w ||\u2207g(w)|| and G2 = min_v min_w v^T \u2207^2 G(w) v, for any unit vector v.\n\nThe main idea of the proof is to examine the gradient and the Hessian of the functions G and g around w1 and w2. Due to lack of space, the full proof appears in the full version of our paper.\n\nCorollary 2 Given a set of n examples x1, . . . , xn in Rd, with labels y1, . . .
, yn, such that for all i, ||xi|| \u2264 1, the sensitivity of logistic regression with regularization parameter \u03bb is at most 2/(n\u03bb).\n\nPROOF: We use a triangle inequality and the fact that G2 \u2265 \u03bb and g1 \u2264 1/n. \u25a1\n\nLearning Performance. In order to assess the performance of Algorithm 1, we first try to bound the performance of Algorithm 1 on the training data. To do this, we need to define some notation. For a classifier w, we use L(w) to denote the expected loss of w over the data distribution, and \u02c6L(w) to denote the empirical average loss of w over the training data. In other words, \u02c6L(w) = (1/n) \u2211_{i=1}^n log(1 + e^(\u2212yi w^T xi)), where (xi, yi), i = 1, . . . , n are the training examples.\nFurther, for a classifier w, we use the notation f\u03bb(w) to denote the quantity (1/2)\u03bb||w||^2 + L(w) and \u02c6f\u03bb(w) to denote the quantity (1/2)\u03bb||w||^2 + \u02c6L(w). Our guarantees on this algorithm can be summarized by the following lemma.\n\nLemma 3 Given a logistic regression problem with regularization parameter \u03bb, let w1 be the classifier that minimizes \u02c6f\u03bb, and w2 be the classifier output by Algorithm 1, respectively. Then, with probability 1 \u2212 \u03b4 over the randomness in the privacy mechanism, \u02c6f\u03bb(w2) \u2264 \u02c6f\u03bb(w1) + 2d^2 (1 + \u03bb) log^2(d/\u03b4) / (\u03bb^2 n^2 \u03b5^2).\n\nDue to lack of space, the proof is deferred to the full version.\nFrom Lemma 3, we see that the performance of Algorithm 1 degrades with decreasing \u03bb, and is poor in particular when \u03bb is very small. One question is, can we get a privacy-preserving approximation to logistic regression, which has better performance bounds for small \u03bb? To explore this, in the next section, we look at a different algorithm.\n\n4 A New Algorithm\n\nIn this section, we provide a new privacy-preserving algorithm for logistic regression. 
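Before presenting it, the sensitivity method of Section 3 (Algorithm 1) can be sketched in code for later comparison. This is an illustrative reconstruction, not the authors' implementation; `w_star` is assumed to come from any regularized logistic regression solver:

```python
import numpy as np

def sample_noise_vector(d, scale, rng):
    """Sample eta with density h(eta) proportional to exp(-||eta||/scale):
    draw the norm from Gamma(d, scale) and the direction uniformly."""
    norm = rng.gamma(shape=d, scale=scale)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return norm * direction

def sensitivity_method(w_star, n, lam, eps, rng):
    """Algorithm 1: output w* + eta, where the norm of eta is
    Gamma(d, 2/(n*eps*lam))-distributed, matching the sensitivity
    bound 2/(n*lam) of regularized logistic regression."""
    eta = sample_noise_vector(len(w_star), 2.0 / (n * eps * lam), rng)
    return w_star + eta
```

Note how the noise scale 2/(n·ε·λ) grows as the regularization parameter λ shrinks, which is exactly the degradation that Lemma 3 quantifies.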
The input to our algorithm is a set of examples x1, . . . , xn over Rd such that ||xi|| \u2264 1 for all i, a set of labels y1, . . . , yn for the examples, a regularization constant \u03bb and a privacy parameter \u03b5, and the output is a vector w\u2217 in Rd. Our algorithm works as follows.\n\nAlgorithm 2:\n\n1. We pick a random vector b from the density function h(b) \u221d e^(\u2212(\u03b5/2)||b||). To implement this, we pick the norm of b from the \u0393(d, 2/\u03b5) distribution, and the direction of b uniformly at random.\n2. Given examples x1, . . . , xn, with labels y1, . . . , yn and a regularization constant \u03bb, we compute w\u2217 = argmin_w (1/2)\u03bb w^T w + b^T w / n + (1/n) \u2211_{i=1}^n log(1 + e^(\u2212yi w^T xi)). Output w\u2217.\n\nWe observe that our method solves a convex programming problem very similar to the logistic regression convex program, and therefore it has running time similar to that of logistic regression.\nIn the sequel, we show that the output of Algorithm 2 is privacy-preserving.\n\nTheorem 2 Given a set of n examples x1, . . . , xn over Rd, with labels y1, . . . , yn, where for each i, ||xi|| \u2264 1, the output of Algorithm 2 preserves \u03b5-differential privacy.\n\nPROOF: Let a and a' be any two vectors over Rd with norm at most 1, and y, y' \u2208 {\u22121, 1}. For any such (a, y), (a', y'), consider the inputs (x1, y1), . . . , (xn\u22121, yn\u22121), (a, y) and (x1, y1), . . . , (xn\u22121, yn\u22121), (a', y'). Then, for any w\u2217 output by our algorithm, there is a unique value of b that maps the input to the output. 
This uniqueness holds, because both the regularization function and the loss functions are differentiable everywhere.\nLet the values of b for the first and second cases respectively, be b1 and b2. Since w\u2217 is the value that minimizes both the optimization problems, the derivative of both optimization functions at w\u2217 is 0. This implies that for every b1 in the first case, there exists a b2 in the second case such that: b1 \u2212 ya/(1 + e^(y w\u2217^T a)) = b2 \u2212 y'a'/(1 + e^(y' w\u2217^T a')). Since ||a|| \u2264 1, ||a'|| \u2264 1, 1/(1 + e^(y w\u2217^T a)) \u2264 1, and 1/(1 + e^(y' w\u2217^T a')) \u2264 1, for any w\u2217, ||b1 \u2212 b2|| \u2264 2. Using the triangle inequality, ||b1|| \u2212 2 \u2264 ||b2|| \u2264 ||b1|| + 2. Therefore, for any pair (a, y), (a', y'),\n\nPr[w\u2217 | x1, . . . , xn\u22121, y1, . . . , yn\u22121, xn = a, yn = y] / Pr[w\u2217 | x1, . . . , xn\u22121, y1, . . . , yn\u22121, xn = a', yn = y'] = h(b1)/h(b2) = e^(\u2212(\u03b5/2)(||b1|| \u2212 ||b2||))\n\nwhere h(bi) for i = 1, 2 is the density of bi. Since \u22122 \u2264 ||b1|| \u2212 ||b2|| \u2264 2, this ratio is at most e^\u03b5. The theorem follows. \u25a1\n\nWe notice that the privacy guarantee for our algorithm does not depend on \u03bb; in other words, for any value of \u03bb, our algorithm is private. On the other hand, as we show in Section 5, the performance of our algorithm does degrade with decreasing \u03bb in the worst case, although the degradation is better than that of Algorithm 1 for \u03bb < 1.\n\nOther Applications. Our algorithm for privacy-preserving logistic regression can be generalized to provide privacy-preserving outputs for more general convex optimization problems, so long as the problems satisfy certain conditions. These conditions can be formalized in the theorem below.\n\nTheorem 3 Let X = {x1, . . .
, xn} be a database containing private data of individuals. Suppose we would like to compute a vector w that minimizes the function F(w) = G(w) + \u2211_{i=1}^n l(w, xi), over w \u2208 Rd for some d, such that all of the following hold:\n\n1. G(w) and l(w, xi) are differentiable everywhere, and have continuous derivatives\n2. G(w) is strongly convex and l(w, xi) are convex for all i\n3. ||\u2207_w l(w, x)|| \u2264 \u03ba, for any x.\n\nLet b = B \u00b7 \u02c6b, where B is drawn from \u0393(d, 2\u03ba/\u03b5), and \u02c6b is drawn uniformly from the surface of a d-dimensional unit sphere. Then, computing w\u2217, where w\u2217 = argmin_w G(w) + \u2211_{i=1}^n l(w, xi) + b^T w, provides \u03b5-differential privacy.\n\n5 Learning Guarantees\n\nIn this section, we show theoretical bounds on the number of samples required by the algorithms to learn a linear classifier. For the rest of the section, we use the same notation used in Section 3.\nFirst we show that, for Algorithm 2, the values of \u02c6f\u03bb(w2) and \u02c6f\u03bb(w1) are close.\n\nLemma 4 Given a logistic regression problem with regularization parameter \u03bb, let w1 be the classifier that minimizes \u02c6f\u03bb, and w2 be the classifier output by Algorithm 2, respectively. Then, with probability 1 \u2212 \u03b4 over the randomness in the privacy mechanism, \u02c6f\u03bb(w2) \u2264 \u02c6f\u03bb(w1) + 8d^2 log^2(d/\u03b4) / (\u03bb n^2 \u03b5^2).\n\nThe proof is in the full version of our paper. As desired, for \u03bb < 1, we have attained a tighter bound using Algorithm 2, than Lemma 3 for Algorithm 1.\nNow we give a performance guarantee for Algorithm 2.\n\nTheorem 4 Let w0 be a classifier with expected loss L over the data distribution. 
If the training examples are drawn independently from the data distribution, and if n > C max(||w0||^2/\u03b5g^2, d log(d/\u03b4) ||w0|| / (\u03b5g \u03b5)), for some constant C, then, with probability 1 \u2212 \u03b4, the classifier output by Algorithm 2 has loss at most L + \u03b5g over the data distribution.\n\nPROOF: Let w\u2217 be the classifier that minimizes f\u03bb(w) over the data distribution, and let w1 and w2 be the classifiers that minimize \u02c6f\u03bb(w) and \u02c6f\u03bb(w) + b^T w / n respectively. We can use an analysis as in [15] to write that:\n\nL(w2) = L(w0) + (f\u03bb(w2) \u2212 f\u03bb(w\u2217)) + (f\u03bb(w\u2217) \u2212 f\u03bb(w0)) + (\u03bb/2)||w0||^2 \u2212 (\u03bb/2)||w2||^2   (1)\n\nNotice that from Lemma 4, \u02c6f\u03bb(w2) \u2212 \u02c6f\u03bb(w1) \u2264 8d^2 log^2(d/\u03b4) / (\u03bb n^2 \u03b5^2). Using this and [16], we can bound the second quantity in Equation 1 as f\u03bb(w2) \u2212 f\u03bb(w\u2217) \u2264 16d^2 log^2(d/\u03b4) / (\u03bb n^2 \u03b5^2) + O(1/(\u03bb n)). By definition of w\u2217, the third quantity in Equation 1 is non-positive. If \u03bb is set to be \u03b5g/||w0||^2, then the fourth quantity in Equation 1 is at most \u03b5g/2. Now, if n > C \u00b7 ||w0||^2/\u03b5g^2 for a suitable constant C, 1/(\u03bb n) \u2264 \u03b5g/4. In addition, if n > C \u00b7 ||w0|| d log(d/\u03b4) / (\u03b5g \u03b5), then 16d^2 log^2(d/\u03b4) / (\u03bb n^2 \u03b5^2) \u2264 \u03b5g/4. In either case, the total loss of the classifier w2 output by Algorithm 2 is at most L(w0) + \u03b5g. \u25a1\n\nThe same technique can be used to analyze the sensitivity-based algorithm, using Lemma 3, which yields the following.\n\nTheorem 5 Let w0 be a classifier with expected loss L over the data distribution. If the training examples are drawn independently from the data distribution, and if n > C max(||w0||^2/\u03b5g^2, d log(d/\u03b4) ||w0|| / (\u03b5g^(3/2) \u03b5), d log(d/\u03b4) ||w0||^2 / (\u03b5g \u03b5)), for some constant C, then, with probability 1 \u2212 \u03b4, the classifier output by Algorithm 1 has loss at most L + \u03b5g over the data distribution.\n\nIt is clear that this bound is never lower than the bound for Algorithm 2. Note that for problems in which (non-private) logistic regression performs well, ||w0|| \u2265 1 if w0 has low loss, since otherwise one can show that the loss of w0 would be lower bounded by log(1 + 1/e). Thus the performance guarantee for Algorithm 2 is significantly stronger than for Algorithm 1, for problems in which one would typically apply logistic regression.\n\n6 Results in Simulation\n\n                    Uniform, margin=0.03    Unseparable (uniform with noise 0.2 in margin 0.1)\nSensitivity method  0.2962 \u00b1 0.0617         0.3257 \u00b1 0.0536\nNew method          0.1426 \u00b1 0.1284         0.1903 \u00b1 0.1105\nStandard LR         0 \u00b1 0.0016              0.0530 \u00b1 0.1105\n\nFigure 1: Test error: mean \u00b1 standard deviation over five folds. N=17,500.\n\nWe include some simulations that compare the two privacy-preserving methods, and demonstrate that using our privacy-preserving approach to logistic regression does not degrade learning performance terribly, from that of standard logistic regression. 
Performance degradation is inevitable, however, as in both cases, in order to address privacy concerns, we are adding noise, either to the learned classifier, or to the objective.\nIn order to obtain a clean comparison between the various logistic regression variants, we first experimented with artificial data that is separable through the origin. Because the classification of a vector by a linear separator through the origin depends only on its angle, not its norm, we sampled the data from the unit hypersphere. We used a uniform distribution on the hypersphere in 10 dimensions with zero mass within a small margin (0.03) from the generating linear separator. Then we experimented on uniform data that is not linearly separable. We sampled data from the surface of the unit ball in 10 dimensions, and labeled it with a classifier through the origin. In the band of margin \u2264 0.1 with respect to the perfect classifier, we performed random label flipping with probability 0.2. For our experiments, we used convex optimization software provided by [9].\nFigure 1 gives mean and standard deviation of test error over five-fold cross-validation, on 17,500 points. In both simulations, our new method is superior to the sensitivity method, although it incurs more error than standard (non-private) logistic regression. For both problems, we tuned the logistic regression parameter, \u03bb, to minimize the test error of standard logistic regression, using five-fold cross-validation on a holdout set of 10,000 points (the tuned values are: \u03bb = 0.01 in both cases). For each test error computation, the performance of each of the privacy-preserving algorithms was evaluated by averaging over 200 random restarts, since they are both randomized algorithms.\nIn Figure 2a)-b) we provide learning curves. We graph the test error after each increment of 1000 points, averaged over five-fold cross validation. 
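For concreteness, the perturbed objective solved by Algorithm 2 in these experiments can be sketched as follows. This is an illustrative reconstruction using scipy's general-purpose optimizer rather than the solver of [9], and the function names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def sample_noise_vector(d, scale, rng):
    # Norm from Gamma(d, scale), direction uniform on the unit sphere.
    norm = rng.gamma(shape=d, scale=scale)
    u = rng.normal(size=d)
    return norm * u / np.linalg.norm(u)

def private_logistic_regression(X, y, lam, eps, rng):
    """Algorithm 2: draw b with density proportional to exp(-(eps/2)||b||),
    then minimize (lam/2)||w||^2 + b.w/n + (1/n) sum_i log(1 + exp(-y_i w.x_i))."""
    n, d = X.shape
    b = sample_noise_vector(d, 2.0 / eps, rng)

    def objective(w):
        margins = y * (X @ w)
        # log(1 + exp(-m)) computed stably via logaddexp(0, -m)
        return (0.5 * lam * (w @ w) + (b @ w) / n
                + np.mean(np.logaddexp(0.0, -margins)))

    result = minimize(objective, x0=np.zeros(d), method="L-BFGS-B")
    return result.x
```

Setting b to the zero vector recovers standard regularized logistic regression, which makes the cost of privacy explicit: one noise draw plus one extra linear term in the objective.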
The learning curves reveal that not only does the new method reach a lower final error than the sensitivity method, but it also has better performance at most smaller training set sizes.\n\nFigure 2: Learning curves: a) Uniform distribution, margin=0.03, b) Unseparable data. Epsilon curves: c) Uniform distribution, margin=0.03, d) Unseparable data.\n\nIn order to observe the effect of the level of privacy on the learning performance of the privacy-preserving learning algorithms, in Figure 2c)-d) we vary \u03b5, the privacy parameter to the two algorithms, on both the uniform, low margin data, and the unseparable data. As per the definition of \u03b5-differential privacy in Section 2, strengthening the privacy guarantee corresponds to reducing \u03b5. Both algorithms\u2019 learning performance degrades in this direction. For the majority of values of \u03b5 that we tested, the new method is superior in managing the tradeoff between privacy and learning performance. For very small \u03b5, corresponding to extremely stringent privacy requirements, the sensitivity method performs better but also has a prediction accuracy close to chance, which is not useful for machine learning purposes.\n\n7 Conclusion\n\nIn conclusion, we show two ways to construct a privacy-preserving linear classifier through logistic regression. The first one uses the methods of [6], and the second one is a new algorithm. Using the \u03b5-differential privacy definition of Dwork et al. [6], we prove that our new algorithm is privacy-preserving. We provide learning performance guarantees for the two algorithms, which are tighter for our new algorithm, in cases in which one would typically apply logistic regression. 
In simulations, our new algorithm outperforms the method based on [6].\nOur work reveals an interesting connection between regularization and privacy: the larger the regularization constant, the less sensitive the logistic regression function is to any one individual example, and as a result, the less noise one needs to add to make it privacy-preserving. Therefore, regularization not only prevents overfitting, but also helps with privacy, by making the classifier less sensitive. An interesting future direction would be to explore whether other methods that prevent overfitting also have such properties.\nOther future directions would be to apply our techniques to other commonly used machine-learning algorithms, and to explore whether our techniques can be applied to more general optimization problems. Theorem 3 shows that our method can be applied to a class of optimization problems with certain restrictions. An open question would be to remove some of these restrictions.\n\nAcknowledgements. 
We thank Sanjoy Dasgupta and Daniel Hsu for several pointers.\n\nReferences\n\n[1] R. Agrawal and R. Srikant. Privacy-preserving data mining. SIGMOD Rec., 29(2):439\u2013450, 2000.\n[2] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS, pages 273\u2013282, 2007.\n[3] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In R. E. Ladner and C. Dwork, editors, STOC, pages 609\u2013618. ACM, 2008.\n[4] K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In C. Dwork, editor, CRYPTO, volume 4117 of Lecture Notes in Computer Science, pages 198\u2013213. Springer, 2006.\n[5] C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1\u201312. Springer, 2006.\n[6] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265\u2013284, 2006.\n[7] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211\u2013222, 2003.\n[8] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? In Proc. of Foundations of Computer Science, 2008.\n[9] C. T. Kelley. Iterative Methods for Optimization. SIAM, 1999.\n[10] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, page 24, 2006.\n[11] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94\u2013103, 2007.\n[12] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In IEEE Symposium on Security and Privacy, pages 111\u2013125. IEEE Computer Society, 2008.\n[13] K. Nissim, S. 
Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In D. S. Johnson and U. Feige, editors, STOC, pages 75\u201384. ACM, 2007.\n[14] P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In Proc. of the IEEE Symposium on Research in Security and Privacy, 1998.\n[15] S. Shalev-Shwartz and N. Srebro. SVM optimization: Inverse dependence on training set size. In International Conference on Machine Learning (ICML), 2008.\n[16] K. Sridharan, N. Srebro, and S. Shalev-Shwartz. Fast rates for regularized objectives. In Neural Information Processing Systems, 2008.\n", "award": [], "sourceid": 964, "authors": [{"given_name": "Kamalika", "family_name": "Chaudhuri", "institution": null}, {"given_name": "Claire", "family_name": "Monteleoni", "institution": null}]}