Special writing paper: Concepts in Differential Privacy

Abstract

Storing data in a search log is a risky process for a search engine. Search logs contain extremely sensitive data, as evidenced by the AOL incident. Information is stored in the search log in order to identify the behavior of users. Maintaining this sensitive data is risky because the existing security methods have drawbacks. Search engine companies protect their search logs, but in some cases an intruder identifies the stored data and a loss occurs. This paper reviews methods for protecting search data against such an intruder. Data is stored in the search log on the basis of keywords, clicks, queries, and so on. Anonymization protects the data, but it loses granularity. Another method, ε-differential privacy, provides utility for this problem. (ε, δ)-probabilistic privacy is used to calculate the noise distribution. The ZEALOUS algorithm proposed in this paper provides effective results with (ε1, δ1)-indistinguishability. The paper concludes that this algorithm offers utility comparable to k-anonymity and ε-differential privacy and produces effective results.

Keywords: Security, Privacy, Data Anonymity, Information Protection, Differential Privacy, Histogram

INTRODUCTION

Publishing search query logs is useful for understanding user behavior. When users interact with a search engine, information is stored in the form of a search log. The log stores this information using the following schema:

{User_id, Query, Time, Clicks}

Here User_id identifies a particular user. Query is the group of keywords that the user searches for in the search engine. When a user searches for a keyword such as "Java", information relevant to Java appears in the browser. When the user clicks on a particular link, the click is stored in the search log as a count, and the time of the click is stored as well.
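As a minimal sketch of the schema above, one log row could be modeled as follows. The field types, the class name SearchLogEntry, the helper keywords, and the sample values are assumptions made for illustration; the text only names the four fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SearchLogEntry:
    """One row of the {User_id, Query, Time, Clicks} schema (types are assumed)."""
    user_id: str      # identifies a particular user
    query: str        # group of keywords submitted to the search engine
    time: datetime    # when the query or click happened
    clicks: int       # number of result links the user clicked


def keywords(entry: SearchLogEntry) -> list[str]:
    """Split a query into lower-cased keywords (whitespace splitting is an assumption)."""
    return entry.query.lower().split()


# Hypothetical sample row, not taken from any real log.
entry = SearchLogEntry("u42", "Java tutorial", datetime(2019, 10, 14, 9, 30), clicks=2)
print(keywords(entry), entry.clicks)
```

Making the record immutable (frozen=True) is a design choice for the sketch: a published log row should not change after it has been written.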
Each user has a user history, or search history, made up of that user's search entries. The user history is partitioned into sessions of similar queries. Queries can be grouped into query pairs, which are used to prepare the data in the search log. Query pairs are formed within sessions, each pair containing a query and the query that follows it.

Keywords can generally be divided into two kinds:

1. Frequent keywords: Previous methods publish only these keywords, because they are easier to produce from search logs than infrequent ones. Frequent keywords are identified from how often users search for them in the search engine.

2. Infrequent keywords: The method proposed in this paper publishes search logs with infrequent keywords. Publishing these keywords loses utility and produces fewer results than publishing frequent keywords.

The main aim of the earlier k-anonymity method is to define effective anonymization models for query log data, along with techniques to achieve such anonymization. Publishing user query search logs has become a sensitive issue, and the goal is to develop anonymization methods that publish search log data without breaching privacy or reducing utility. The drawback of this method is that the data can be identified through externally linked attributes. A quasi-identifier is a set of attributes that identifies an individual when it is combined with external data. The following is an example data set:

[Fig 1: Anonymization of the data. Two tables: User Registration and Search_log.]

In the tables above, the User Registration table contains the details of each user, and the Search_log table contains the data that each user searched for. These two tables can be linked to each other externally, and with this linkage a loss of privacy occurs. Putting these searches together may easily reveal the identity of the user.
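The linkage described above can be illustrated with a small sketch. All rows, attribute names (zip, birth_year, pseudonym), and the function link are hypothetical, since the contents of the two tables are not given here; the point is only that a log row whose quasi-identifier matches exactly one registered user is re-identified.

```python
# Hypothetical data: the actual table contents are not given in the text.
registrations = [
    {"name": "Alice", "zip": "47906", "birth_year": 1985},
    {"name": "Bob", "zip": "47906", "birth_year": 1990},
    {"name": "Carol", "zip": "10001", "birth_year": 1985},
]
search_log = [
    {"pseudonym": "u1", "zip": "47906", "birth_year": 1985, "query": "diabetes clinic"},
    {"pseudonym": "u2", "zip": "10001", "birth_year": 1985, "query": "java tutorial"},
]

# Assumed quasi-identifier: attributes shared by both tables.
QUASI_IDENTIFIER = ("zip", "birth_year")


def link(log_rows, reg_rows, qi=QUASI_IDENTIFIER):
    """Return (name, query) pairs for log rows that match exactly one registered user."""
    by_qi = {}
    for reg in reg_rows:
        by_qi.setdefault(tuple(reg[a] for a in qi), []).append(reg["name"])
    linked = []
    for row in log_rows:
        names = by_qi.get(tuple(row[a] for a in qi), [])
        if len(names) == 1:  # unique match: the pseudonym is re-identified
            linked.append((names[0], row["query"]))
    return linked


print(link(search_log, registrations))
```

If every quasi-identifier combination matched at least k registered users, the unique-match branch would never fire, which is the protection that k-anonymity aims for.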
The idea behind k-anonymity is to guarantee that each individual is hidden in a group of size k with respect to the quasi-identifiers. Publishing search logs with ε-differential privacy provides good utility, but the problem is the noise that is added to the search logs. Several methods are used to produce random noise in differential privacy. This paper classifies them into two categories:

Data-independent noise
Data-dependent noise

Data-independent noise is the most basic way of adding noise to the data; Laplace noise addition belongs to this category. Data-dependent noise methods are more complex, but they usually introduce less distortion. This paper focuses on data-independent noise, which is the kind most frequently used on data sets. To produce effective results with ε-differential privacy, noise drawn from a Laplace distribution is added to the result.

The ZEALOUS algorithm consists of a two-phase framework for identifying the frequent items in the search log, and it sets two threshold values so that search logs are published with more privacy. Search engine companies can apply this algorithm to generate statistics that are (ε, δ)-probabilistic differentially private while retaining good utility for applications. Beyond publishing search logs, this paper holds that the findings are also of interest when publishing frequent item sets. The algorithm protects privacy against much stronger attackers than those considered by the previous methods.

RELATED WORK

Search Log Anonymization

The earlier AOL search log incident revealed user data. Adar proposed a method in which a query must appear at least t times before it can be decoded, which may potentially remove too many queries. Kumar et al. [21] proposed another method that tokenizes each query and hashes the corresponding log identifiers. This method preserves the frequency of search tokens, which can leak data through the hidden tokens.
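Returning to the two-phase, two-threshold framework described in the Introduction, the following is a rough sketch of how such a release could look. The steps, the parameter names (m, tau, lam, tau_prime), and the rule for picking each user's items are assumptions made for illustration; this text does not spell them out, so the sketch should not be read as the exact ZEALOUS procedure.

```python
import random
from collections import Counter


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def two_phase_release(user_histories, m, tau, lam, tau_prime, rng=None):
    """Publish noisy counts of frequent items from {user_id: [items]}.

    m, tau, lam and tau_prime are assumed parameter names: per-user item cap,
    first threshold, Laplace scale, and second threshold.
    """
    rng = rng or random.Random()
    counts = Counter()
    for items in user_histories.values():
        # Cap each user's contribution at m distinct items
        # (taking the first m in sorted order is an arbitrary choice).
        for item in sorted(set(items))[:m]:
            counts[item] += 1
    # Phase 1: keep only items whose true count reaches the first threshold.
    frequent = {item: c for item, c in counts.items() if c >= tau}
    # Phase 2: add Laplace noise, then keep items above the second threshold.
    noisy = {item: c + laplace(lam, rng) for item, c in frequent.items()}
    return {item: c for item, c in noisy.items() if c > tau_prime}


# Hypothetical user histories, for illustration only.
histories = {"u1": ["java", "python"], "u2": ["java"], "u3": ["java", "rare"]}
print(two_phase_release(histories, m=2, tau=2, lam=1.0, tau_prime=1.0,
                        rng=random.Random(0)))
```

The first threshold removes infrequent items before any noise is drawn, and the second removes items whose noisy count is too small to publish, so an item that only one user searched for never reaches the output.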
To overcome the problems of these earlier methods, anonymization models have been developed for search log release. Hong et al. [17] and Liu et al. [23] anonymized search logs based on k-anonymity, which is not as rigorous as differential privacy. Xiong et al. [15] present query log analysis applications, the various granularities at which log information can be released, and their associated privacy threats. Korolova et al. [20] were the first to apply a rigorous privacy notion to search log release, based on differential privacy, by adding Laplace noise to the counts of selected queries and URLs. Adding Laplace noise to these counts is straightforward; directly maximizing the output utility calls for optimization models. Another line of work publishes the frequent keywords, queries, and clicks in search logs and compares two relaxations of ε-differential privacy. This paper is also related to frameworks for collecting, storing, and mining search logs in a distributed manner.

Differential Privacy

Dwork et al. [7,8] proposed the definition of differential privacy. A randomized algorithm is differentially private if, for any pair of neighboring inputs, the probabilities of generating the same output differ by at most a multiplicative factor of e^ε. This means that when two data sets are close to each other, a differentially private algorithm behaves almost the same on both. This provides sufficient privacy protection for user data. Data publishing techniques have also been introduced that ensure ε-differential privacy while providing accurate results. Search queries contain sensitive information that can lead to re-identification, and existing approaches work on query results and user IDs to prevent the re-identification of individuals from their search queries. A different approach is an interactive access framework that does not depend directly on anonymization for privacy; it differs from both semantic policies and differential privacy.
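The definition of differential privacy given above can be written out formally. The statement below is the standard formulation, together with the usual Laplace mechanism for a numeric query such as a vector of query or URL counts. The symbols (the algorithm A, the data sets D and D', the output set S, the query f, and its sensitivity Δf) are notation assumed here and are not taken from the text.

```latex
% epsilon-differential privacy: for all neighboring data sets D, D'
% and every set of outputs S,
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S]

% Laplace mechanism for a numeric query f: add independent Laplace noise
% to each coordinate, with scale set by the sensitivity of f,
\mathcal{A}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\epsilon}\right),
\qquad
\Delta f = \max_{D, D' \text{ neighboring}} \lVert f(D) - f(D') \rVert_{1}
```

A smaller ε forces the two output distributions closer together, and the noise scale Δf/ε grows accordingly, which is the trade-off between privacy and utility that the methods above try to balance.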

Monday, October 14, 2019
