Precision and Privacy Preserving Multi-Keyword Search over Encrypted Data
Date
2020
Authors
Siva Kumar, D Venkata Naga.
Journal Title
Journal ISSN
Volume Title
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
A growing number of data owners are increasingly using cloud storage services of various Cloud Service Providers (CSPs) for storing and managing their information/documents. The cloud storage services provide numerous benefits to the data owners that
include cost savings, greater reliability, ubiquitous access, and better performance. In
spite of these benefits, the stored data at the remote cloud servers are vulnerable to the
attacks initiated by the untrustworthy CSPs. Such data risks and associated privacy
concerns have recently been in the limelight from revelations of extensive surveillance
by several national security agencies and other government entities. The primary concerns of storing documents at cloud servers are confidentiality and privacy due to the
loss of control over who accesses and manages the outsourced documents. In order to
address these concerns, sensitive data is required to be outsourced in encrypted form
to the cloud servers. Although the encryption guarantees confidentiality, it makes the
retrieval process more complex.
Searchable Encryption (SE) is a technique that guarantees confidentiality and privacy by storing documents in encrypted form at the cloud servers and allows search over
encrypted data without decrypting it. In SE, the data owners store their documents, and
the corresponding indexes in encrypted form, and the data users retrieve the documents
by sending encrypted queries (trapdoors) to the cloud server. Despite its privacy and
confidential guarantee, the privacy of trapdoor keywords and index keywords could be
compromised due to the information leakages that are caused by the vulnerabilities in
the adopted schemes used for encrypting indexes and queries. The information leakages
include frequency of ciphertext values, rank-order information, and search pattern. The
cloud servers exploit these leakages to infer plaintext information through various information disclosure attacks such as frequency analysis attack, rank-order exploitation
attack, and scale analysis attack. Hence, this work aims at preventing these leakages
and thereby mitigates the attacks.
The cloud server uses frequency analysis attack to infer index keywords based on
the frequency leakage of ciphertext values (repetition of the same encrypted keywords’
relevance scores) in indexes. This leakage occurs due to the insufficient randomness in
the order preserving encryption (OPE) schemes that are used for encrypting keywords’
relevance scores in indexes. The existing OPE schemes leak frequency information
when there are same plaintext scores for two or more keywords within the same document. In this work, an Enhanced One-to-Many order-preserving mapping techniqueis developed with improved randomness to mitigate the frequency leakage. The experimental study confirms that the proposed technique reduces not only the frequency
leakage of keywords but also the co-occurring keywords.
The cloud server returns the relevant documents in descending order for a given
trapdoor based on the ranks of the documents. However, the cloud server uses the rankorder exploitation attack to infer the plaintext information of frequently issuing query
keywords or frequently occurring keywords of the dataset based on the rank information leakage. Scale analysis attack also occurs when the users issue the same trapdoor
again and again to retrieve the same documents. This attack enables the cloud server
to infer plaintext keywords of trapdoors based on the search pattern leakage, which can
be determined from the given set of trapdoors. The existing approaches prevent search
pattern leakage by adding random keywords to a list of query keywords in a trapdoor
generation approach, but precision gets affected due to the random keywords. These
approaches cannot prevent rank-order information leakage completely since the random values of random keywords follow the distribution of actual keywords’ relevance
scores. Therefore, it is highly essential to prevent these attacks by preserving the privacy of both rank information and the search pattern. In this work, a Pseudo-Ranking
approach is developed to address this issue with the help of two servers, i.e., cloud
server (CS) and the intermediate server (IS). The CS assigns pseudo-ranks to the documents instead of actual ranks, and the IS would nullify the impact of random keywords
of a trapdoor for achieving higher precision. The experimental results confirm that the
proposed approach preserves the privacy of both rank information and search pattern
without affecting precision.
Besides preventing these attacks, it is also essential to provide the latest relevant
documents to the users to enable them to choose timely decisions based on updated information. To provide the latest relevant documents, there should be a provision in SE
for allowing the data owners to perform the dynamic updates efficiently over the existing
encrypted indexes. The existing tree-based indexing schemes cannot perform dynamic
operations efficiently since the trees’ size is larger in terms of height and breadth. This
causes a delay in performing dynamic updates and retrieving top-k relevant documents.
In this work, a Max-heap tree based index structure is developed to address this issue.
The experimental results of the proposed tree index confirm that it improves the time
efficiency of top-k document retrieval and dynamic updates.
Description
Keywords
Department of Computer Science & Engineering, Searchable Encryption, Frequency of ciphertext values, Rank-orden information, Search pattern, Dynamic updates