Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantisation errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item, taking into account the probability density function (pdf) of that item's value, is utilised. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
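The contrast between averaging and using the full pdf can be illustrated with a minimal Python sketch. It assumes each uncertain value is given as a discretised pdf (a list of sample points with probabilities). The function names and the toy data are illustrative assumptions, not taken from the paper. Each tuple is assigned fractionally to both sides of a candidate split point according to its probability mass on each side, and the weighted entropy of the split is computed from those fractional class counts.

```python
import math
from collections import defaultdict

def entropy(counts):
    """Shannon entropy (bits) of a dict of possibly fractional class counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def split_entropy(tuples, z):
    """Weighted entropy of splitting at z.

    Each tuple is (pdf, label); pdf is a list of (value, probability)
    pairs summing to 1. Mass at values <= z goes left, the rest right.
    """
    left, right = defaultdict(float), defaultdict(float)
    for pdf, label in tuples:
        p_left = sum(p for v, p in pdf if v <= z)
        left[label] += p_left
        right[label] += 1.0 - p_left
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)

def collapse_to_mean(tuples):
    """Averaging baseline: replace each pdf by a point mass at its mean."""
    return [([(sum(v * p for v, p in pdf), 1.0)], label)
            for pdf, label in tuples]

# Hypothetical toy data: two tuples with the same mean but different spread.
data = [
    ([(-3.0, 0.5), (3.0, 0.5)], "A"),
    ([(-1.0, 0.5), (1.0, 0.5)], "B"),
]

averaged = split_entropy(collapse_to_mean(data), 0.0)
distribution_based = split_entropy(data, -2.0)
print(averaged, distribution_based)
```

In this toy case both tuples have mean 0, so no split point separates the averaged data and the entropy stays at 1 bit. Splitting the pdfs at z = -2 gives a lower weighted entropy (about 0.69 bits), because half of the mass of tuple "A" falls alone on the left side.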
Most of the common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of the term within a document only. However, two terms can have the same frequency in their documents, but one term contributes more to the meaning of its sentences than the other term. Thus, the underlying text mining model should indicate terms that capture the semantics of text. In this case, the mining model can capture terms that present the concepts of the sentence, which leads to discovering the topic of the document. A new concept-based mining model that analyzes terms on the sentence, document, and corpus levels is introduced. The concept-based mining model can effectively discriminate between non-important terms with respect to sentence semantics and terms which hold the concepts that represent the sentence meaning. The proposed mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. The term which contributes to the sentence semantics is analyzed on the sentence, document, and corpus levels rather than the traditional analysis of the document only. The proposed model can efficiently find significant matching concepts between documents according to the semantics of their sentences. The similarity between documents is calculated based on a new concept-based similarity measure. The proposed similarity measure takes full advantage of using the concept analysis measures on the sentence, document, and corpus levels in calculating the similarity between documents. Large sets of experiments using the proposed concept-based mining model on different datasets in text clustering are conducted. The experiments demonstrate extensive comparison between the concept-based analysis and the traditional analysis. Experimental results demonstrate the substantial enhancement of the clustering quality using the sentence-based, document-based, corpus-based, and combined approach concept analysis.
This paper presents a knowledge discovery framework for the construction of Community Web Directories, a concept that we introduced in our recent work, applying personalization to Web directories. In this context, the Web directory is viewed as a thematic hierarchy and personalization is realized by constructing user community models on the basis of usage data. In contrast to most of the work on Web usage mining, the usage data that are analyzed here correspond to user navigation throughout the Web, rather than a particular Web site, exhibiting as a result a high degree of thematic diversity. For modeling the user communities, we introduce a novel methodology that combines the users' browsing behavior with thematic information from the Web directories. Following this methodology, we enhance the clustering and probabilistic approaches presented in previous work and we also present a new algorithm that combines these two approaches. The resulting community models take the form of Community Web Directories. The proposed personalization methodology is evaluated both on a specialized artificial and a general-purpose Web directory, indicating its potential value to the Web user. The experiments also assess the effectiveness of the different machine learning techniques on the task.
In this paper, we present a new type of spatial query called Nearest Surrounder (NS) query. An NS query searches the nearest polygon-shaped spatial objects (referred to as nearest surrounder (NS) objects) for consecutive ranges of angles around a specified query point. With additional angular information provided with NS objects, an NS query is more informative than many other spatial queries. We derive two NS query variants, namely, multi-tier NS (m-NS) query and angle-constrained NS (ANS) query. An m-NS query searches multiple layers of NS objects for the same range of angles from a query point. An ANS query searches NS objects within a specified range of angles. To evaluate NS queries and their variants, we explore angle-based and distance-based bound properties of polygons. Based on these properties, we devise two efficient algorithms, namely, Sweep and Ripple. They access objects in an order according to their orientations and distances to the query point, respectively, based on R-tree. They can also finish a search with at most one index lookup and progressively deliver a query result. Through empirical studies, we evaluate the proposed algorithms and report their performance for both synthetic and real object sets.
Efficiently and effectively searching for similar videos is an important and non-trivial problem in content-based video search systems. In this paper, we propose a subspace symbolization approach, namely SUDS, for content-based search on very large video databases. The novelty of SUDS is that it explores the data distribution in subspaces to build a visual dictionary with which the video data are processed by deriving the string matching techniques with two-step data simplification. Specifically, we first propose an adaptive approach, called VLP, to divide the whole visual feature space into a series of subspaces of variable lengths, from which the dominant ones are selected. By clustering the video keyframes over each dominant subspace, a stable visual dictionary is built and a compact video representation model is developed by transforming each keyframe into a word that is a series of symbols in the dominant subspaces, and further each video into a sequence of words. Then, we present an innovative similarity measure called CVE, which adopts a complementary information compensation scheme based on the visual features and sequence context of videos. Finally, an efficient two-layered index strategy with a number of query optimizations is proposed to facilitate video search. The experimental results demonstrate the high effectiveness and efficiency of SUDS.
Ordinal regression has wide applications in many domains where human evaluation plays a major role. Most current ordinal regression methods are based on Support Vector Machines (SVM) and suffer from the problems of ignoring the global information of the data and high computational complexity. On the other hand, although Linear Discriminant Analysis (LDA) and its kernel version, Kernel Discriminant Analysis (KDA), take into consideration the global information of the data as well as the distribution of the classes, and their performance has been proved in classification, they cannot be used for solving ordinal regression problems because the ordinal information of the data cannot be utilized. To solve this problem, in this paper, we propose a novel regression approach by extending the Kernel Discriminant Learning using a rank constraint. The proposed algorithm is very efficient since the computational complexity is linear in the data size. We demonstrate experimentally that the proposed method is capable of preserving the rank of data classes in a projected data space. In comparison to several ordinal regression methods, our method is more efficient and is competitive with them in accuracy.
Co-clustering heterogeneous data has attracted extensive attention recently due to its high impact on various important applications, such as text mining, image retrieval, and bioinformatics. However, data co-clustering without any prior knowledge or background information is still a challenging problem. In this paper, we propose a Semi-Supervised Non-negative Matrix Factorization (SS-NMF) framework for data co-clustering. Specifically, our method computes new relational matrices by incorporating user-provided constraints through simultaneous distance metric learning and modality selection. Using an iterative algorithm, we then perform tri-factorizations of the new matrices to infer the clusters of different data types and their correspondence. Theoretically, we prove the convergence and correctness of SS-NMF co-clustering. In addition, we show that our framework provides a unified view for data co-clustering and has several advantages over existing approaches. Through extensive experiments conducted on publicly available text, gene expression, and image data sets, we demonstrate the superior performance of SS-NMF for heterogeneous data co-clustering.
Situation-Based Access Control (SitBAC) is a conceptual model for representing access-control policies of healthcare organizations by characterizing situations of access to patient data. The SitBAC model enables formal representation of access situations as an ontology of concepts (Patient, Data-Requestor, EHR, Task, and Response), along with their attributes and relationships. A competing access-control model is the Contextual Role-Based Access Control (Context) model. The Context model uses logical expressions (rules) that specify contextual authorizations (i.e., characteristics of access requests that are available at access time). Open questions that relate to formal representation of scenarios involving access to patient data are: 1) which of the two models yields a formal representation that is easier to comprehend; 2) which of the two models facilitates the synthesis of correct models, and how does the task complexity affect performance of comprehension and synthesis. In this study, we address these questions through a controlled experiment. The results of the experiment suggest that while there are no differences between the two models when it comes to comprehending or synthesizing simple scenarios of data access, for complex scenarios there is a significant advantage to the SitBAC model, in terms of both comprehension and synthesis.
A P2P-based framework supporting the extraction of aggregates from historical multi-dimensional data is proposed, which provides efficient and robust query evaluation. When a data population is published, data are summarized in a synopsis, consisting of an index built on top of a set of sub-synopses (storing compressed representations of distinct data portions). The index and the sub-synopses are distributed across the network, and suitable replication mechanisms taking into account the query workload and network conditions are employed that provide the appropriate coverage for both the index and the sub-synopses.
Local classifiers are sometimes called lazy learners because they do not train a classifier until presented with a test sample. However, such methods are generally not completely lazy, because the neighborhood size k (or other locality parameter) is usually chosen by cross-validation on the training set, which can require significant preprocessing and risks overfitting. We propose a simple alternative to cross-validation of the neighborhood size that requires no pre-processing: instead of committing to one neighborhood size, average the discriminants for multiple neighborhoods. We show that this forms an expected estimated posterior that minimizes the expected Bregman loss with respect to the uncertainty about the neighborhood choice. We analyze this approach for six standard and state-of-the-art local classifiers, including discriminative adaptive metric kNN (DANN), a local support vector machine (SVM-KNN), hyperplane distance nearest-neighbor (HKNN) and a new local Bayesian quadratic discriminant analysis. The empirical effectiveness of this technique vs. cross-validation is validated with experiments on several benchmark data sets. Experiments with seven benchmark datasets show that the same classification performance is attained as cross-validation without any training.
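The idea of averaging over neighborhood sizes instead of selecting a single k can be sketched for plain kNN as follows. This is a minimal illustration under stated assumptions: the discriminant is taken to be the class-frequency estimate among the k nearest points, the candidate sizes are weighted uniformly, and the function names and toy data are hypothetical. The abstract does not specify the weighting used in the paper.

```python
from collections import Counter

def knn_posterior(train, x, k):
    """Class frequencies among the k nearest training points (requires k <= len(train))."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    counts = Counter(label for _, label in nearest)
    return {label: c / k for label, c in counts.items()}

def averaged_knn_predict(train, x, ks):
    """Average the kNN posteriors over several neighborhood sizes, then take the argmax."""
    avg = Counter()
    for k in ks:
        for label, p in knn_posterior(train, x, k).items():
            avg[label] += p / len(ks)  # uniform weight per neighborhood size (assumption)
    return max(avg, key=avg.get), dict(avg)

# Hypothetical one-dimensional toy data.
train = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"), ((1.0,), "b"), ((1.1,), "b")]
label, posterior = averaged_knn_predict(train, (0.3,), ks=[1, 3, 5])
print(label, posterior)
```

No neighborhood size is tuned on the training set here. The only choice left is the set of candidate sizes `ks`, which plays the role of the uncertainty about the neighborhood.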
In many applications involving spatial objects, we are only interested in objects that are directly visible from query points. In this article, we formulate the visible k nearest neighbor (VkNN) query and present incremental algorithms as a solution, with two variants differing in how to prune objects during the search process. One variant applies visibility pruning to only objects, whereas the other variant applies visibility pruning to index nodes as well. Our experimental results show that the latter outperforms the former. We further propose the aggregate VkNN query, which finds the visible k nearest objects to a set of query points based on an aggregate distance function. We also propose two approaches to processing the aggregate VkNN query. One accesses the database via multiple VkNN queries, whereas the other issues an aggregate k nearest neighbor query to retrieve objects from the database and then re-ranks the results based on the aggregate visible distance metric. With extensive experiments, we show that the latter approach consistently outperforms the former one.
Contour mapping is a crucial part of many wireless sensor network applications. Many efforts have been made to avoid collecting data from all the sensors in the network and producing maps at the sink, which is proven to be inefficient. The existing approaches (often aggregation based), however, suffer from heavy transmission traffic and incur large computational overheads on each sensor node. We propose Iso-Map, an energy-efficient protocol for contour mapping, which builds contour maps based solely on the reports collected from intelligently selected “isoline nodes” in wireless sensor networks. Iso-Map achieves high-quality contour mapping while significantly reducing the generated traffic from O(n) to O(√n), where n is the total number of sensor nodes in the field. The per-node computation overhead is also restrained as a constant. We conduct comprehensive trace-driven simulations to verify this protocol, and demonstrate that Iso-Map outperforms the previous approaches in the sense that it produces contour maps of high fidelity with significantly reduced energy cost.
On-line learning algorithms often have to operate in the presence of concept drift (i.e., the concepts to be learnt can change with time). This paper presents a new categorization for concept drift, separating drifts according to different criteria into mutually exclusive and non-heterogeneous categories. Moreover, although ensembles of learning machines have been used to learn in the presence of concept drift, there has been no deep study of why they can be helpful for this purpose or which of their features can or cannot contribute to it. As diversity is one of these features, we present a diversity analysis in the presence of different types of drift. We show that, before the drift, ensembles with less diversity obtain lower test errors. On the other hand, it is a good strategy to maintain highly diverse ensembles to obtain lower test errors shortly after the drift, independent of the type of drift, even though high diversity is more important for more severe drifts. Longer after the drift, high diversity becomes less important. Diversity by itself can help to reduce the initial increase in error caused by a drift, but does not provide a faster recovery from drifts in the long term.
The two most important tasks in information extraction from the Web are web page structure understanding and natural language sentence processing. However, little work has been done towards an integrated statistical model for understanding web page structures and processing natural language sentences within the HTML elements. Our recent work on web page understanding introduces a joint model of Hierarchical Conditional Random Fields (i.e., HCRF) and extended Semi-Markov Conditional Random Fields (i.e., Semi-CRF) to leverage the page structure understanding results in free text segmentation and labeling. In this top-down integration model, the decision of the HCRF model could guide the decision-making of the Semi-CRF model. However, the drawback of the top-down integration strategy is also apparent, i.e., the decision of the Semi-CRF model could not be used by the HCRF model to guide its decision-making. This paper proposes a novel framework called WebNLP, which enables bidirectional integration of page structure understanding and text understanding in an iterative manner. We have applied the proposed framework to local business entity extraction and Chinese person and organization name extraction. Experiments show that the WebNLP framework achieved significantly better performance than existing methods.
“Ontology matching” is the process of finding correspondences between entities belonging to different ontologies. This paper describes a set of algorithms that exploit upper ontologies as semantic bridges in the ontology matching process and presents a systematic analysis of the relationships among features of matched ontologies (number of simple and composite concepts, stems, concepts at the top level, common English suffixes and prefixes, ontology depth), matching algorithms, used upper ontologies, and experiment results. This analysis allowed us to state under which circumstances the exploitation of upper ontologies gives significant advantages with respect to traditional approaches that do not use them. We run experiments with SUMO-OWL (a restricted version of SUMO), OpenCyc, and DOLCE. The experiments demonstrate that when our “structural matching method via upper ontology” uses an upper ontology large enough (OpenCyc, SUMO-OWL), the recall is significantly improved while preserving the precision obtained without upper ontologies. Instead, our “non structural matching method” via OpenCyc and SUMO-OWL improves the precision and maintains the recall. The “mixed method” that combines the results of structural alignment without using upper ontologies and structural alignment via upper ontologies improves the recall and maintains the F-measure independently of the used upper ontology.
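Precision, recall, and F-measure for an alignment can be computed by comparing the set of correspondences a matcher returns against a reference alignment. The sketch below assumes the balanced F-measure (harmonic mean of precision and recall), since the abstract does not state which variant is used. The function name and the example correspondences are hypothetical.

```python
def alignment_scores(found, reference):
    """Return (precision, recall, F-measure) for a set of found correspondences."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical correspondences between concepts of two ontologies.
found = {("Car", "Automobile"), ("Person", "Human"), ("Dog", "Cat")}
reference = {("Car", "Automobile"), ("Person", "Human"),
             ("Tree", "Plant"), ("City", "Town")}

precision, recall, f_measure = alignment_scores(found, reference)
print(precision, recall, f_measure)
```

With these definitions, a method that adds correct correspondences without adding wrong ones raises recall while keeping precision, which is the behavior the abstract reports for the structural method with a large upper ontology.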
Gradient descent is a widely used paradigm for solving many optimization problems. Gradient descent aims to minimize a target function in order to reach a local minimum. In machine learning or data mining, this function corresponds to a decision model that is to be discovered. In this paper, we propose a preliminary formulation of gradient descent with data privacy preservation. We present two approaches (a stochastic approach and a least square approach) under different assumptions. Four protocols are proposed for the two approaches, incorporating various secure building blocks for both horizontally and vertically partitioned data. We conduct experiments to evaluate the scalability of the proposed secure building blocks and the accuracy and efficiency of the protocols for four different scenarios. The experimental results show that the proposed secure building blocks are scalable and that the proposed protocols allow us to determine a better secure protocol for the applications in each scenario.
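For orientation, here is a minimal sketch of plain (non-private) gradient descent applied to a least-squares objective. It shows only the baseline optimization that the abstract starts from. The privacy-preserving protocols and secure building blocks are not reproduced, and the function names, learning rate, and toy data are assumptions made for illustration.

```python
def gradient_descent(grad, w0, lr=0.1, steps=2000):
    """Repeatedly step against the gradient, starting from w0."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def least_squares_grad(xs, ys):
    """Gradient of the mean squared error of the model y = w[0] + w[1] * x."""
    n = len(xs)

    def grad(w):
        residuals = [w[0] + w[1] * x - y for x, y in zip(xs, ys)]
        return [
            2 * sum(residuals) / n,
            2 * sum(r * x for r, x in zip(residuals, xs)) / n,
        ]

    return grad

# Hypothetical data generated exactly by y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w = gradient_descent(least_squares_grad(xs, ys), [0.0, 0.0])
print(w)
```

In a partitioned setting, the sums inside `grad` are the quantities that would have to be computed jointly across parties. That is where secure building blocks would be needed in a privacy-preserving variant.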
We adapt Mitchell's version space algorithm for mining k-CNF formulae. Advantages of this algorithm are that it runs in a single pass over the data, is conceptually simple, can be used for missing value prediction, and has interesting theoretical properties, while an empirical evaluation on classification tasks yields competitive predictive results.
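A single-pass learner for k-CNF can be sketched with the classic elimination scheme: start from the conjunction of all clauses with at most k literals and drop every clause that some positive example falsifies. The result is the most specific k-CNF hypothesis consistent with the positives. Whether this matches the authors' adaptation of the version space algorithm is an assumption, and all names below are illustrative.

```python
from itertools import combinations, product

def all_clauses(n_vars, k):
    """All clauses with 1..k literals; a literal is (variable index, polarity)."""
    literals = [(i, pol) for i in range(n_vars) for pol in (True, False)]
    clauses = set()
    for size in range(1, k + 1):
        for combo in combinations(literals, size):
            # Skip tautologies such as (x OR NOT x).
            if len({i for i, _ in combo}) == size:
                clauses.add(frozenset(combo))
    return clauses

def satisfies(example, clause):
    """True if the assignment makes at least one literal of the clause true."""
    return any(example[i] == pol for i, pol in clause)

def learn_kcnf(positives, n_vars, k):
    """Keep only the clauses satisfied by every positive example (one pass)."""
    hypothesis = all_clauses(n_vars, k)
    for example in positives:
        hypothesis = {c for c in hypothesis if satisfies(example, c)}
    return hypothesis

def predict(hypothesis, example):
    """An example is classified positive if it satisfies every kept clause."""
    return all(satisfies(example, c) for c in hypothesis)

# Hypothetical target concept: x0 AND (x1 OR x2), a 2-CNF over three variables.
def target(e):
    return e[0] and (e[1] or e[2])

examples = list(product([True, False], repeat=3))
positives = [e for e in examples if target(e)]
hypothesis = learn_kcnf(positives, n_vars=3, k=2)
print(all(predict(hypothesis, e) == target(e) for e in examples))
```

The number of candidate clauses grows as O(n^k) in the number of variables n, so the sketch is only practical for small k.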
Wireless sensor networks have been proposed for facilitating various monitoring applications (e.g., environmental monitoring and military surveillance) over a wide geographical region. In these applications, spatial queries that collect data from wireless sensor networks play an important role. One such query is the K Nearest Neighbor (KNN) query that facilitates collection of sensor data samples based on a given query location and the number of samples specified (i.e., K). Recently, itinerary-based KNN query processing techniques that propagate queries and collect data along a pre-determined itinerary have been developed. Prior studies demonstrate that itinerary-based KNN query processing algorithms are able to achieve better energy efficiency than other existing algorithms developed upon tree-based network infrastructures. However, how to derive itineraries for KNN queries based on different performance requirements remains a challenging problem. In this paper, we propose a Parallel Concentric-circle Itinerary-based KNN (PCIKNN) query processing technique that derives different itineraries by optimizing either query latency or energy consumption. The performance of PCIKNN is analyzed mathematically and evaluated through extensive experiments. Experimental results show that PCIKNN outperforms the state-of-the-art techniques.
Analysis of click-through data from a very large search engine log shows that users are usually interested in the top-ranked portion of returned search results. Therefore, it is crucial for search engines to achieve high accuracy on the top-ranked documents. While many methods exist for boosting video search performance, they either pay less attention to the above factor or encounter difficulties in practical applications. In this paper, we present a flexible and effective reranking method, called CR-Reranking, to improve the retrieval effectiveness. To offer high accuracy on the top-ranked results, CR-Reranking employs a cross-reference (CR) strategy to fuse multimodal cues. Specifically, multimodal features are first utilized separately to rerank the initial returned results at the cluster level, and then all the ranked clusters from different modalities are cooperatively used to infer the shots with high relevance. Experimental results show that the search quality, especially on the top-ranked results, is improved significantly.
User profiling is a fundamental component of any personalization application. Most existing user profiling strategies are based on objects that users are interested in (i.e., positive preferences), while ignoring the objects that users dislike (i.e., negative preferences). In this paper, we focus on search engine personalization and develop several concept-based user profiling methods that are based on both positive and negative preferences. We evaluate the proposed methods against our previously proposed personalized query clustering method. Experimental results show that profiles which capture and utilize both the user's positive and negative preferences perform the best. An important result from the experiments is that profiles with negative preferences can increase the separation between similar and dissimilar queries. The separation provides a clear threshold for an agglomerative clustering algorithm to terminate and improve the overall quality of the resulting query clusters.