Letting users sign on to the Internet once and securely access network resources anywhere has been one of the industry's enduring quests. While numerous standards efforts have steadily pursued this capability, most have been back-end technologies of which users are mostly unaware. Recent developments surrounding the open source OpenID federated-identity technology are bringing single sign-on efforts to the foreground.
Distributed computing teaching environments (and e-science education in general) require a supportive policy framework that encourages cooperation and sharing. If teachers can share educational content rather than creating their own, they increase the number of quality resources available to them. However, in sharing these resources, intellectual property rights (IPR) issues such as copyright ownership and licensing must be considered.
Satellite-based Internet access, the preferred solution for remote locations on the ocean, either offers a low-bandwidth connection or is very expensive to deploy. A backbone structure that provides ocean-wide Internet coverage could provide an alternative to satellite uplinks. With a wide-area network forming a mesh of nodes using floating and moving objects as well as coastal facilities, the network would use next-generation long-range surface radio technology to provide medium- to high-bandwidth Internet access. To achieve high-bandwidth Internet access under these circumstances, the backbone must leverage state-of-the-art sensor network technology, autonomous routing mechanisms, and self-organizing abilities.
The recent advent of high-throughput methods has generated large amounts of gene interaction data. This has allowed the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized, and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions exhibit similar annotation patterns in their neighborhood, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to those of genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a KNN classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks, revealing significant improvements over previous techniques. Our technique also provides a natural control of the trade-off between accuracy and coverage of prediction.
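The first phase of the abstract above can be illustrated with a minimal sketch of a random walk with restart on a toy interaction graph. The function names, the restart probability of 0.3, and the way the walk probabilities are summed per annotation term are assumptions made for illustration; the abstract does not specify how the authors define their neighborhood features.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-12, max_iter=10000):
    """Stationary visiting probabilities of a walk that jumps back to `seed`.

    adj maps each node to a list of its neighbours (unweighted graph).
    """
    nodes = sorted(adj)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            moving = (1.0 - restart) * p[u]
            if adj[u]:
                # Spread the non-restarting mass evenly over the neighbours.
                share = moving / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Isolated node: send its mass back to the seed.
                nxt[seed] += moving
        # Total probability is 1, so the restart step adds `restart` to the seed.
        nxt[seed] += restart
        delta = max(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def neighborhood_profile(adj, annotations, gene, terms, restart=0.3):
    """Hypothetical feature vector: walk probability mass per annotation term."""
    p = random_walk_with_restart(adj, gene, restart)
    return [
        sum(p[v] for v in adj if v != gene and term in annotations.get(v, ()))
        for term in terms
    ]


# Toy path graph a - b - c - d with two annotated genes.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_annotations = {"b": {"kinase"}, "d": {"transport"}}
profile = neighborhood_profile(toy_graph, toy_annotations, "a", ["kinase", "transport"])
```

A KNN classifier, as in the second phase, would then compare such profile vectors between an uncharacterized gene and genes of known function.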
Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called graph pattern diffusion kernel (GPD) with applications in cheminformatics. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifiers (e.g., support vector machines) in building highly accurate graph classifiers. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call “pattern diffusion” to label nodes in the graphs. Finally, we design a novel graph alignment algorithm to compute the inner product of two graphs. We have tested our algorithm using a number of chemical structure data sets. The experimental results demonstrate that our method is significantly better than competing methods such as kernel functions based on paths, cycles, and subgraphs.
Time-course studies with microarray techniques and experimental replicates are very useful in biomedical research. For replicate experiments, we present an alternative approach to select and cluster genes according to a new measure of association between genes. First, the procedure normalizes and standardizes the expression profile of each gene and then identifies scaling parameters that further minimize the distance between replicates of the same gene. Then, the procedure filters out genes with a flat profile, detects differences between replicates, and separates genes without significant differences from the rest. For this last group of genes, we define a mean profile for each gene and use it to compute the distance between two genes. Next, a hierarchical clustering procedure is proposed, a statistic is computed for each cluster to determine its compactness, and the total number of classes is determined. For the rest of the genes, those with significant differences between replicates, the procedure detects where the differences between replicates lie, and assigns each gene to the best-fitting previously identified profile or defines a new profile. We illustrate this new procedure using simulated data and a representative data set arising from a microarray experiment with replication, and we report interesting results.
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of ascertained individuals. Our goal is to estimate the haplotype frequencies among the ascertained individuals by combining the pooled allele frequency data with prior knowledge about the possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the proposed method performs similarly to an EM algorithm that uses a multinormal approximation for the pooled allele frequencies but does not utilize prior information about the haplotypes. The method has been implemented in Matlab code, which is available upon request from the authors.
An efficient two-step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray datasets is presented. The inferred gene regulatory network (GRN) is based on time series gene expression data capturing the underlying gene interactions. To construct a highly accurate GRN, the proposed method performs (i) discovery of a gene's Markov Blanket (MB), (ii) formulation of a flexible measure to determine the network's quality, (iii) efficient searching with the aid of a guided genetic algorithm, and (iv) pruning to obtain a minimal set of correct interactions. Investigations are carried out using both synthetic and yeast cell-cycle gene expression data sets. The realistic synthetic datasets validate the robustness of the method by varying topology, sample size, time-delay, noise, vertex in-degree, and presence of hidden nodes. It is shown that the proposed approach has excellent inferential capabilities and high accuracy even in the presence of noise. The gene network inferred from yeast cell-cycle data is investigated for its biological relevance using well-known interactions, sequence analysis, motif patterns, and GO data. Further, novel interactions are predicted for the unknown genes of the network, and their influence on other genes is also discussed.
Pairwise sequence alignment is a central problem in bioinformatics which forms the basis of many other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution were conducted. The results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and better than the database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pre-trained PSSMs), and SSEARCH. With pre-trained PSSMs, however, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pre-trained PSSMs.
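To make the distinction between an alignment score and its pairwise significance concrete, here is a minimal sketch that scores a local alignment and then estimates an empirical p-value by shuffling one of the two sequences. It is a deliberately simplified stand-in: the match/mismatch/gap values, the function names, and the permutation-based estimate are assumptions for illustration, whereas the abstract describes estimates derived from sequence-specific and position-specific substitution matrices.

```python
import random


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def empirical_p_value(a, b, shuffles=200, seed=0):
    """Fraction of shuffled versions of `b` scoring at least as well as `b`.

    The +1 terms keep the estimate strictly positive (a standard
    permutation-test correction).
    """
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(letters)
        if smith_waterman_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (shuffles + 1)


score, p_value = empirical_p_value("ACGTACGTTGCA", "TTACGTACGAAC")
```

Because the null distribution is built only from the pair at hand, the estimate is specific to these two sequences and does not depend on the composition of a database.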
Whole-sample mass spectrometry (MS) proteomics allows for a parallel measurement of hundreds of proteins present in a variety of biospecimens. Unfortunately, the association between MS signals and these proteins is not straightforward. The need to interpret mass spectra demands the development of methods for accurate labeling of ion species in such profiles. To aid this process, we have developed a new peak-labeling procedure for associating protein and peptide labels with peaks. This computational method builds upon characteristics of proteins expected to be in the sample, such as the amino acid sequence, mass weight, and expected concentration within the sample. A new probabilistic score which incorporates this information is proposed. We evaluate and demonstrate our method's ability to label peaks first on simulated MS spectra and then on MS spectra from human serum with a spiked-in calibration mixture.
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Multiple sequence alignment is typically the first step in estimating phylogenetic trees, with the assumption being that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these new methods have not been typically used in phylogenetic analyses. In this paper, we report on a simulation study that we performed to evaluate the consequences of using these new multiple sequence alignment methods in terms of the resultant phylogenetic reconstruction. We find that while alignment accuracy is positively correlated with phylogenetic accuracy, the amount of improvement in phylogenetic estimation that results from an improved alignment can range from quite small to substantial. We observe that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low. We discuss these observations and implications for future work.
The discovery of motifs in biosequences is frequently torn between the rigidity of the model on the one hand and the abundance of candidates on the other. In particular, the number of motifs that include wildcards or "don't cares" grows exponentially with the number of wildcards, and this only gets worse if a don't care is allowed to stretch up to some prescribed maximum length. In this paper, a notion of extensible motif in a sequence is introduced and studied, which tightly combines the structure of the motif pattern, as described by its syntactic specification, with the statistical measure of its occurrence count. It is shown that a combination of appropriate saturation conditions and the monotonicity of probabilistic scores over regions of constant frequency affords significant parsimony in the generation and testing of candidate overrepresented motifs. A suite of software programs called Varun is described, implementing the discovery of extensible motifs of the type considered. The merits of the method are then documented by results obtained in a variety of experiments primarily targeting protein sequence families. Equally important, the sets of all surprising motifs returned in each experiment are extracted faster and come in much more manageable sizes than would be obtained in the absence of saturation constraints.
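As a small illustration of what a motif with bounded-length don't cares looks like in practice, the sketch below encodes a motif as a list of solid characters and `(min, max)` gap ranges and finds its occurrence start positions with Python's standard `re` module. The list-based encoding and the function names are assumptions for this example; they are not the syntax used by Varun, and the sketch does not implement saturation or scoring.

```python
import re


def motif_to_regex(elements):
    """Compile a motif given as solid strings and (min, max) don't-care runs."""
    parts = []
    for element in elements:
        if isinstance(element, str):
            parts.append(re.escape(element))      # solid character(s)
        else:
            lo, hi = element
            parts.append(".{%d,%d}" % (lo, hi))   # stretchable don't care
    # Wrap in a lookahead so every start position is tested independently,
    # even when occurrences overlap.
    return re.compile("(?=(" + "".join(parts) + "))")


def occurrence_starts(elements, sequence):
    """Start positions at which the motif occurs in `sequence`."""
    return [m.start() for m in motif_to_regex(elements).finditer(sequence)]


# 'A', then one or two arbitrary residues, then 'C'.
starts = occurrence_starts(["A", (1, 2), "C"], "ATCAGGCAC")
```

Each additional gap range multiplies the number of concrete patterns a single motif stands for, which is the source of the candidate explosion the abstract describes.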
In the first part of this work, we introduce new developments in Principal Component Analysis (PCA), and in the second part, a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analysing gene expression datasets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray dataset. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
Prediction of protein functional sites from sequence-derived data remains an open bioinformatics problem. We have developed a phylogenetic motif (PM) functional site prediction approach that identifies functional sites from alignment fragments that parallel the evolutionary patterns of the family. In our approach, PMs are identified by comparing tree topologies of each alignment fragment to that of the complete phylogeny. Herein, we bypass the phylogenetic reconstruction step and identify PMs directly from distance matrix comparisons. In order to optimize the new algorithm, we consider three different distance matrices and thirteen different matrix similarity scores. We assess the performance of the various approaches on a structurally non-redundant dataset that includes three types of functional site definitions. Without exception, the predictive power of the original approach outperforms the distance matrix variants. While the distance matrix methods fail to improve upon the original approach, our results are important because they clearly demonstrate that the improved predictive power is based on the topological comparisons. That is, phylogenetic trees are a straightforward yet powerful way to improve functional site prediction accuracy. While complementary studies have shown that topology improves predictions of protein-protein interactions, this report represents the first demonstration that trees improve functional site predictions as well.
To address challenging flexible docking problems, a number of docking algorithms pre-generate large collections of candidate conformers. To remove the redundancy from such ensembles, a central problem in this context is to report a selection of conformers maximizing some geometric diversity criterion. We make three contributions to this problem. First, we resort to geometric optimization so as to report selections maximizing the molecular volume or molecular surface area (MSA) of the selection. Greedy strategies are developed, together with approximation bounds. Second, to assess the efficacy of our algorithms, we investigate two conformer ensembles corresponding to a flexible loop of four protein complexes. By focusing on the MSA of the selection, we show that our strategy matches the MSA of standard selection methods while using a number of conformers between one and two orders of magnitude smaller. This observation is qualitatively explained using the Betti numbers of the union of balls of the selection. Finally, we revisit the conformer selection problem in the context of multiple-copy flexible docking. On the aforementioned systems, we show that using the loops selected by our strategy can improve the result of the docking process.
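The flavor of a greedy selection that maximizes the volume covered by the chosen conformers can be sketched as follows. This is the classical greedy heuristic for maximum coverage applied to a discretized volume, where each conformer is represented by the set of grid cells it occupies. The grid representation and all names are assumptions for illustration; the abstract does not state that the authors discretize space or that their bounds take the same form as the well-known 1 - 1/e guarantee of this heuristic.

```python
def greedy_select(conformers, k):
    """Pick up to k conformers, each time adding the largest uncovered volume.

    conformers maps a conformer name to the set of grid cells it occupies
    (a hypothetical discretization of its molecular volume).
    Returns the chosen names in selection order and the union of their cells.
    """
    remaining = dict(conformers)
    chosen, covered = [], set()
    for _ in range(min(k, len(remaining))):
        # Marginal gain = number of cells not yet covered by the selection.
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # nothing left adds new volume
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


toy_conformers = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 2},
}
selection, volume = greedy_select(toy_conformers, 2)
```

On the toy data, conformer B overlaps heavily with A, so the second pick goes to C, which contributes more new cells despite being smaller; this is the redundancy-removal effect the abstract refers to.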
Constraint-based structure learning algorithms generally perform well on sparse graphs. Although sparsity is not uncommon, there are some domains where the underlying graph can have some dense regions; one of these domains is gene regulatory networks, which is the main motivation for the study described in this paper. We propose a new constraint-based algorithm that can both increase the quality of output and decrease the computational requirements for learning the structure of gene regulatory networks. The algorithm is based on and extends the PC algorithm. Two different types of information are derived from the prior knowledge: one is the probability of existence of edges, and the other is the set of nodes that appear to depend on a large number of nodes compared to other nodes in the graph. In addition, a new method based on Gene Ontology for gene regulatory network validation is proposed. We demonstrate the applicability and effectiveness of the proposed algorithms on both synthetic and real data sets.
The functions of proteins are often realized through their mutual interactions. Determining a relative transformation for a pair of proteins and their conformations which form a stable complex, reproducible in nature, is known as docking. It is an important step in drug design, structure determination, and understanding function and structure relationships. We provide a scoring model for rigid docking and error-bounded approximation algorithms to predict docking sites. Translational search is sped up using the Fourier domain. Shape-based interactions are shown to give good results for a large range of pairs of proteins.
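The idea of speeding up translational search in the Fourier domain can be shown on a one-dimensional toy problem: the overlap score for every shift of a ligand profile along a receptor profile is a cross-correlation, which the correlation theorem turns into a pointwise product of transforms. The sketch below is an assumption-laden illustration, not the authors' scoring model: real docking codes work on 3D grids and also search over rotations, and the small radix-2 FFT here is written out only to keep the example free of external libraries.

```python
import cmath


def _fft(x, sign):
    """Recursive radix-2 transform (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [x[0]]
    even = _fft(x[0::2], sign)
    odd = _fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def translation_scores(receptor, ligand):
    """score[t] = sum_x ligand[x] * receptor[(x + t) mod n] for every shift t."""
    n = len(receptor)
    if n != len(ligand) or n == 0 or n & (n - 1):
        raise ValueError("inputs must share a power-of-two length")
    R = _fft([complex(v) for v in receptor], -1)
    L = _fft([complex(v) for v in ligand], -1)
    # Correlation theorem: conj(FFT(ligand)) * FFT(receptor), then invert.
    product = [l.conjugate() * r for l, r in zip(L, R)]
    return [v.real / n for v in _fft(product, +1)]


receptor_profile = [0, 0, 1, 1, 0, 0, 0, 0]   # toy binding pocket at cells 2-3
ligand_profile = [1, 1, 0, 0, 0, 0, 0, 0]     # toy ligand occupying two cells
scores = translation_scores(receptor_profile, ligand_profile)
best_shift = max(range(len(scores)), key=scores.__getitem__)
```

All shifts are scored in O(n log n) operations rather than the O(n^2) of evaluating each shift separately, which is the source of the speedup.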
In mass spectrometry (MS) analysis, false peak detection results are unavoidable due to severe spectrum variations. However, most current peak detection methods are neither robust enough to resist spectrum variations nor flexible enough to revise false detection results. To improve flexibility, we introduce the peak tree to represent the peak information in MS spectra. Each tree node is a peak judgment on a range of scales, and each tree decomposition, as a set of nodes, is a candidate peak detection result. To improve robustness, we combine peak detection and common peak alignment into a closed-loop framework, which finds the optimal decomposition considering both peak intensity and common peak information. The common peak information is derived from the density clustering of peaks detected throughout the MS database and is iteratively refined to direct peak tree decomposition. Finally, we present an improved ant colony optimization (ACO) biomarker selection method to build an MS analysis system based on the peak tree. Experiments show that our peak detection method can better resist spectrum variations and provide higher sensitivity and lower false detection rates than conventional methods. The benefits of our peak tree based system for MS disease analysis are also demonstrated on real SELDI data.
A standard approach to estimate intracellular fluxes on a genome-wide scale is flux balance analysis (FBA), which optimizes an objective function subject to constraints on (relations between) fluxes. The performance of FBA models heavily depends on the relevance of the formulated objective function and the completeness of the defined constraints. Previous studies indicated that FBA predictions can be improved by adding regulatory on/off constraints. These constraints were imposed based on either absolute (Shlomi2007a, Covert2004) or relative (Shlomi2008) gene expression values. We provide a new algorithm that directly uses regulatory up/down constraints based on gene expression data in FBA optimization (tFBA). Our assumption is that if the activity of a gene drastically changes from one condition to the other, the flux through the reaction controlled by that gene will change accordingly. The potential of the proposed method, tFBA, is demonstrated through the analysis of fluxes in yeast under nine different cultivation conditions. We illustrate that changes in gene expression are predictive of changes in fluxes. We compare tFBA and FBA predictions to show that our approach yields more biologically relevant results.
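For reference, the optimization that standard FBA solves is usually written as the linear program below. The symbols are not from the abstract and are introduced here as assumptions: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights (for example, selecting a biomass reaction), and v_min, v_max the flux bounds. The up/down constraints specific to tFBA are not spelled out in the abstract and are therefore not shown.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}.
\end{aligned}
```

In this notation, a regulatory on/off constraint of the kind mentioned above is commonly expressed by tightening the bounds of the affected reaction, for instance setting both bounds of reaction j to zero when its controlling gene is considered off.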