Abstract—Modern day enterprises have a large IT infrastructure comprising thousands of applications running on servers housed in tens of data centers geographically spread out. T...
Rahul Singh, Prashant J. Shenoy, K. K. Ramakrishna...
A labeled sequence data set related to a certain biological property is often biased and, therefore, does not completely capture its diversity in nature. To reduce this sampling b...
We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as on-line learning wher...
Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high performance storage and in order to be useful to scien...
Fang Zheng, Hasan Abbasi, Ciprian Docan, Jay F. Lo...
Recently, the concept of a species containing both core and distributed genes, known as the supra- or pangenome theory, has been introduced. In this paper, we aim to develop a new ...