Asif M. Khan: 7) Duplicate Detection in Biological Data using Association.

Koh JLY, Lee ML, Khan AM, Tan PTJ, Brusic V
2nd European Workshop on Data Mining and Text Mining for Bioinformatics.
Pisa, Italy, September 24, 2004.
Impact Factor Year 2009: NA
No. of Citations: 19 total (16 non-self, 3 self)

ABSTRACT:

Recent advances in biotechnology have produced a massive amount of raw biological data, which are accumulating at an exponential rate. Errors, redundancy and discrepancies are prevalent in the raw data, and there is a serious need for systematic approaches to biological data cleaning. This work examines the extent of redundancy in biological data and proposes a method for detecting duplicates in biological data. Duplicate relations in a real-world biological dataset are modeled as association rules, so that these duplicate relations, or rules, can be induced from data with known duplicates using association rule mining. Our approach of using association rule induction to find duplicate relations is new. Evaluation of our method on a real-world dataset shows that our duplicate association rules can accurately identify up to 96.8% of the duplicates in the dataset, with 0.3% false positives and 0.0038% false negatives.
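The core idea in the abstract, treating "these two records are duplicates" as the consequent of an association rule learned from record pairs whose duplicate status is known, can be illustrated with a minimal Python sketch. Everything concrete below is an assumption for illustration only: the pair features (`same_seq`, `same_org`, `same_len`, `same_desc`), the thresholds, the toy data, and the helper names `mine_duplicate_rules`, `is_duplicate` and `evaluate`. A brute-force enumeration of short antecedents stands in for a real Apriori-style miner; this is not the paper's implementation.

```python
from itertools import combinations

# Hypothetical labelled record pairs: (fields on which the two records agree, is_duplicate).
# Feature names are illustrative assumptions, not the attributes used in the paper.
TRAIN = [
    ({"same_seq", "same_len", "same_org"}, True),
    ({"same_seq", "same_len", "same_org", "same_desc"}, True),
    ({"same_seq", "same_len", "same_org"}, True),
    ({"same_seq", "same_len"}, False),  # identical sequence, different organism
    ({"same_len", "same_org"}, False),
    ({"same_desc", "same_org"}, False),
]


def mine_duplicate_rules(pairs, min_support=0.2, min_confidence=0.9, max_len=2):
    """Brute-force search for rules A -> duplicate (a stand-in for Apriori).

    support    = (# duplicate pairs containing A) / (# pairs)
    confidence = (# duplicate pairs containing A) / (# pairs containing A)
    Only minimal antecedents are kept: supersets of an accepted antecedent are skipped.
    """
    features = sorted(set().union(*(f for f, _ in pairs)))
    n = len(pairs)
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(features, k):
            a = frozenset(combo)
            if any(r <= a for r, _, _ in rules):
                continue  # a more general rule already covers this antecedent
            covered = [dup for f, dup in pairs if a <= f]
            if not covered:
                continue
            hits = sum(covered)  # number of duplicate pairs containing A
            support, confidence = hits / n, hits / len(covered)
            if support >= min_support and confidence >= min_confidence:
                rules.append((a, support, confidence))
    return rules


def is_duplicate(features, rules):
    """Flag a pair as a duplicate if it satisfies the antecedent of any mined rule."""
    return any(a <= features for a, _, _ in rules)


def evaluate(pairs, rules):
    """Recall and false-positive rate under the usual definitions (assumed, not the paper's)."""
    tp = sum(1 for f, dup in pairs if dup and is_duplicate(f, rules))
    fp = sum(1 for f, dup in pairs if not dup and is_duplicate(f, rules))
    positives = sum(1 for _, dup in pairs if dup)
    negatives = len(pairs) - positives
    return tp / positives, fp / negatives


if __name__ == "__main__":
    rules = mine_duplicate_rules(TRAIN)
    for a, s, c in rules:
        print(sorted(a), f"support={s:.2f}", f"confidence={c:.2f}")
    print(evaluate(TRAIN, rules))
```

On this toy data the only rule that clears both thresholds is {same_org, same_seq} → duplicate: an identical sequence alone reaches only 0.75 confidence, because one non-duplicate pair shares a sequence across organisms. Keeping only minimal antecedents keeps the rule set small. A realistic setup would mine rules on one labelled set and measure recall and false positives on held-out pairs rather than on the training pairs, as is done here purely for illustration.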

This article has been cited by other articles:

1) McCallum, A, Bellare, K, Pereira, F. A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005). http://www.seas.upenn.edu/~strctlrn/bib/PDF/crfstredit.pdf

2) Jakoniene, V, Rundqvist, D, Lambrix, P. A method for similarity-based grouping of biological data. In: Data Integration in the Life Sciences. Lecture Notes in Computer Science, pp. 136-151.

3) David Rundqvist. Grouping Biological Data. PhD Thesis. Dept. of Computer and Information Science, Linköping University. http://www.diva-portal.org/diva/getDocument?urn_nbn_se_liu_diva-6327-1__fulltext.pdf

4) Veeramani, A, Gopalakrishnan, K, Brusic, V, Koh, JLY. Biodart - Catalogue of Biological Data Artifact Examples. International Conference on Biomedical and Pharmaceutical Engineering, 2006, pp. c1-xiii. http://antigen.i2r.a-star.edu.sg/ani/textmin/kddinfo/pdf/261_1.pdf

5) Zhu YY, Xiong Y. DNA sequence data mining technique. Journal of Software, 2007,18(11):2766-2781. http://www.jos.org.cn/1000-9825/18/2766.htm

6) Koh, Judice Lie Yong. Correlation-Based Methods for Biological Data Cleaning. PhD thesis, National University of Singapore, 2007.

7) van Grootel RJ, van der Bilt A, van der Glas HW. Long-term reliable change of pain scores in individual myogenous TMD patients. Eur J Pain. 2007 Aug;11(6):635-43. Epub 2006 Nov 22. PMID: 17118682

8) Arraes LC, de Souza PR, Bruneska D, Castelo Filho A, Cavada Bde S, de Lima Filho JL, Crovella S. A cost-effective melting temperature assay for the detection of single-nucleotide polymorphism in the MBL2 gene of HIV-1-infected children. Braz J Med Biol Res. 2006 Jun;39(6):719-23. Epub 2006 Jun 2. PMID: 16751976

9) Szalma SJ, Buckler ES 4th, Snook ME, McMullen MD. Association analysis of candidate genes for maysin and chlorogenic acid accumulation in maize silks. Theor Appl Genet. 2005 May;110(7):1324-33. Epub 2005 Apr 2. PMID: 15806344

10) Ferguson M, Heath A. Report of a collaborative study to calibrate the Second International Standard for parvovirus B19 antibody. Biologicals. 2004 Dec;32(4):207-12. PMID: 15572102

11) D Apiletti, G Bruno, E Ficarra, E Baralis. Data Cleaning and Semantic Improvement in Biological Databases. Journal of Integrative Bioinformatics, 2006.

12) http://www.cqvip.com/qk/96857x/2007011/25789686.html

13) JLY Koh, ML Lee, V Brusic. A classification of biological data artifacts. Workshop on Database Issues in Biological Databases, National e-Science Centre, Edinburgh, UK, January 8-9, 2005 (organized by the European Bioinformatics Institute & Edinburgh Database Group).

14) V Jakoniene, P Lambrix. A tool for evaluating strategies for grouping of biological data. Journal of Integrative Bioinformatics, 2007.

15) F Karel. Quantitative association rules mining. Proc. 10th Int. Conf. Knowl.-Based Intell. Inf. Eng. Syst., 2006.

16) M Song, A Rudniy. Detecting duplicate biological entities using Markov random field-based edit distance. IEEE International Conference on Bioinformatics and Biomedicine (BIBM '08), 2008.

17) Deepa K, Rangarajan R. A Comprehensive Review of Significant Researches on Duplicate Record Detection in Databases. Advances in Computational Sciences and Technology. 2009; 2(2).

18) Daniele Apiletti, Giulia Bruno, Elisa Ficarra, Elena Baralis. Extraction of Constraints from Biological Data. In: Biomedical Data and Applications. Studies in Computational Intelligence, Vol. 224, 2009, pp. 169-186. DOI: 10.1007/978-3-642-02193-0_7

19) C Markschies. Describing Differences between Overlapping Databases. PhD Thesis, Humboldt-Universität zu Berlin, 2008.