Jong Youl Choi

About Me

Jong Youl Choi is a researcher in the Scientific Data Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory (ORNL), Oak Ridge, Tennessee, USA. He earned his Ph.D. degree in Computer Science at Indiana University Bloomington in 2012 and his M.S. degree in Computer Science from New York University in 2004.

His research interests span data mining and machine learning algorithms, high-performance data-intensive computing, and parallel and distributed systems for Cloud and Grid computing. More specifically, he focuses on developing high-performance data mining algorithms and on researching efficient run-time environments in Cloud and Grid systems.

During his Ph.D. studies, he was a member of the SALSA HPC group in the Pervasive Technology Institute, supervised by Professor Geoffrey Fox.

Education

2004 – 2012
Indiana University, Bloomington, Indiana, USA.
Ph.D. in Computer Science
Thesis : Unsupervised learning of finite mixture models with deterministic annealing for large-scale data analysis
Advisor : Professor Geoffrey Fox

2002 – 2004
New York University, New York, New York, USA
M. Sc. in Computer Science

1992 – 1998
Hanyang University, Seoul, South Korea
B. Sc. in Industrial Chemistry (a.k.a. Chemical Engineering)

 

Experience

2012 – Present
HPC Data Research Scientist, Oak Ridge National Laboratory, Oak Ridge, Tennessee
Group: Scientific Data Group, Computer Science and Mathematics Division
Group Leader: Dr. Scott Klasky

2012
Post-doc, University of Tennessee Knoxville/Oak Ridge National Laboratory, Oak Ridge, Tennessee
Department: Scientific Data Sciences Group, Computer Science and Mathematics Division
Advisor : Dr. Scott Klasky

2006 – 2012
Research Assistant, Indiana University, Bloomington, Indiana
Department : Pervasive Technology Institute
Advisor : Professor Geoffrey Fox

Jun. 2007 – Sep. 2007
Research Internship, Microsoft Research, Redmond, Washington
Department : Technical Computing @ Microsoft group
Advisor : Dr. Savas Parastatidis

Oct. 2005 – Jan. 2006
Research Internship, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Department : Computer Communications and Applications Laboratory
Advisor : Professor Jean-Pierre Hubaux

Sep. 2004 – Sep. 2005
Research Assistant, Indiana University, Bloomington, Indiana
Department : Computer Science
Advisor : Professor Markus Jakobsson

Jun. 2003 – Aug. 2003
Research Internship, New York University, New York, New York
Department : MIS Department of Stern School of Business
Advisor : Professor Shinkyu Yang

Research

Mixture Model with Deterministic Annealing (2011 – present)

Solving mixture model problems with the Deterministic Annealing optimization method
A mixture model problem is to find an optimal mixture distribution of conditional probabilities, and it is common in many data mining areas, such as text mining, image processing, and speech recognition. However, traditional solutions largely depend on the EM method, which can only find local solutions. With the Deterministic Annealing method, I am currently working on Probabilistic Latent Semantic Indexing (PLSI), one such mixture model problem, in order to find global solutions as well as to find model parameters in an adaptive way.

DA-GTM (2009 – present)

Generative Topographic Mapping and Deterministic Annealing optimization
Developed a high-performance parallel Generative Topographic Mapping (GTM), an algorithm for dimension reduction and visual data mining, as a data-intensive application for large and high-dimensional life science data analysis. To overcome the local optimum problem that appears in the conventional GTM, applied a global optimization method, Deterministic Annealing (DA), and devised a new DA-GTM algorithm [CCPE '11, CCPE '11, BMC '10, HPDC '10, ECMLS '10, ICCS '10, CCGRID '10].

Developed PlotViz, a visualization tool for large and high-dimensional data, as an application for data-intensive life science data analysis. The program is available for download from here.

Folksonomy Mining (2008)

Collaborative tagging system for folksonomy mining
Developed the Collective Collaborative Tagging (CCT) system for folksonomy data mining to build a recommendation and search system. Also analyzed folksonomy data by exploiting the graph structures of tagging [CCPE '09, GCE '08, CTS '08].

V-Lab (2007)

Virtual labs in Clouds : V-Lab-Protein and V-Lab-Microarray
Developed virtual collaborative labs in Clouds for bioinformatics applications, Virtual Collaborative Lab for Protein Sequence Analysis (V-Lab-Protein) and Microarray Data Analysis (V-Lab-Microarray). Designed V-Labs to provide virtual and volatile cloud computing resources by using Amazon's EC2 and S3 services. Equipped with workflow engines along with a user-friendly graphic workflow composer, V-Labs provide easy-to-use cloud computing environments to bioinformatics researchers [eScience '08, BIBM '07].

This project is also known as one of the first scientific applications to use cloud computing infrastructures. It has been introduced in Nature's News and Tony Hey's talk. A demo movie is available from here (Courtesy of Youngik Yang).

PRECIP and Spyshield (2007)

Privacy protection from spyware
Implemented two anti-spyware architectures, PRECIP and Spyshield, to protect user privacy. Developed Spyshield as a Browser Helper Object (BHO), also known as a toolbar plug-in, for Internet Explorer [RAID '07, NDSS '08].

Packet Vaccine (2006)

Malware detection and signature generation by using a black-box approach
Implemented a system for malicious packet detection by using a packet-based black-box approach. The system can identify malicious code inside a network packet for fast detection [TISSEC '08, CCS '06].

Tamper-evident schemes (2005)

Tamper-evident signatures and mix networks
Developed a tamper-evident signature scheme and a mix network. The new tamper-evident signature scheme allows a user to detect whether a signer is corrupted. A tamper-evident mix network is also proposed to protect the system from malicious attempts [DASC '06, FC '06].

Vehicular Network Protocol (2005)

Secure protocol designs for vehicular networks
Designed a secure vehicular network protocol to achieve both auditability and privacy [Q2SWinet '05].

VPN (2004)

Performance study for Virtual Private Network (VPN)
Studied IPSec overheads for VPN servers [ICNP '05].

 

Journal Publications

CCPE '11
J. Y. Choi, S.-H. Bae, J. Qiu, B. Chen, and D. Wild. Browsing large scale cheminformatics data with dimension reduction. Concurrency and Computation: Practice and Experience, Accepted. 2011.
CCPE '11
T. Gunarathne, S. Wu, J. Y. Choi, S.-H. Bae, and J. Qiu. Cloud computing paradigms for pleasingly parallel biomedical applications. Concurrency and Computation: Practice and Experience, Accepted. 2011.
BMC '10
J. Qiu, J. Ekanayake, T. Gunarathne, J. Choi, S.-H. Bae, H. Li, B. Zhang, T.-L. Wu, Y. Ruan, S. Ekanayake, A. Hughes, and G. Fox. Hybrid cloud and cluster computing paradigms for life science applications. BMC Bioinformatics, 11(Suppl 12):S3, 2010.
CCPE '09
M. Pierce, G. Fox, J. Choi, Z. Guo, X. Gao, and Y. Ma, "Using Web 2.0 for scientific applications and scientific communities," Concurrency and Computation: Practice and Experience, vol. 21, no. 5, pp. 583–603, 2009.
TISSEC '08
X. Wang, Z. Li, J. Choi, J. Xu, M. Reiter, and C. Kil, "Fast and black-box exploit detection and signature generation for commodity software," ACM Transactions on Information and System Security (TISSEC), vol. 12, no. 2, p. 11, 2008.

Selected Conference/Workshop Publications

eScience '12
J. Y. Choi, H. Abbasi, D. Pugmire, N. Podhorszki, S. Klasky, C. Capdevila, M. Parashar, M. Wolf, J. Qiu, and G. Fox, "Mining hidden mixture context with ADIOS-P to improve predictive pre-fetcher accuracy," in IEEE 8th International Conference on E-Science, 2012.
HPDC '10
S.-H. Bae, J. Y. Choi, J. Qiu, and G. C. Fox, "Dimension reduction and visualization of large high-dimensional data via interpolation," in HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, (New York, NY, USA), pp. 203–214, ACM, 2010.
ECMLS '10
J. Y. Choi, S.-H. Bae, J. Qiu, G. Fox, B. Chen, and D. Wild, "Browsing large scale cheminformatics data with dimension reduction," in HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, (New York, NY, USA), pp. 503–506, ACM, 2010.
ICCS '10
J. Y. Choi, J. Qiu, M. Pierce, and G. Fox, "Generative Topographic Mapping by Deterministic Annealing," in Proceedings of the 10th International Conference on Computational Science and Engineering (ICCS 2010), 2010.
CCGRID '10
J. Choi, S. Bae, X. Qiu, and G. Fox, "High Performance Dimension Reduction and Visualization for Large High-dimensional Data Analysis," proceedings of CCGRID 2010, 2010.
GCE '08
J. Choi, J. Rosen, S. Maini, M. Pierce, and G. Fox, "Collective Collaborative Tagging System," in Grid Computing Environments Workshop, 2008. GCE'08, pp. 1–7, 2008.
CTS '08
M. Pierce, G. Fox, J. Rosen, S. Maini, and J. Choi, "Social networking for scientists using tagging and shared bookmarks: a Web 2.0 application," in Collaborative Technologies and Systems, 2008. CTS 2008. International Symposium on, pp. 257–266, 2008.
eScience '08
Y. Yang, J. Choi, K. Choi, M. Pierce, D. Gannon, and S. Kim, "BioVLAB-Microarray: Microarray Data Analysis in Virtual Environment," in IEEE Fourth International Conference on eScience, 2008.
BIBM '07
J. Choi, Y. Yang, S. Kim, and D. Gannon, "V-lab-protein: Virtual collaborative lab for protein sequence analysis," in IEEE Workshop on High-Throughput Data Analysis for Proteomics and Genomics, Workshop at BIBM 2007, 2007.
RAID '07
Z. Li, X. Wang, and J. Choi, "Spyshield: Preserving privacy from spy add-ons," Lecture Notes in Computer Science, vol. 4637, p. 296, 2007.
NDSS '08
X. Wang, Z. Li, N. Li, and J. Choi, "Precip: Towards practical and retrofittable confidential information protection," in Network and Distributed System Security Symposium (NDSS), 2008.
CCS '06
X. Wang, Z. Li, J. Xu, M. Reiter, C. Kil, and J. Choi, "Packet vaccine: Black-box exploit detection and signature generation," in Proceedings of the 13th ACM conference on Computer and communications security, p. 46, ACM, 2006.
DASC '06
J. Y. Choi, P. Golle, and M. Jakobsson, "Tamper-evident digital signature protecting certification authorities against malware," DASC, vol. 0, pp. 37–44, 2006.
FC '06
J. Choi, P. Golle, and M. Jakobsson, "Auditable privacy: On tamper-evident mix networks," Lecture Notes in Computer Science, vol. 4107, p. 126, 2006.
Q2SWinet '05
J. Choi, M. Jakobsson, and S. Wetzel, "Balancing auditability and privacy in vehicular networks," in Proceedings of the 1st ACM international workshop on Quality of service & security in wireless and mobile networks, p. 87, ACM, 2005.
ICNP '05
C. Shue, Y. Shin, M. Gupta, and J. Choi, "Analysis of IPSec overheads for VPN servers," in IEEE ICNP's NPSec Workshop, 2005.

Software

DA-PLSA

Probabilistic Latent Semantic Analysis is an algorithm for text mining. It is based on a topic model. We apply a novel optimization method, called Deterministic Annealing (DA), to solve the overfitting problem which the original PLSA suffers from. More details are available from here.

DA-GTM

Generative Topographic Mapping (GTM) is an algorithm for data visualization through dimension reduction. Unlike PCA, which is a traditional visualization method based on linear algebra, GTM seeks a non-linear mapping. Because of its information-theoretic background, GTM can find a more separable map than PCA. The GTM problem is essentially a Gaussian mixture model problem, and the standard method for solving it is the Expectation-Maximization (EM) method.
We apply an optimization method called Deterministic Annealing (DA) to address the local optimum problem from which the original GTM can suffer. More details are available from here.
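
The annealing idea can be illustrated with a small, self-contained sketch. This is not the DA-GTM code; it is a simplified example that assumes a 1-D Gaussian mixture with equal weights and a known shared variance, and it only estimates the component means. The function name da_em_means, the inverse-temperature schedule betas, and the jitter size are hypothetical choices made for illustration. The E-step responsibilities are tempered by an inverse temperature beta: at small beta all components collapse onto the data mean, and they split as beta is raised towards 1, where the update becomes ordinary EM.

```python
import math
import random


def da_em_means(xs, k, sigma2=1.0, betas=(0.01, 0.03, 0.1, 0.3, 1.0),
                inner_iters=50, seed=0):
    """Estimate k component means with tempered (annealed) EM updates.

    Assumptions (illustrative only): 1-D data, equal mixture weights,
    known shared variance sigma2, hand-picked schedule `betas`.
    """
    rng = random.Random(seed)
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    mu = [mean] * k  # start fully collapsed at the data mean
    for beta in betas:
        # Small jitter at each temperature so identical means can separate.
        mu = [m + 1e-3 * std * rng.gauss(0.0, 1.0) for m in mu]
        for _ in range(inner_iters):
            num = [0.0] * k
            den = [0.0] * k
            for x in xs:
                # Tempered log-responsibilities: beta scales the log-likelihood.
                logs = [-beta * (x - m) ** 2 / (2.0 * sigma2) for m in mu]
                top = max(logs)  # subtract the max for numerical stability
                w = [math.exp(v - top) for v in logs]
                total = sum(w)
                for j in range(k):
                    r = w[j] / total
                    num[j] += r * x
                    den[j] += r
            # M-step: responsibility-weighted means (keep old mean if empty).
            mu = [num[j] / den[j] if den[j] > 0 else mu[j] for j in range(k)]
    return sorted(mu)


if __name__ == "__main__":
    gen = random.Random(1)
    data = ([gen.gauss(-5.0, 1.0) for _ in range(200)]
            + [gen.gauss(5.0, 1.0) for _ in range(200)])
    print(da_em_means(data, 2))
```

Starting every component at the same point removes the dependence on a random initial placement, which is the property that makes annealing attractive when plain EM settles in a poor local optimum. A real implementation would also have to update the variance and mixture weights and choose the cooling schedule adaptively.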

PlotViz

Developed PlotViz, a visualization tool for large and high-dimensional data, as an application for data-intensive life science data analysis. The program is written in C++ with VTK and Qt and is available for download from here.

Interactive R

By extending Rmpi (version 0.5-7), a plug-in that enables MPI for R, I have developed an interactive R for managing multiple compute nodes in R. A Windows binary is available for download from here.

Contact

Office

Oak Ridge National Laboratory

One Bethel Valley Road
P.O. Box 2008, MS-6290
Oak Ridge, TN 37831-6290

Office: (865) 241-1436
Mobile: (812) 606-8435

E-mail :