Applied Data Science @ Columbia


Learning by doing

“Applied Data Science” (class repo) is a project-based learning (PBL) course that integrates the knowledge and skills covered in a statistics curriculum with topics and projects in data science. Programming is covered using existing tools in R, although students may use tools from other languages. Computing best practices are taught through test-driven development, version control, and collaboration. Students finish the class with a portfolio on GitHub and a deeper understanding of several core statistical and machine-learning algorithms. Because this is a hands-on, project-based course, no formal instruction on statistics, data science, or machine learning is given. Each project cycle runs 2-3 weeks and centers on a mini data project. Groups are formed randomly, and project products are peer-reviewed.

About the instructor

<img align="right" src="http://datascience.columbia.edu/files/seasdepts/lm2963@columbia.edu/person/person_images/Tian_Zheng_300x400.jpg" width="150"> Tian Zheng is Professor of Statistics at Columbia University. At the Data Science Institute of Columbia University, Professor Zheng is the Associate Director for Education and chair of the Education committee.

She obtained her PhD from Columbia in 2002. Her research develops novel methods, and improves existing ones, for exploring and analyzing interesting patterns in complex data from different application domains. Her current projects are in the fields of statistical genetics, bioinformatics and computational biology, feature selection and classification for high-dimensional data, and network analysis. In particular, Dr. Zheng has been developing statistical and computational tools for high-dimensional data, searching for genetic interactions associated with complex human disorders, quantifying social structure, and studying hard-to-reach populations using survey questions. She has more than 60 peer-reviewed publications in journals including JASA, AOAS, and PNAS. Her work has been recognized with the 2008 Outstanding Statistical Application Award from the American Statistical Association, the Mitchell Prize from ISBA, and a Google research award.