Course Name: Scalable Data Science

Course abstract

One is interested in computing summary statistics (word count distributions) for a set of words which occur in the same document in entire Wikipedia collection (5 million documents). Naive techniques, will run out of main memory on most computers One needs to train an SVM classifier for text categorization, with unigram features for hundreds of classes. One would run out of main memory, if they store uncompressed model parameters in main memory One is interested in learning either a supervised model or find unsupervised patterns, but the data is distributed over multiple machines. Communication being the bottleneck, naïve methods to adapt existing algorithms to such a distributed setting might perform extremely poorly. In all the above situations, a simple data mining/machine learning task has been made more complicated due to large scale of input data, output results or both. In this course, we discuss algorithmic techniques as well as software paradigms which allow one to develop scalable algorithms and systems for the common data science tasks

Course Instructor

Media Object

Anirban Dasgupta

Anirban Dasgupta is currently an Associate Professor of Computer Science & Engineering at IIT Gandhinagar. Prior to this, he was a Senior Scientist at Yahoo! Labs Sunnyvale. Anirban works on algorithmic problems for massive data sets, large scale machine learning, analysis of large social networks and randomized algorithms in general. He did his undergraduate studies at IIT Kharagpur and doctoral studies at Cornell University. He has also received the Google Faculty Research Award (2015), the Cisco University grant (2016), and the ICDT Best Newcomer Award (2016).
More info
Media Object

Sourangshu Bhattacharya

Sourangshu Bhattacharya is an Assistant Professor in the Department of Computer Science and Engineering, IIT Kharagpur. He was a Scientist at Yahoo! Labs from 2008 to 2013, where he was working on prediction of Click-through rates, Ad-targeting to customers, etc on the Rightmedia display ads exchange. He was a visiting scholar at the Helsinki University of Technology from January - May 2008. He received the B.Tech. in Civil Engineering from I.I.T. Roorkee in 2001, M.Tech. in computer science from I.S.I. Kolkata in 2003, and Ph.D. in Computer Science from the Department of Computer Science & Automation, IISc Bangalore in 2008. He has many publications in top conferences and journals, including ICML, NIPS, WWW, ICDM, CIKM, etc. His current research interests include modeling influence in social networks, distributed machine learning, and representation learning.
More info

Teaching Assistant(s)

No teaching assistant data available for this course yet
 Course Duration : Aug-Sep 2018

  View Course


 Enrollment : 18-Apr-2018 to 06-Aug-2018