Course catalogue

Create your own master’s programme by choosing between the different specializations of our partner universities.

Master SERP+ Programme - cohort 2020-2025

Introduction to Data Sciences (3 ECTS)

Hours: (Lecture / Tutorial / Practical courses)

Lectures (10 h)

  1. Introduction to Data Science, data collection, processing, analysis and archiving
  2. API (Application Programming Interface) – a powerful data mining tool
  3. Reproducibility of published experimental results, Open Science and FAIR
  4. Big Data - brief overview of issues and challenges

Practical laboratory exercises (12 h)

  1. Hands-on use of databases and experimental data repositories
  2. Examples of shared data applications in chemistry and structural biology
  3. Practical use of the programmatic API implemented in various databases and web-servers
  4. Applications of big data in practice, machine learning in chemistry and structural biology
  5. Reprocessing and re-use of raw experimental data

The course covers the characteristics of research data and the operations involved in creating, gathering and using them. The following topics will be presented:

  • Handling raw data obtained from experiments (e.g. synchrotron X-ray diffraction, NMR spectroscopy, cryo-electron microscopy)
  • Procedures required to obtain a working dataset (the first stage of interpretation)
  • Overview of methods for analyzing experimental data
  • Storing, retrieving and sharing data
  • Reproducibility, validation and re-use of data (Open Science and FAIR initiatives)

Students will learn in practice the methods and techniques that are necessary to exploit vast amounts of research data and to extract information from large heterogeneous datasets or use them in machine learning protocols. During hands-on laboratory exercises, students will be introduced to selected real scientific problems. They will prepare dedicated software queries to mine data from repositories and databases, learn how to process and archive experimental data, and how to improve search efficiency and precision.
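As an illustration of the kind of software query mentioned above, the following is a minimal sketch in Python, not course material. It assumes the RCSB PDB Data API entry route as the target repository; the field names `struct.title` and `exptl[].method` are assumptions about the response layout, and the helper names `entry_url`, `summarize` and `fetch_entry` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint: the entry route of the RCSB PDB Data API.
BASE_URL = "https://data.rcsb.org/rest/v1/core/entry/"


def entry_url(pdb_id: str) -> str:
    """Build the request URL for one entry ID (e.g. '4HHB')."""
    return BASE_URL + quote(pdb_id.strip().upper())


def summarize(record: dict) -> dict:
    """Extract a few fields from a parsed JSON record.

    The key names are assumptions about the response layout;
    missing keys yield None or an empty list instead of an error.
    """
    methods = [e.get("method") for e in record.get("exptl", [])]
    return {
        "title": record.get("struct", {}).get("title"),
        "methods": methods,
    }


def fetch_entry(pdb_id: str, timeout: float = 10.0) -> dict:
    """Download and parse one entry (requires network access)."""
    with urlopen(entry_url(pdb_id), timeout=timeout) as response:
        return json.load(response)
```

A typical call would be `summarize(fetch_entry("4HHB"))`. Keeping URL construction and field extraction separate from the network request lets both be checked offline against a saved response.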

Pre-requisites:

  • Familiarity with basic mathematical and statistical concepts.
  • Curiosity about research and playing with data.

 

Hours: Lectures 12 h, Laboratory 9 h

Teaching Staff: Miroslaw Gilski
Hours: 22 hours

Grading system in % (homework, oral presentation, lab training, final exam)

Recommended books & articles

  1. Introduction to Data Science, Jeffrey S. Saltz and Jeffrey M. Stanton, Sage Publ. (2017).
  2. An Introduction to Data: Everything You Need to Know About AI, Big Data and Data Science, Francesco Corea, Springer Nature (2019).
  3. A public database of macromolecular diffraction experiments, M. Grabowski, K.M. Langner, M. Cymborowski, P.J. Porebski, P. Sroka, H. Zheng, D.R. Cooper, M.D. Zimmerman, M. Elsliger, S.K. Burley and W. Minor, Acta Cryst. D72, 1181–1193 (2016).
  4. Data sharing in structural biology: Advances and challenges, M. Grabowski et al., in Data Sharing: Recent Progress and Remaining Challenges-Computer Science, Technology and Applications (Nova Science Publishers), pp. 29–68 (2019).