The emergence of massive data collections (i.e., “Big Data”) has ushered in a paradigm shift in the way scientific research is conducted and new knowledge is discovered. The traditional observe-hypothesize-test model of small-scale scientific endeavor is increasingly augmented, and in some cases supplanted, by collaborative research that applies complex patterns of data integration and analysis and brings together multi-disciplinary teams from distributed organizations to solve a common problem.

Emerging cyber-infrastructure solutions must address the needs of domain scientists from multiple angles, including data access, metadata management, large-scale analytics and workflows, data and application discovery and sharing, and data preservation. The aim of the Cyber Carpentry workshop is to make it easier for participants to learn all aspects of the data-intensive computing environment and, more importantly, to work together with researchers who have complementary expertise, pairing domain scientists with computer and information scientists.

This two-week workshop will provide doctoral students and post-doctoral researchers with an overview of best data management practices, data science tools, and concrete steps and methods for performing end-to-end data-intensive computing and data life-cycle management. The training will prepare participants to facilitate and promote reproducible science and data reuse.

The workshop will convene at the University of North Carolina at Chapel Hill from July 16 to 27, 2018. Travel and accommodation support will be provided for accepted participants, and a certificate of completion from the UNC School of Information and Library Science will be awarded at the end of the training.

Workshop topics will include concepts and practices in:

  • Data life-cycle management and policy automation for increasing sustainability
  • Data and metadata curation for effective data preservation
  • Metadata, ontology and provenance for increasing interoperability
  • Concepts in federation for effective collaboration and sharing
  • Abstraction, virtualization and containerization for reproducible science
  • Effective collaboration techniques
  • Information analytics and scientific workflows
  • Computation using clouds and cluster resources


The tentative agenda for the workshop can be found here

Although the word “Carpentry” appears in our workshop title, we are not affiliated with Data Carpentry; suffice it to say that we are inspired by them.