In this tutorial we will give an introduction to two advanced data storage formats. HDF5 and NetCDF were designed to efficiently store the results of supercomputing applications like climate model outputs, or the data streams received from NASA's fleet of Earth-observing satellites. They provide many optimizations for transparent file compression, access speed, and working with multiple files as if they were one large data set. A couple of Python libraries exist that allow fast and pythonic access to these formats. We will show you how to create and access these types of files from Python, and how to use their advanced features to tune them for maximum efficiency.
This presentation was recorded at GOTO Berlin 2015
Kevin Goldsmith - Vice President, Engineering at Spotify
The software industry used to be all about building monoliths: monolithic applications and services, with big-bang product releases. All that has now changed [...]
The behavior of names and values in Python can be confusing. Like many parts of Python, it has an underlying simplicity that can be hard to discern, especially if you are used to other programming languages. Here I'll explain how it all works, and present some facts and myths along ...
Google Tech Talk (more info below)
October 14, 2011
Presented by Mark Lentczner.
Want to know a little more about programming Haskell than just the buzzwords? This talk will show you some of the joys of coding in Haskell through lots and lots of code examples.
No prior experience with Haskell or functional programming required. ...
PyData SV 2014
Andrew Odlyzko, Professor of Mathematics at the University of Minnesota, discusses "Turing and the Riemann zeta function" in a lecture given on the occasion of Princeton University's centennial celebration of Alan Turing. Learn more at www.princeton.edu/turing
PyData SV 2014
Many real-world datasets have missing observations, noise and outliers; usually due to logistical problems, component failures and erroneous procedures during the data collection process. Although it is easy to avoid missing points and noise to some level, it is not easy to detect wrong measurements and outliers ...