The HMEC Principle: Finding the Sweet Spot for Generative AI
Generative AI (such as language models for generating text and diffusion models for generating images and videos) has taken the world by… (Mar 17)
Create pandas DataFrame in a loop
Pandas has many ways to read data: from CSVs, JSONs, databases, etc. However, every so often we need to create a new DataFrame row by row… (Nov 9, 2020)
Making GitHub-hosted datasets discoverable by Google Dataset Search
Even though GitHub is best known for sharing and collaborating on source code, it has also been used to share datasets. For small datasets… (Jul 18, 2019)
Liberating data — an interview with John Ioannidis
A couple of weeks ago I had the pleasure of sitting down with Prof. John Ioannidis to talk about the role of data in science. Prof. Ioannidis is… (Jun 26, 2018)
The glass box design philosophy
There is an interesting paradox in the context of developing data analysis software. On one side, there are clear benefits of designing tools… (Jan 26, 2018)
2017: Research Summary
Even though the passing of the year is more or less an arbitrary date, it’s a good opportunity to give a status update on various activities… (Jan 1, 2018)
To pin or not to pin dependencies: reproducible vs. reusable software
We recently had a very interesting conversation in our lab about how to describe software dependencies (libraries one needs to install) for… (Dec 4, 2017)
Forever free: building non-profit online services in a sustainable way
In the past decade, we have seen a big switch from client-run software to online services. Some services such as scientific data… (Dec 1, 2017)
Sharing academic credit in an open source project
We live in truly wonderful times to develop software. Thanks to the growth of the Open Source movement and the emergence of platforms such as… (Nov 28, 2017)