Last week I found out that a conference paper I co-authored with Christoph Klemenjak, Andreas Reinhardt, Lucas Pereira, Mario Bergés, and Wilfried Elmenreich was accepted for presentation and publication. Titled "Electricity Consumption Data Sets: Pitfalls and Opportunities", it will be presented at the 6th ACM International Conference on Systems for Energy-Efficient Built Environments, Cities, and Transportation (BuildSys), to be held November 13-14, 2019, in New York. Here is the paper abstract:
Real-world data sets are crucial to develop and test signal processing and machine learning algorithms to solve energy-related problems. Their scope and data resolution is, however, often limited to the means required to fulfill the experimenters’ objectives and moreover governed by personal experience, budgetary and time constraints, and the availability of equipment. As a result, numerous differences between data sets can be observed, e.g., regarding their sampling rates, the number of sensors deployed, their amplitude resolutions, storage formats, or the availability and extent of ground-truth annotations. This heterogeneity poses a significant problem for researchers intending to comparatively use data sets because of the required data conversion, re-sampling, and adaptation steps. In short, there is a lack of widely agreed best practices for designing, deploying, and operating electrical data collection systems. We address this limitation by dissecting the collection methodologies used in existing data sets. By offering recommendations for data collection, data storage, and data provision, we intend to foster the creation of data sets with increased usability and comparability, and thus a greater benefit to the community.
Keywords: energy consumption data sets, data heterogeneity, best practices