CETlab Command Line Interface App

Automating the Data Wrangling Process for Energy Demand Data (load)

Authors & Acknowledgment

Thank you to the Clean Energy Transformation Lab at the University of California, Santa Barbara, and to the Principal Investigator, Dr. Ranjit Deshmukh.

Overview

This program is a command line interface application (CLI app), developed in Python to automate data wrangling transformations by cleaning and structuring energy demand (load) data from electricity agencies in 10 Southern African countries. The CLI app supports the data analytics and data pipeline development for the UC Santa Barbara, Bren School, Clean Energy Transformation lab (CETlab) research team.

The CETlab application was designed, developed, and is maintained by software engineer and data scientist Tiana Curry. This CLI app is an example of a complete project workflow, from problem and idea through to deployment. It is a full-stack, object-oriented Python application that solves common organizational issues in data pipeline development in the university and academic research sector.

Description

Goals

Research Goal

This program is a command line interface application (CLI app) designed to clean datasets for the UC Santa Barbara, Bren School, Clean Energy Transformation lab (CETlab) and the GridPath software. The CETlab works in collaboration with specific electricity agencies and other stakeholders throughout Southern Africa to provide sustainable renewable energy. The goal of the CETlab and GridPath is to provide reliable renewable energy to 10+ countries throughout Africa, supporting their growing economies and increasing access to essential resources while leading the way to clean energy use for large populations.

Project Goal

The goal of the CETlab CLI app is to automate the data wrangling, cleaning, and structuring process for datasets coming from the specific electricity agencies. Users input their raw datasets, and the CLI app outputs a clean tabular data frame that is ready for further analysis and for predictive or machine learning models. The CLI app provides data wrangling functions for electricity companies in the following Southern African countries: Angola, Eswatini, Lesotho, Malawi, Mozambique, Namibia, South Africa, Zambia, Zimbabwe. Each electricity company had a different way of collecting, storing, and documenting its data, so the solution was to provide a separate function for each country, tailored to that agency's data collection methods. The results of the application were an increase in efficiency and accuracy in analytics, the initial development of the data pipeline, and the contribution of a reusable and reproducible CLI app used by the entire CETlab.

Using This App

Data

The data collected by the electricity agencies and then used by the CETlab is not accessible to the public, so a mock dataset filled with random data points was created to show the functionality of the CLI app. The test input dataset and a version of the clean output data frame are provided to show specifically how the Eswatini demand (load) data was handled and structured in this application. You can find the mock input dataset and the output data frame in the test-data directory.

Usage

To use this CLI app, download this repository to your local computer, create a virtual environment, and install the dependencies. Below is a quick “How-to Guide” that shows how to access and run the commands that clean raw datasets with the CETlab CLI app. There are data wrangling commands for all 10 countries and a set of simple test functions that show the basic functionality of the program. To view only the data wrangling functions, see the Data Wrangling Functions Jupyter notebook. To view the complete source code for the CETlab CLI app, see the src directory. The src directory includes two main scripts: the re_cli.py script for command line functionality and the re_func.py script for data wrangling functions.

How to Guide

Installation

Dependencies

Python==3.12.3
code==1.91.1

You can find the following dependencies in the requirements.txt file:

DateTime==5.5
et-xmlfile==1.1.0
numpy==2.0.1
openpyxl==3.1.5
pandas==2.2.2
python-dateutil==2.9.0.post0
pytz==2024.1
setuptools==72.2.0
six==1.16.0
tzdata==2024.1
zope.interface==7.0.1

Modules

The virtual environment I created was named "cetlab-cli", and I directly imported the following libraries:

numpy
pandas
os
sys
argparse
datetime

I built two modules:

re_cli: creates the command-line application functionality
re_func: holds the data wrangling functions

Example

Step 1

Check that your raw data fits the strict format the functions expect. There are five things to check:

the column names
the data entries, which should contain one load demand value per half hour
the number of rows
the date format
the name of each sheet

a. Make sure you have five columns with the following column names: 'WEEK', 'DAY', 'DATE', 'TIME', 'SYST'.

b. Starting from Jan 1, 20xx to Dec 31, 20xx, there should be an entry for every half hour of every day of the year.

c. The number of rows should be 17521.

d. The 'DATE' column is day/month/year in the format dd/mm/yyyy.

e. The sheet name should be a four-digit year.
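The checks in Step 1 can be sketched as a small pre-flight script. This is a minimal illustration, not code from re_func.py: the names `expected_row_count` and `check_sheet` are hypothetical, and it assumes that the figure of 17521 rows means 365 days × 48 half-hour entries plus one header row.

```python
from datetime import datetime

# Column names taken from Step 1a of this guide.
EXPECTED_COLUMNS = ["WEEK", "DAY", "DATE", "TIME", "SYST"]
HALF_HOURS_PER_DAY = 48


def expected_row_count(year: int) -> int:
    """Data rows for one year of half-hourly readings (header row excluded)."""
    days = (datetime(year + 1, 1, 1) - datetime(year, 1, 1)).days
    return days * HALF_HOURS_PER_DAY


def check_sheet(sheet_name: str, header: list, dates: list) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Step 1e: the sheet name should be a four-digit year.
    if not (len(sheet_name) == 4 and sheet_name.isdigit()):
        problems.append(f"sheet name {sheet_name!r} is not a four-digit year")
    # Step 1a: the header should match the expected column names.
    if list(header) != EXPECTED_COLUMNS:
        problems.append(f"columns {list(header)} do not match {EXPECTED_COLUMNS}")
    # Step 1d: every DATE value should parse as dd/mm/yyyy.
    for value in dates:
        try:
            datetime.strptime(value, "%d/%m/%Y")
        except ValueError:
            problems.append(f"DATE value {value!r} is not dd/mm/yyyy")
            break  # one offending value is enough to flag the column
    return problems


if __name__ == "__main__":
    # 365 days * 48 entries, plus one header row (assumed reading of Step 1c).
    print(expected_row_count(2019) + 1)
    print(check_sheet("2019", EXPECTED_COLUMNS, ["01/01/2019", "31/12/2019"]))
```

Under the same assumption, a leap year would contain 48 more data rows than a non-leap year, so the row-count check depends on the year being processed.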

Step 2

Start with the 'help' command to view all command options and more details on the functions.

% ./re_cli.py -h

Step 3

The help output lists the command options with a more detailed description of each. To use the mock dataset, we will use the function behind the option -sz --eswatini.

Step 4

Run the command using the mock data. Each function is selected by its country's two-letter code.

Syntax: % ./re_cli.py <file-name> -<country-code>
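For readers unfamiliar with argparse, the command syntax above can be sketched as follows. This is an illustrative parser, not the actual contents of re_cli.py: only the -sz / --eswatini option is taken from this guide, and the `build_parser` name and the `file_name` argument name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts one input file and a country flag."""
    parser = argparse.ArgumentParser(
        prog="re_cli.py",
        description="Clean raw load data for one country (illustrative sketch).",
    )
    # Positional argument: the raw dataset to clean.
    parser.add_argument("file_name", help="path to the raw input dataset")
    # Country flag, using the two-letter code as the short option.
    parser.add_argument(
        "-sz",
        "--eswatini",
        action="store_true",
        help="use the Eswatini data wrangling function",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.eswatini:
        print(f"Would clean {args.file_name} with the Eswatini function")
```

A `store_true` flag per country keeps the command short, since the user only names the file and the country code.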

Step 5

Follow the prompt (if provided). For the Eswatini function, the prompt asks you to enter the year of the Excel file sheet.

Step 6

Three messages will appear if your entry is validated, indicating that the data cleaning process has begun. You can then locate the output dataframe in the current working directory, named in the format <input-file-name>-dataframe.csv
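The output naming convention can be sketched with pathlib. The helper name `output_path` is hypothetical, and the sketch assumes that <input-file-name> means the input file's name without its extension, which this guide does not state explicitly.

```python
from pathlib import Path


def output_path(input_file: str, directory=None) -> Path:
    """Derive '<input-file-name>-dataframe.csv' for a given input file.

    Assumption: the input file's extension is dropped before the suffix
    is appended. The file is placed in `directory`, or in the current
    working directory when no directory is given.
    """
    base = Path(directory) if directory is not None else Path.cwd()
    return base / f"{Path(input_file).stem}-dataframe.csv"


if __name__ == "__main__":
    print(output_path("test-data/mock-input.xlsx").name)
```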

Step 7

View the final clean tabular dataframe, ready for further analysis. The dataframe has the same format for all countries, with the following columns: 'hour', 'day', 'month', 'year', 'system_demand[mw]'.
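As a rough illustration of the target layout, the sketch below maps one raw entry onto the five output columns. It is not the function in re_func.py: the name `to_output_row` is hypothetical, the 'TIME' value is assumed to be in HH:MM form, the 'SYST' value is assumed to be the load in MW, and any aggregation of half-hour entries into hourly values is left out because this guide does not describe it.

```python
from datetime import datetime

# Output column names taken from Step 7 of this guide.
OUTPUT_COLUMNS = ["hour", "day", "month", "year", "system_demand[mw]"]


def to_output_row(date_str: str, time_str: str, load_mw) -> dict:
    """Split one raw DATE/TIME/SYST entry into the output columns.

    Assumptions: DATE is dd/mm/yyyy (Step 1d), TIME is HH:MM, and the
    load value is already expressed in MW.
    """
    stamp = datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H:%M")
    values = [stamp.hour, stamp.day, stamp.month, stamp.year, float(load_mw)]
    return dict(zip(OUTPUT_COLUMNS, values))


if __name__ == "__main__":
    print(to_output_row("31/12/2019", "23:30", "1.5"))
```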

License

This project is MIT licensed.

First published: 2019; last updated: 2024
