This program is a command line interface application (CLI app), developed in Python to automate data wrangling transformations by cleaning and structuring energy demand (load) data from electricity agencies in 10 Southern African countries. The CLI app supports the data analytics and data pipeline development for the UC Santa Barbara CETlab

TianaCurry/cetlab-cli-app

CETlab Command Line Interface App

Automating the Data Wrangling Process for Energy Demand Data (load)

Table of Contents

  1. Authors & Acknowledgment
  2. Overview
  3. Description
    • Goals
      • Research Goals
      • Project Goals
    • Using this app
      • Data
      • Usage
  4. How-to Guide
    • Installation
      • Dependencies
      • Modules
    • Using Commands
      • Steps: 1-7
  5. License

Authors & Acknowledgment

Author: Tiana Curry

Thank you to the Clean Energy Transformation Lab at the University of California Santa Barbara, and to the Principal Investigator, Dr. Ranjit Deshmukh.

Overview

This program is a command line interface application (CLI app), developed in Python to automate data wrangling transformations by cleaning and structuring energy demand (load) data from electricity agencies in 10 Southern African countries. The CLI app supports the data analytics and data pipeline development for the UC Santa Barbara, Bren School, Clean Energy Transformation lab (CETlab) research team.

The CETlab application was designed, developed, and maintained entirely by the software engineer and data scientist Tiana Curry. This CLI app is an example of a complete project workflow, from problem and idea to deployment. It is a full-stack, object-oriented Python application that solves common organization issues in data pipeline development in the university and academic research sector.

Description

Goals

Research Goal

This program is a command line interface application (CLI app) designed to clean datasets for the UC Santa Barbara, Bren School, Clean Energy Transformation lab (CETlab) and GridPath software. The CETlab works in collaboration with specific electricity agencies and other stakeholders throughout Southern Africa to provide sustainable renewable energy. The goal of the CETlab and GridPath is to provide reliable renewable energy to 10+ countries throughout Africa, supporting their growing economies and increasing access to essential resources while leading the way to clean energy use for large populations.

Project Goal

The goal of the CETlab CLI app is to automate the data wrangling, cleaning, and structuring process for datasets coming from the specific electricity agencies. Users input their raw datasets, and the CLI app outputs a clean tabular data frame ready for further analysis and for predictive/machine learning models. The CLI app provides data wrangling functions for electricity companies in the following Southern African countries: Angola, Eswatini, Lesotho, Malawi, Mozambique, Namibia, South Africa, Zambia, Zimbabwe. Each electricity company had different ways of collecting, storing, and documenting its data, so the solution was to provide a separate function for each country, tailored to that electricity agency's data collection methods. The results of the application were an increase in efficiency and accuracy in analytics, the initial development of the data pipeline, and the contribution of a reusable and reproducible CLI app used by the entire CETlab.

Using This App

Data

The data collected by the electricity agencies and used by the CETlab is not accessible to the public, so a mock dataset filled with random data points was created to show the functionality of the CLI app. The test input dataset and a version of the clean output data frame are provided to show specifically how the Eswatini demand data (load) is handled and structured in this application. You can find the mock input dataset and the output data frame results in the test-data directory.

Usage

To use this CLI app, download this repository to your local computer, create a virtual environment, and install the dependencies. Below is a quick "How-to Guide" that shows how to access and run the commands that clean raw datasets with the CETlab CLI app. There are data wrangling commands for all 10 countries and a set of simple test functions that show the basic functionality of this program. To view only the data wrangling functions, see the Data Wrangling Functions Jupyter notebook. The complete source code for the CETlab CLI app is in the src directory, which includes two main scripts: re_cli.py for the command line functionality and re_func.py for the data wrangling functions.

How-to Guide

Installation

Dependencies

Python==3.12.3
code==1.91.1

You can find the following dependencies in the requirements.txt file:

DateTime==5.5
et-xmlfile==1.1.0
numpy==2.0.1
openpyxl==3.1.5
pandas==2.2.2
python-dateutil==2.9.0.post0
pytz==2024.1
setuptools==72.2.0
six==1.16.0
tzdata==2024.1
zope.interface==7.0.1

Modules

The virtual environment I created was named "cetlab-cli", and I directly imported the following libraries:

  • numpy
  • pandas
  • os
  • sys
  • argparse
  • datetime

I built two modules:

  • re_cli: creates the command-line application functionality
  • re_func: holds the data wrangling functions

Example

Step 1

Check that your raw data fits the strict format the functions expect. There are five main things to check:

  • the column names
  • that the data entries contain a load demand per half hour
  • the number of rows
  • the date format
  • the name of each sheet

a. Make sure you have five columns with the following column names: 'WEEK', 'DAY', 'DATE', 'TIME', 'SYST'.


b. From Jan 1, 20xx to Dec 31, 20xx, there should be one entry for every half hour of every day of the year.

c. The number of rows should be 17521

d. The 'DATE' column is day/month/year in the format dd/mm/yyyy

e. The sheet name should be a four digit year.
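
The checks in Step 1 can be sketched in a few lines of Python. This is only an illustration, not the code in re_func.py: the function name check_raw_sheet and its arguments are hypothetical, the row count is assumed to include the header row, and check (b), one entry per half hour, is not covered.

```python
from datetime import datetime

EXPECTED_COLUMNS = ["WEEK", "DAY", "DATE", "TIME", "SYST"]
# 17521 comes from Step 1c. Assumption: 1 header row + 365 days * 48 half hours.
EXPECTED_ROWS = 17521


def check_raw_sheet(sheet_name, header, rows, expected_rows=EXPECTED_ROWS):
    """Return a list of problems; an empty list means every check passed.

    sheet_name: name of the Excel sheet, as a string
    header:     list of column names
    rows:       list of data rows, each ordered like the header
    """
    problems = []

    # (e) the sheet name should be a four-digit year
    if not (len(sheet_name) == 4 and sheet_name.isdigit()):
        problems.append(f"sheet name {sheet_name!r} is not a four-digit year")

    # (a) the column names must match exactly, in order
    if list(header) != EXPECTED_COLUMNS:
        problems.append(f"columns are {list(header)}, expected {EXPECTED_COLUMNS}")
        return problems  # the DATE check below needs the expected layout

    # (c) the row count, counting the header row (assumption)
    total_rows = len(rows) + 1
    if total_rows != expected_rows:
        problems.append(f"found {total_rows} rows, expected {expected_rows}")

    # (d) DATE must parse as day/month/year; report only the first bad row
    date_idx = EXPECTED_COLUMNS.index("DATE")
    for line_no, row in enumerate(rows, start=2):
        try:
            datetime.strptime(str(row[date_idx]), "%d/%m/%Y")
        except ValueError:
            problems.append(f"row {line_no}: DATE {row[date_idx]!r} is not dd/mm/yyyy")
            break

    return problems
```

Returning a list of problems rather than raising on the first one lets a user fix several formatting issues in a single pass.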

Step 2

Start with the 'help' command to view all command options and more details on the functions:

% ./re_cli.py -h

Step 3

The help output lists the command options with a more detailed description of each. To use the mock dataset, we will use the Eswatini function, which is selected with the command -sz --eswatini.


Step 4

Run the command using the mock data. The function is selected by the country's two-letter code.

Syntax: % ./re_cli.py <file-name> -<country-code>
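
For readers unfamiliar with argparse, the command shape above (a file name followed by a country flag) could be declared roughly as follows. This is a sketch, not the contents of re_cli.py: build_parser and the file name mock.xlsx are hypothetical, and only the Eswatini flag is included because it is the only one named in this README.

```python
import argparse


def build_parser():
    """Build a parser for: re_cli.py <file-name> -<country-code>."""
    parser = argparse.ArgumentParser(
        prog="re_cli.py",
        description="Clean raw load data for one country (illustrative sketch).",
    )
    parser.add_argument("file_name", help="path to the raw input file")

    # Exactly one country flag must be given. Only -sz/--eswatini is documented
    # in this README, so the other country flags are left out rather than guessed.
    country = parser.add_mutually_exclusive_group(required=True)
    country.add_argument(
        "-sz", "--eswatini",
        action="store_true",
        help="clean Eswatini demand (load) data",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["mock.xlsx", "-sz"])
print(args.file_name, args.eswatini)
```

A mutually exclusive group is one way to make sure a run targets a single country; the actual app may enforce this differently.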

Step 5

Follow the prompt (if provided). For the Eswatini function, the prompt asks you to enter the year of the Excel file sheet.


Step 6

Three prompts will pop up if your entry is validated, indicating that the data cleaning process has begun. You can find the output dataframe in the current working directory, named in the format <input-file-name>-dataframe.csv
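
As an illustration of the naming pattern, the output path could be derived from the input path like this. The helper output_path is hypothetical, and it assumes the input file's extension is dropped before "-dataframe.csv" is appended; the app itself may build the name differently.

```python
from pathlib import Path


def output_path(input_file, out_dir=None):
    """Return '<input-file-name>-dataframe.csv' inside out_dir (default: cwd)."""
    out_dir = Path(out_dir) if out_dir is not None else Path.cwd()
    # Assumption: the extension (for example .xlsx) is not part of <input-file-name>.
    stem = Path(input_file).stem
    return out_dir / f"{stem}-dataframe.csv"
```

With these assumptions, an input named mock-input.xlsx would map to mock-input-dataframe.csv in the current working directory.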


Step 7

View final clean tabular dataframe, ready for further analysis. Each dataframe has the same format for all countries with the following columns: 'hour', 'day', 'month', 'year', 'system_demand[mw]'
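
To make the target layout concrete, here is a small sketch that turns half-hourly (DATE, TIME, SYST) readings into rows with the five output columns. It is not the app's wrangling code: the function to_hourly is hypothetical, TIME is assumed to be in HH:MM form, and averaging the readings within each hour is an assumption, since this README does not say how half-hourly values map to the 'hour' column.

```python
from collections import defaultdict
from datetime import datetime

OUTPUT_COLUMNS = ["hour", "day", "month", "year", "system_demand[mw]"]


def to_hourly(records):
    """Group half-hourly (date_str, time_str, load) tuples into hourly rows."""
    buckets = defaultdict(list)
    for date_str, time_str, load in records:
        # DATE is dd/mm/yyyy (Step 1d); the HH:MM form of TIME is an assumption.
        stamp = datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H:%M")
        key = (stamp.year, stamp.month, stamp.day, stamp.hour)
        buckets[key].append(float(load))

    rows = []
    for (year, month, day, hour), loads in sorted(buckets.items()):
        rows.append({
            "hour": hour,
            "day": day,
            "month": month,
            "year": year,
            # Assumption: the hourly demand is the mean of that hour's readings.
            "system_demand[mw]": sum(loads) / len(loads),
        })
    return rows
```

Each returned dictionary uses the same keys, in the same order, as the column list above, so the rows can be written to CSV or loaded into a pandas DataFrame directly.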


License

This project is MIT licensed

First published: 2019; last updated: 2024
