This document provides a comprehensive overview of the Avocado Analysis system, a data processing and visualization framework for agricultural trade data. The system primarily focuses on analyzing import/export patterns for avocados, with additional support for related fruits such as mangoes and limes. It implements an Extract-Transform-Load (ETL) pipeline that processes CSV data from agricultural trade sources, transforms it into a standardized format, stores it in a PostgreSQL database, and generates visualizations to display price and quantity trends.
System Components and Data Flow
Sources: import_avocado_fsda.ipynb 1-496
mango_df.ipynb 1-20
lime.ipynb 1-20
The system implements a standardized ETL pattern across all fruit types. The central function fas_csv_to_pd_df() transforms quarterly-formatted CSV data into a structured time series format suitable for analysis and visualization.
The fas_csv_to_pd_df() function is the primary data transformation mechanism that:
- Accepts a CSV file path as input
- Creates a new DataFrame with standardized columns (Year, Quarter, Value, Qty)
- Processes quarterly data across multiple years (2015-2019)
- Handles comma-separated numeric values
- Returns a cleaned DataFrame ready for analysis
Function structure: fas_csv_to_pd_df(file_name) Input: CSV file containing quarterly trade data Output: Pandas DataFrame with standardized format
The implementation can be found in all three data processing notebooks with slight variations. The core logic:
- Create empty DataFrame with standardized columns
- Generate year/quarter pairs for all time periods
- Extract value and quantity data from specific cells in the source CSV
- Process and clean numeric values (remove commas)
- Normalize column names to lowercase
The system stores processed data in a PostgreSQL database using SQLAlchemy. Each fruit type has its own dedicated table:
Table Name Description Primary Key Data Columns avocada_fsda Avocado trade data (quarter, year) year, quarter, value, qty mango_price Mango price data id id, year, quarter, date, price lime_fsda Lime trade data with price (quarter, year) year, quarter, qty, price Database connection pattern used across notebooks:
rds_connection_string = USERNAME+':'+PASSWORD+"@localhost:5432/[database_name]" engine = create_engine(f'postgresql://{rds_connection_string}') conn = engine.engine.connect()
The system processes CSV files with a specific format containing quarterly import/export data:
Partner, HS Code, Product, Year, UOM, Q1(Jan-Mar) Value, Q1(Jan-Mar) Qty, Q2(Apr-Jun) Value, Q2(Apr-Jun) Qty, Q3(Jul-Sep) Value, Q3(Jul-Sep) Qty, Q4(Oct-Dec) Value, Q4(Oct-Dec) Qty, Total Value, Total Qty
After processing, data is restructured into a time series format:
year, quarter, value, qty, [price (optional)] The price column is calculated as value/qty in some notebooks (e.g., lime.ipynb).
All three notebooks follow a similar pattern:
- Import necessary libraries (pandas, sqlalchemy, matplotlib)
- Read CSV data files from the Resources directory
- Transform data using the fas_csv_to_pd_df() function
- Convert data types and perform calculations
- Connect to PostgreSQL using credentials from sql_login.py
- Create or update database tables
- Load transformed data to the database
- Generate visualizations (primarily in the avocado notebook)
This consistent pattern allows for standardized processing across different fruit types while accommodating minor variations specific to each data source.
The system works with CSV data from agricultural trade sources, focusing on imports/exports of avocados, mangoes, and limes. The data provides quarterly values and quantities from 2015-2019, primarily tracking imports from Mexico to the United States. Data sets used include an avocado price data set from Kaggle (CSV), Yahoo Finance API for Chipotle stock price, and United States Department of Agriculture Foreign Agricultural Service data (CSV)
Each CSV file contains:
- Product identification (HS Code)
- Trade partner information
- Quarterly values in thousands of dollars
- Quantities in metric tons (MT)
- Reporter and partner country codes
- Did increasing avocado prices cause avocado imports to increase and imports of other crops produced in Michoacan to decrease (Kaggle and USDA)?
- As prices went up how did domestic and import amount change (USDA)?
- Does the time of year affect the price of avocados?
- Does the price of avocados affect the stock price of Chipotle?
- cmg (Chipotle Stock Price)
- avo_price (Avocado Price Data from Kaggle)
- avo_qtr_price (Average Quarterly Price based on Kaggle Data)
- avocada_fsda (Avocado Import Data)
- corn_fsda (Corn Import Data)
- mango_fsda (Mango Import Data)
- lime_fsda (Lime Import Data)
The Avocado Analysis system provides a complete ETL pipeline for processing agricultural trade data. It employs a standardized approach to transform quarterly CSV data into a time series format, store it in PostgreSQL, and generate visualizations for analysis. The system is modular with similar patterns implemented across different fruit types, primarily focusing on avocados with extensions for mangoes and limes.
- We did not see any obvious change to the production of other crops as avocados ramped up.
- Yet to be determined import/domestic production relation.
- There was a dip in avocado prices near the beginning of the year for four of the five years considered.
- There was a significant drop to Chipotle stock in one case where avocado prices shot up.
- FSDA data was in a two way table. Used iloc to change to one way (See Below).
- Yahoo finance API gave daily stock close values, but the Kaggle data was weekly (Sunday). The last day of the weeks close for Chipotle was used for price on Sunday.
- All data was imported into SQL tables and querried later to make plots.
- Various conversions to date format and numeric format after removing comma-like charecter from numbers (used to compute unit prices).
FIELD1 | Partner | FIELD3 | HS Code | Product | Year | UOM | Q1(Jan-Mar) Value | Q1(Jan-Mar) Qty | Q2(Apr-Jun) Value | Q2(Apr-Jun) Qty | Q3(Jul-Sep) Value | Q3(Jul-Sep)Qty | Q4(Oct-Dec) Value | Q4(Oct-Dec) Qty | Total Value | Total Qty | Reporter Code | Partner Code | Product Code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Mexico | 1 | 805503000 | LIM,TAH/PR,FR/DR | 2015-2015 | MT | 82,086 | 101,270.30 | 70,518 | 118,700.50 | 65,554 | 139,964.90 | 52,877 | 117,020.30 | 271,035 | 476,955.90 | US | MX | 805503000 |
1 | Mexico | 1 | 805503000 | LIM,TAH/PR,FR/DR | 2016-2016 | MT | 70,928 | 113,589.30 | 126,874 | 128,207.70 | 81,396 | 146,864.20 | 69,181 | 130,708.00 | 348,379 | 519,369.10 | US | MX | 805503000 |
1 | Mexico | 1 | 805503000 | LIM,TAH/PR,FR/DR | 2017-2017 | MT | 111,125 | 116,851.50 | 85,032 | 156,135.40 | 87,353 | 162,139.10 | 84,084 | 134,692.20 | 367,594 | 569,818.20 | US | MX | 805503000 |
1 | Mexico | 1 | 805503000 | LIM,TAH/PR,FR/DR | 2018-2018 | MT | 140,423 | 117,758.60 | 127,912 | 151,146.10 | 93,635 | 161,657.10 | 73,466 | 145,703.10 | 435,436 | 576,265.00 | US | MX | 805503000 |
1 | Mexico | 1 | 805503000 | LIM,TAH/PR,FR/DR | 2019-2019 | MT | 113,771 | 137,236.80 | 119,491 | 160,433.30 | 119,449 | 164,323.60 | 100,280 | 145,816.10 | 452,990 | 607,809.90 | US | MX | 805503000 |
FIELD1 | year | quarter | value | qty | price |
---|---|---|---|---|---|
0 | 2015 | 1 | 892426 | 464327.1 | 1.9219769856206972 |
1 | 2015 | 2 | 761529 | 505477.0 | 1.5065551944005364 |
2 | 2015 | 3 | 440088 | 348003.2 | 1.2646090610661052 |
3 | 2015 | 4 | 739714 | 385540.0 | 1.9186439798723869 |
4 | 2016 | 1 | 1046512 | 527981.1 | 1.9821012532456181 |
5 | 2016 | 2 | 903974 | 575985.6 | 1.5694385415190937 |
6 | 2016 | 3 | 536917 | 447233.5 | 1.2005294773311928 |
7 | 2016 | 4 | 891793 | 489631.9 | 1.8213539599850417 |
8 | 2017 | 1 | 1149175 | 583105.6 | 1.9707836796628262 |
- cmg (Chipotle Stock Price)
- avo_price (Avocado Price Data from Kaggle)
- avo_qtr_price (Average Quarterly Price based on Kaggle Data)
- avocada_fsda (Avocado Import Data)
- corn_fsda (Corn Import Data)
- mango_fsda (Mango Import Data)
- lime_fsda (Lime Import Data)
All imports are made to create and update SQL tables
File | Purpose |
---|---|
import_avocado_prices | Import data from Kaggle CSV file |
chipotle.ipynb | Import data from yahoo finance |
cmg_analysis_and_plot.ipynb | analyse stock prices and avocado prices |
import_fas_data | import avocado import data from fsda |
lime.ipynb | import lime data from fsda |
mango_df.ipynb | import mango data from fsda |
analysis_fsda.ipynb | analyze data from fsda |