Javidan Karimli, Ramil Mammadov, Jennifer Li, Shiyue Zhou
This application provides real-time weather details and interesting information for locations across the United States. It leverages multiple components and a robust cloud infrastructure to ensure reliable and efficient performance. The project integrates key data engineering principles, employs Infrastructure as Code (IaC) for automated resource management, and implements a CI/CD pipeline to streamline development and deployment processes.
Several components work together seamlessly to ensure the full functionality of the weather application.
The SQLite database stores all the essential data required by the weather application, including city locations and detailed information about them. SQLite was chosen for its lightweight nature and fast performance, making it ideal for handling localized application data efficiently. Here is the table schema structure for the weather application.
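As an illustrative sketch only (the table and column names below are assumptions, not the project's actual schema), a minimal city table for an application like this could be created as follows:

```python
import sqlite3

# Hypothetical schema sketch -- table and column names are illustrative,
# not the application's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cities (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    state       TEXT NOT NULL,
    latitude    REAL NOT NULL,
    longitude   REAL NOT NULL,
    description TEXT
);
"""

# In practice the connection would point at a file such as the project's
# database path rather than an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO cities (name, state, latitude, longitude) VALUES (?, ?, ?, ?)",
    ("Durham", "NC", 35.994, -78.8986),
)
row = conn.execute("SELECT name, state FROM cities").fetchone()
print(row)  # -> ('Durham', 'NC')
```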
The Database API serves as a bridge between the application and the SQLite database, encapsulating all interactions to ensure efficiency and consistency. It simplifies tasks such as creating a new database from scratch, retrieving data, and uploading CSV files into the database. The API also handles exception management, logging, and database optimization to keep operations smooth and reliable.
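A minimal sketch of what such a wrapper might look like (the class and method names here are hypothetical, not the project's actual API):

```python
import csv
import io
import sqlite3

class DatabaseAPI:
    """Hypothetical thin wrapper around SQLite -- names are illustrative."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def create_table(self, ddl):
        """Spin up tables from a DDL script."""
        self.conn.executescript(ddl)

    def upload_csv(self, table, csv_text):
        """Insert rows from CSV text; the header row must match the table's columns."""
        rows = list(csv.reader(io.StringIO(csv_text)))
        header, data = rows[0], rows[1:]
        placeholders = ", ".join("?" for _ in header)
        self.conn.executemany(
            f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})", data
        )
        self.conn.commit()

    def fetch_all(self, table):
        return self.conn.execute(f"SELECT * FROM {table}").fetchall()

db = DatabaseAPI()
db.create_table("CREATE TABLE cities (name TEXT, state TEXT);")
db.upload_csv("cities", "name,state\nDurham,NC\nAsheville,NC\n")
print(db.fetch_all("cities"))  # -> [('Durham', 'NC'), ('Asheville', 'NC')]
```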
The Weather API is responsible for retrieving real-time weather data for the specified cities. It manages all interactions with the OpenWeather API, including securely handling the API key and managing errors. Beyond fetching current weather data, it also supports future weather forecasting, making it a versatile and integral component of the application.
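A minimal sketch of such a call against OpenWeather's documented current-weather endpoint (the helper names and `units` choice below are our assumptions, not the project's actual code):

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

# OpenWeather's documented current-weather endpoint.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

def build_url(city, api_key, units="imperial"):
    """Compose the current-weather request URL for a city query such as 'Durham,US'."""
    query = urllib.parse.urlencode({"q": city, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"

def fetch_weather(city):
    """Fetch current conditions; the key comes from the environment, never hard-coded."""
    url = build_url(city, os.environ["WEATHER_API_ACCESS_TOKEN"])
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.URLError as err:
        print(f"weather lookup failed: {err}")  # centralized error handling
        return None

demo_url = build_url("Durham,US", "demo-key")
print(demo_url)
```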
Streamlit serves as the front-end service, facilitating all user interactions with the weather application. The web interface is divided into two main sections, a map and a dashboard, offering users an intuitive and interactive experience. The map functionality is powered by Folium, integrated through the streamlit-folium library, which enables dynamic and interactive geospatial visualizations directly within the application. Performance optimizations, such as reducing response times and keeping interactions smooth, are implemented to enhance usability.
Pandas is utilized for processing all necessary data, such as city locations and related information, to prepare it for use in the SQLite database. It enables efficient transformation of raw data into a structured format that aligns with the database model, ensuring seamless integration and high-quality data preparation.
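A short sketch of the kind of transformation involved (the column names and cleaning steps are illustrative, not the project's actual pipeline):

```python
import pandas as pd

# Hypothetical raw city data -- the real project loads this from CSV files.
raw = pd.DataFrame(
    {
        "City": [" durham ", "Asheville"],
        "State": ["nc", "NC"],
        "Lat": ["35.994", "35.5951"],
        "Lon": ["-78.8986", "-82.5515"],
    }
)

# Normalize text fields and coerce coordinates to floats so the frame
# matches the database model before insertion.
clean = raw.assign(
    City=raw["City"].str.strip().str.title(),
    State=raw["State"].str.upper(),
    Lat=pd.to_numeric(raw["Lat"]),
    Lon=pd.to_numeric(raw["Lon"]),
)
print(clean["City"].tolist())  # -> ['Durham', 'Asheville']
```

From here, a call such as `clean.to_sql("cities", conn, if_exists="replace")` would write the prepared frame into SQLite.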
To ensure a robust, scalable, and environment-agnostic deployment, the weather application is containerized using Docker and hosted on an AWS EC2 instance for public accessibility. The AWS Elastic Container Registry (ECR) is integrated to manage Docker images and keep them up to date. Deployment and updates are fully automated through a CI/CD pipeline powered by GitHub Actions, ensuring streamlined and reliable cloud-based operations.
The application utilizes several libraries and tools to deliver a seamless user experience:
- Streamlit: For building the interactive front-end interface.
- Streamlit-Folium: For integrating and displaying dynamic maps using Folium within the Streamlit application.
All required libraries are listed in the requirements.txt file. To test the application locally, users must create an access key on the OpenWeather website, then store it in a .env file with the following key name:
WEATHER_API_ACCESS_TOKEN=<Your_Access_Key>
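Inside the application, the token can then be read from the environment. As a sketch, a stdlib-only loader for simple KEY=VALUE files might look like this (many projects use the python-dotenv library instead; the file name below is just for the demo):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

# Demo: write a throwaway env file and load it.
with open("example.env", "w") as fh:
    fh.write("WEATHER_API_ACCESS_TOKEN=demo-key\n")
load_env("example.env")
print(os.environ["WEATHER_API_ACCESS_TOKEN"])  # -> demo-key
```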
To ensure you have all the necessary libraries, run the following commands in your terminal or command prompt:
make install

Since the database is included with the application, there is a possibility of data loss during file movement or deployment. To ensure proper functionality, follow these steps:
- Verify that the database file is located at:
data/db/application.db
- Run the deployment script:
python deployment.py

Ensure you have created a .env file in the root directory with the appropriate OpenWeather API access key to enable weather data retrieval, as discussed above.
- Open a terminal or command prompt in the directory where your Python file (main.py) is saved.
- Run the following command:
streamlit run main.py
- Open the Local URL (e.g., http://localhost:8501) in your web browser to view the dashboard.
The application is fully deployed using Docker, with two distinct Dockerfiles included in the project:
- Development Environment (.devcontainer):
  The Dockerfile located under .devcontainer provides a reproducible development environment inside GitHub Codespaces, allowing users to pull the repository and use it seamlessly within Codespaces.
  - Recommendation: Activate the "docker-in-docker" option in Codespaces to ensure proper functionality, as the application itself runs in a separate Docker container.
- Application Deployment (App.Dockerfile):
  The App.Dockerfile, located in the root directory, containerizes the application for production deployment, ensuring it is packaged in an efficient and portable manner.
By leveraging these Dockerfiles, the application offers flexibility for both development and deployment while maintaining consistency and efficiency.
- Update the system and install Docker:
sudo dnf update -y
sudo dnf install docker -y
- Start and enable the Docker service:
sudo systemctl start docker
sudo systemctl enable docker
- Add the EC2 user to the Docker group for permission management:
sudo usermod -a -G docker ec2-user
- Create the .aws directory and set up credentials and configuration files:
mkdir -p ~/.aws
echo "[default]" > ~/.aws/credentials
echo "aws_access_key_id=${AWS_ACCESS_KEY_ID}" >> ~/.aws/credentials
echo "aws_secret_access_key=${AWS_SECRET_ACCESS_KEY}" >> ~/.aws/credentials
echo "[default]" > ~/.aws/config
echo "region=${AWS_REGION}" >> ~/.aws/config
- The following environment variables must be set before running these commands:
- AWS_ACCESS_KEY_ID: Your AWS access key ID.
- AWS_SECRET_ACCESS_KEY: Your AWS secret access key.
- AWS_REGION: The AWS region hosting your resources.
- Authenticate Docker with ECR using AWS CLI:
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${ECR_REGISTRY}
- Remove unused Docker images to free space:
docker image prune -f
- Pull the application Docker image from ECR:
docker pull ${ECR_REGISTRY}/${ECR_REPOSITORY}:${IMAGE_TAG}
- Stop and remove any existing containers to ensure a clean start:
sudo docker stop weather_app || true sudo docker rm weather_app || true
- Run the application container:
sudo docker run -d -p 9999:9999 --name weather_app \
  -e WEATHER_API_ACCESS_TOKEN=${WEATHER_API_ACCESS_TOKEN} \
  ${ECR_REGISTRY}/${ECR_REPOSITORY}:${IMAGE_TAG}
- Load_test.py: We created this script to conduct load testing and evaluate our weather application's performance.
- We used Locust to perform the load testing and verify the microservice's performance. When scaling to 10,000 concurrent users, our microservice demonstrates strong system reliability and stability, maintaining a consistent 0% failure rate throughout the testing duration while handling approximately 1,800-2,000 requests per second (RPS). The actual RPS falls below our 10,000 RPS target because the OpenWeather API sets a free-tier limit of 2,000 calls per day, creating this bottleneck for performance. However, as evidenced by the steady RPS graph holding around 1,800-2,000 requests per second at peak load and a consistent response-time pattern, our system exhibits excellent stability.
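Locust drives the real tests; purely to illustrate the underlying idea (concurrent users issuing requests while failures and RPS are counted), here is a stdlib-only sketch. The local endpoint, user counts, and helper names are illustrative and are not our Locust configuration:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class OkHandler(BaseHTTPRequestHandler):
    """Tiny stand-in endpoint so the sketch is self-contained."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):
        pass  # silence per-request logging

def run_load(url, users=20, requests_per_user=5):
    """Spawn `users` threads issuing sequential GETs; return (successes, failures, rps)."""
    results = {"ok": 0, "fail": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(requests_per_user):
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    good = resp.status == 200
            except OSError:
                good = False
            with lock:
                results["ok" if good else "fail"] += 1

    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results["ok"], results["fail"], results["ok"] / elapsed

server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
ok, fail, rps = run_load(f"http://127.0.0.1:{server.server_address[1]}/")
server.shutdown()
print(ok, fail)  # -> 100 0
```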
The application currently faces several limitations. It is constrained by the rate limits of the OpenWeather API, which restrict the frequency of data retrieval. Additionally, the lack of caching mechanisms leads to redundant API calls and slower performance, particularly for map objects and refreshing content. Streamlit's current capabilities also pose challenges in handling highly interactive and dynamic content across the website, which can impact user experience. For future improvements, allowing users to input their own OpenWeather API access keys could help bypass rate limits and provide more flexibility. Implementing caching would significantly reduce the number of API calls and improve overall efficiency. Furthermore, optimizing Streamlit's handling of interactive elements would enhance the user experience, making the application more robust and user-friendly.
We write our code in Visual Studio Code and leverage AI tools for debugging assistance and code suggestions. These tools have proven to be invaluable in streamlining and enhancing our development process.





