
🔭 This project aims to detect exoplanets by analyzing stellar brightness variations.
When an exoplanet transits in front of its host star, it blocks a fraction of the star’s light,
creating periodic dimming patterns. However, real-world observations contain significant noise
from telescope sensors, space environment, and other interference.
✨ Our approach uses the Fourier Transform to filter out high-frequency noise and enhance
periodic signals related to exoplanet transits. We then use XGBoost to classify
whether a star hosts an exoplanet based on the processed light curves.
🪐 When an exoplanet passes in front of its host star, part of the star's light is blocked,
leading to a detectable change in brightness.
As exoplanets orbit their stars, they create periodic brightness variations,
which can be analyzed to determine their presence.
However, real observations contain several sources of noise,
including telescope sensor noise, space environmental interference,
and system errors, making it challenging to accurately detect these periodic changes.
🎯 Our ultimate goal is to analyze brightness variations despite noise interference
and predict the presence of exoplanets.
📡 Source: NASA Kepler Space Telescope
⏳ Description: Brightness measurements taken every 30 minutes
📌 Size: 5,087 samples (37 positive cases, 5,050 negative cases)
📈 We modeled the total flux as follows:

$$F_{\text{total}} = F_{\text{avg}} + A F_s + F_n$$

where:
- $F_{\text{total}}$ is the observed brightness
- $F_{\text{avg}}$ is the average brightness
- $A F_s$ represents planetary transit signals
- $F_n$ is periodic noise
🛠️ We assumed that the noise is periodic when developing our model.
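As a rough illustration of this flux model, the sketch below builds a synthetic light curve. It assumes the terms combine additively ($F_{\text{total}} = F_{\text{avg}} + A F_s + F_n$), a box-shaped transit for $F_s$, and a single sinusoid for $F_n$. The function name and every parameter value are hypothetical and are not taken from the project.

```python
import numpy as np

def simulate_light_curve(n_points=960, cadence_hours=0.5, f_avg=1.0,
                         transit_depth=0.01, period_hours=48.0,
                         transit_duration_hours=3.0, noise_amp=0.002,
                         noise_period_hours=2.0):
    """Return (t, flux) with flux = F_avg + A * F_s + F_n (assumed additive form)."""
    t = np.arange(n_points) * cadence_hours  # time stamps in hours
    # F_s: unit box-shaped transit, 1 while the planet is in front of the star
    f_s = ((t % period_hours) < transit_duration_hours).astype(float)
    # A is negative here because a transit dims the star
    a = -transit_depth
    # F_n: periodic noise, modeled as one short-period sinusoid (an assumption)
    f_n = noise_amp * np.sin(2 * np.pi * t / noise_period_hours)
    return t, f_avg + a * f_s + f_n

t, flux = simulate_light_curve()
```

With these illustrative values the curve spans 480 hours and contains ten transits, each six samples long.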
🔍 It is difficult and inefficient for a model to learn periodicity directly from the raw, noisy data.
Therefore, we applied the Fourier Transform to convert the original time-domain light curves
into the frequency domain, making periodicity more evident.
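A minimal sketch of this step, assuming NumPy's real-input FFT and the 30-minute cadence from the dataset description. The helper name `amplitude_spectrum` and the synthetic 24-hour signal are illustrative only, not the project's code or data.

```python
import numpy as np

CADENCE_HOURS = 0.5  # one brightness sample every 30 minutes

def amplitude_spectrum(flux, cadence_hours=CADENCE_HOURS):
    """Return (frequencies in cycles/hour, amplitudes) for a real-valued light curve."""
    flux = np.asarray(flux, dtype=float)
    centered = flux - flux.mean()  # remove the constant (zero-frequency) offset
    spectrum = np.fft.rfft(centered)
    freqs = np.fft.rfftfreq(flux.size, d=cadence_hours)
    # scaled so that a sinusoid of amplitude a gives a peak of roughly a
    amps = 2.0 * np.abs(spectrum) / flux.size
    return freqs, amps

# Synthetic example: a 24-hour periodic signal buried in random noise
rng = np.random.default_rng(0)
t = np.arange(960) * CADENCE_HOURS
flux = 1.0 + 0.01 * np.sin(2 * np.pi * t / 24.0) + 0.005 * rng.standard_normal(t.size)

freqs, amps = amplitude_spectrum(flux)
peak = np.argmax(amps[1:]) + 1           # skip the zero-frequency bin
dominant_period_hours = 1.0 / freqs[peak]
```

In the time domain the 24-hour signal is hard to see under the noise, while in the spectrum it shows up as a single clear peak.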
📌 Why was a cutoff of 1/9 cycles per hour chosen?
- One of the shortest exoplanetary orbital periods is approximately 4.5 hours
- We defined 9 hours as the minimum orbital period (a frequency of 1/9 cycles per hour),
extracting only signals corresponding to longer periods of planetary transit
💡 After Fourier Transform Application:
✅ We removed the high-frequency components in the frequency domain, then used the Inverse Fourier Transform (IFT) to reconstruct the signal.
✅ The process eliminated short-term noise fluctuations while preserving the main planetary transit trends.
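The filtering step can be sketched as a hard low-pass filter: transform, zero every component whose period is shorter than 9 hours, and invert. This is one simple way to do it under the stated cutoff; the function name and the test signal are hypothetical, and the project may have used a different filter shape.

```python
import numpy as np

def low_pass_filter(flux, cadence_hours=0.5, min_period_hours=9.0):
    """Zero every frequency above 1 / min_period_hours, then invert the transform."""
    flux = np.asarray(flux, dtype=float)
    spectrum = np.fft.rfft(flux)
    freqs = np.fft.rfftfreq(flux.size, d=cadence_hours)  # cycles per hour
    cutoff = 1.0 / min_period_hours
    spectrum[freqs > cutoff] = 0.0                       # drop short-period content
    return np.fft.irfft(spectrum, n=flux.size)

t = np.arange(960) * 0.5
slow = 0.01 * np.sin(2 * np.pi * t / 48.0)   # 48-hour variation: kept
fast = 0.005 * np.sin(2 * np.pi * t / 2.0)   # 2-hour variation: removed
filtered = low_pass_filter(1.0 + slow + fast)
```

Here `filtered` matches `1.0 + slow` to numerical precision, because the 2-hour component lies above the cutoff and the 48-hour component lies below it.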
💡 We selected XGBoost, a Boosting-based ensemble model,
which starts with weak learners and improves performance iteratively by minimizing residuals.
📌 XGBoost Training Process:
1️⃣ Initialize with a base prediction (usually the mean)
2️⃣ Compute residuals (difference between actual and predicted values)
3️⃣ Train new trees to reduce residual errors
4️⃣ Repeat the process to obtain an optimized model
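The four steps above can be sketched with plain gradient boosting on squared error, using one-split trees (stumps) as weak learners. This is a simplified illustration and not XGBoost itself, which additionally uses regularization and second-order gradient information. All function names and the toy data are hypothetical.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the single threshold on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the largest value so both sides are non-empty
        left = x <= threshold
        left_value = residuals[left].mean()
        right_value = residuals[~left].mean()
        fitted = np.where(left, left_value, right_value)
        error = np.sum((residuals - fitted) ** 2)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.3):
    base = y.mean()                        # step 1: base prediction
    prediction = np.full(y.shape, base)
    stumps = []
    for _ in range(n_rounds):              # step 4: repeat
        residuals = y - prediction         # step 2: compute residuals
        stump = fit_stump(x, residuals)    # step 3: fit a weak learner to them
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

x = np.linspace(0.0, 1.0, 40)
y = (x > 0.6).astype(float)
base, stumps, fitted = boost(x, y)
mse_start = np.mean((y - base) ** 2)
mse_end = np.mean((y - fitted) ** 2)
```

Each round shrinks the remaining residual by the learning rate, so `mse_end` ends up far below `mse_start` on this toy step function.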
🛠️ Initially, we considered other time-series models,
but we ultimately selected XGBoost due to its high learning efficiency with small datasets
and its robust performance on numerical data.
Because the dataset is heavily imbalanced (37 positive vs. 5,050 negative cases),
the model risked being biased toward the majority class.
To address this, we used SMOTE + Tomek Link techniques to balance the dataset.
- SMOTE (Synthetic Minority Over-sampling Technique)
- 📊 Generates synthetic samples for the minority class
- ✅ Achieved ~25% increase in dataset balance
- Tomek Link
- 🚀 Removes unnecessary overlapping samples at class boundaries to enhance decision-making
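To make the two resampling ideas concrete, here is a small NumPy sketch: SMOTE interpolates between a minority sample and one of its nearest minority neighbors, and a Tomek link is a pair of mutual nearest neighbors with different labels. This sketch drops both members of each link, which is one common variant and an assumption here. The helper names and toy data are hypothetical, and the dense pairwise distances only suit small demonstration arrays.

```python
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating toward near neighbors."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # position along the connecting segment
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

def tomek_link_mask(X, y):
    """Return a boolean mask that drops both members of every Tomek link."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nearest = dists.argmin(axis=1)
    keep = np.ones(len(X), dtype=bool)
    for i, j in enumerate(nearest):
        # Tomek link: mutual nearest neighbors that belong to different classes
        if nearest[j] == i and y[i] != y[j]:
            keep[i] = False
    return keep

rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(60, 2))   # majority class (label 0)
X_min = rng.normal(3.0, 0.5, size=(6, 2))    # minority class (label 1)
X_syn = smote(X_min, n_new=24)
X = np.vstack([X_maj, X_min, X_syn])
y = np.array([0] * 60 + [1] * 30)
keep = tomek_link_mask(X, y)
X_bal, y_bal = X[keep], y[keep]
```

Resampling like this is normally applied to the training split only, so that synthetic points never leak into the evaluation data.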
🛠️ Astronomical data contains a significant amount of noise and outliers.
To minimize the impact of outliers, we applied Robust Scaling:
✅ Trained the scaler on the training data
✅ Applied the same transformation to test data
✅ Normalized data distribution to improve model stability
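A minimal sketch of robust scaling, assuming the usual definition (center by the median, scale by the interquartile range), which is the same idea as scikit-learn's `RobustScaler`. The class name and the tiny arrays are illustrative only.

```python
import numpy as np

class RobustScalerSketch:
    """Center each feature by its median and scale by its interquartile range (IQR)."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.center_ = np.median(X, axis=0)
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        self.scale_ = np.where(iqr == 0, 1.0, iqr)  # avoid division by zero
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.center_) / self.scale_

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # last row is an outlier
X_test = np.array([[2.5], [5.0]])

scaler = RobustScalerSketch().fit(X_train)  # statistics come from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)    # reuse the same median and IQR
```

The outlier at 1000 barely affects the median (3.0) or the IQR (2.0), whereas it would dominate a mean and standard deviation.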
- PCA (Principal Component Analysis) was used to reduce feature dimensionality
- The dataset contained numerous flux-related features
- We extracted 10 principal components and selected the top 5 with the highest variance
- 🔍 This preserved key information while filtering out irrelevant noise
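The dimensionality reduction can be sketched with an SVD-based PCA: fit 10 components, then keep the 5 with the largest explained variance. The random matrix below is only a stand-in for the flux features, and `pca_fit` is a hypothetical helper, not the project's code.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, components, explained_variance) from an SVD of the centered data."""
    mean = X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    variance = singular_values ** 2 / (len(X) - 1)
    return mean, vt[:n_components], variance[:n_components]

rng = np.random.default_rng(0)
# 200 samples, 30 features with decreasing spread (stand-in for flux features)
X_train = rng.normal(size=(200, 30)) * np.linspace(5.0, 0.1, 30)

mean, components, variance = pca_fit(X_train, n_components=10)
top5 = np.argsort(variance)[::-1][:5]   # indices of the 5 highest-variance components
X_reduced = (X_train - mean) @ components[top5].T
```

Because the SVD already orders components by variance, `top5` is simply the first five components here. As with scaling, the mean and components would be fitted on training data and reused for test data.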
📊 Validation Method: Stratified K-Fold Cross Validation (5-Fold)
🎯 Objective: Prioritize recall so that as few exoplanet cases as possible are missed
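A rough sketch of the validation setup, assuming a simple round-robin stratified split and the standard definitions of precision and recall. The function names are hypothetical, and only the class counts come from the dataset description.

```python
import numpy as np

def stratified_folds(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the class ratio in every fold."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        fold_of[idx] = np.arange(len(idx)) % n_splits  # deal samples out round-robin
    for fold in range(n_splits):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)

def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y = np.array([1] * 37 + [0] * 5050)  # class counts from the dataset description
positives_per_fold = [int(y[test].sum()) for _, test in stratified_folds(y)]

y_true = np.array([1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0])
precision, recall = precision_recall(y_true, y_pred)
```

With only 37 positive cases, stratification leaves 7 or 8 positives in each test fold, so per-fold recall moves in coarse steps. That may help explain wide ranges in fold-level scores.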
Model Variant | Precision | Recall |
---|---|---|
Pure Data | 50-70% | 30-50% |
Fourier Transform Applied | 50-100% | 40-90% |
Full Pipeline (SMOTE + PCA + Scaling) | 86% | - |
✅ Fourier Transform significantly improved model performance
✅ A correlation exists between low-frequency signals and exoplanet presence
✅ Addressing dataset imbalance is critical for improving machine learning models
🚀 Future Research Directions:
- 🧠 Implement CNN/LSTM-based deep learning models
- 🔍 Apply unsupervised learning for hidden exoplanet discovery
- 🎯 Explore Bayesian optimization for hyperparameter tuning
📂 1. Discovering Exoplanets (Pure Data)
📂 2. Fourier Transform Applied
📂 3. Fourier Transform + Feature Engineering
📂 4. Final Model Implementation
🔗 Kepler Space Telescope Data
🔗 Fourier Transform in Signal Processing
🔗 SMOTE: Synthetic Minority Over-sampling
Seoyeon Kim | Ganghee Lee |
💡 We welcome feedback and contributions to enhance this research! 🚀✨
If you have any questions, feel free to reach out. 😊