Tech Blog: Super Resolution Release
R&D · Aug 18, 2025 · 10 min read
By Lanang Afkaar and Toru Mitsutake
We’re excited to release the 5× super resolution model developed during our recent super resolution competition. You can now train your model using our public codebase or deploy the first-place winning solution directly to enhance your remote sensing images.
Introduction
In satellite or aerial remote sensing, clarity is everything. A single pixel might represent a house, a tree, or a vehicle—and in many cases, critical decisions hinge on interpreting those pixels correctly. Yet, high-resolution satellite imagery is often limited by cost, orbital constraints, or atmospheric conditions. This is where super resolution comes into play.
Super resolution is a deep learning technique that enhances the resolution of an image by reconstructing finer details from lower-resolution input. Rather than simply sharpening or resizing, it learns patterns from data to generate plausible high-resolution outputs. For satellite imagery, this means you can improve visual detail and object recognition without launching new satellites or paying for expensive commercial data.
In this blog, we’re excited to introduce our publicly released 5× super resolution model—a product of our recent competition aimed at pushing the limits of satellite image enhancement. With 5× scaling, you can significantly sharpen your images, uncover hidden structures, and boost the performance of downstream tasks like land cover classification or disaster assessment.
Whether you’re working in environmental monitoring, agriculture, defense, or urban planning, the ability to extract more from your imagery is a major advantage. And now, thanks to our open-source release, that power is in your hands.
About the competition
This competition set out with a clear mission: to develop a super-resolution model capable of enhancing satellite images by a factor of 5×, and to make that model publicly available as open-source software (OSS). The core goal was to advance the practical use of satellite data by lowering barriers to high-resolution imagery through cutting-edge machine learning.
One of the persistent challenges in satellite image analysis is limited resolution. While satellites can capture vast geographic areas, the trade-off often comes in the form of low image clarity. Even the most advanced commercial satellites today offer imagery at around 30 cm resolution, but such data is both costly and frequently bound by strict licensing and usage limitations.
By developing a high-quality super-resolution model, this competition aimed to simulate the benefits of high-resolution imagery—without the associated costs or access restrictions. Instead of relying solely on proprietary data, users can now enhance publicly available low-res satellite and aerial images, opening up possibilities for more affordable and scalable applications in environmental monitoring, disaster response, agriculture, urban planning, and beyond.
Beyond technical achievement, the broader vision is to accelerate the social implementation of geospatial data, including both satellite and aerial imagery. By releasing this model openly, we hope to empower researchers, developers, and organizations worldwide to unlock more value from Earth observation data.
The Data We Used
The dataset we used is the Nerima dataset, a 10 cm resolution aerial photography dataset covering Nerima ward, Tokyo. We sliced and modified the imagery to produce paired high-resolution (10 cm) and emulated low-resolution (50 cm) sets. The dataset is released under a CC-BY licence. You can read more about the dataset here.
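As an illustration, a 50 cm low-resolution counterpart can be emulated from a 10 cm tile by 5× downsampling. The snippet below is only a minimal sketch of that idea using Pillow; the actual preprocessing used for the competition dataset may differ (resampling kernel, tiling, normalization), and the file names are hypothetical.

```python
from PIL import Image

# Minimal sketch: emulate a 50 cm low-resolution image from a 10 cm tile
# by downsampling 5x. The competition's actual preprocessing may differ.
hr = Image.open("nerima_tile_10cm.png")  # hypothetical file name
lr = hr.resize(
    (hr.width // 5, hr.height // 5),     # 5x smaller in each dimension
    resample=Image.Resampling.BICUBIC,
)
lr.save("nerima_tile_50cm_emulated.png")
```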
🚀 What we're releasing
🥇 Winning 5x Super Resolution Model
We’re releasing the first-place solution from the 5× super-resolution competition hosted by Solafune. The winning team, Team N, delivered an outstanding model that significantly enhanced low-resolution satellite imagery to near 10 cm quality.
Team N members:
Their approach leveraged the Swin2SR architecture—a transformer-based model designed for image super resolution. To boost performance and realism, they incorporated additional training datasets from external sources, bridging the gap between synthetic high-resolution imagery and real-world 10 cm imagery.
🛠️ Open-Source Codebase
We’ve open-sourced the entire pipeline—including preprocessing, training, and inference scripts. Whether you’re a researcher, developer, or satellite data enthusiast, you can now build on top of this model, adapt it to your datasets, or even contribute improvements back to the community.
📦 Pre-Trained Weights
To make it even easier to get started, we’ve also provided pre-trained model weights. These can be used immediately for inference without the need for retraining, enabling faster integration into your existing workflows or applications.
✅ Model Highlights and Performance
🧠 Architecture Overview
SwinIR (Swin Transformer for Image Restoration) is a Transformer‑based model that significantly elevates state-of-the-art performance in tasks like image super-resolution, denoising, and JPEG compression artifact reduction. Swin2SR extends this framework by replacing SwinIR’s Swin Transformer blocks with Swin Transformer V2 layers. These layers use local window-based self-attention with shifted windows to efficiently model both local features and cross-window dependencies.
Swin Transformer V2 introduces key improvements, including:
Residual post-normalization for better gradient flow,
Scaled cosine attention for stabilized attention scores, and
Log-spaced continuous relative position bias to better generalize across scales.
Together, these enhancements improve training stability and efficiency, enabling faster convergence—up to 33% fewer iterations compared to SwinIR—while maintaining or surpassing performance.
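For reference, the scaled cosine attention and the log-spaced relative coordinates introduced in the Swin Transformer V2 paper can be written as follows, where τ is a learnable scalar and B is the continuous relative position bias:

```latex
% Scaled cosine attention (Swin Transformer V2)
\mathrm{Sim}(\mathbf{q}_i, \mathbf{k}_j) = \frac{\cos(\mathbf{q}_i, \mathbf{k}_j)}{\tau} + B_{ij}

% Log-spaced continuous relative position coordinates
\widehat{\Delta x} = \operatorname{sign}(\Delta x)\,\log\left(1 + |\Delta x|\right), \qquad
\widehat{\Delta y} = \operatorname{sign}(\Delta y)\,\log\left(1 + |\Delta y|\right)
```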

Fig. 1: Architecture diagram for Swin2SR; note the use of Swin Transformer V2 blocks throughout the model. (Taken from the original paper.)
The overall architecture resembles SwinIR: a shallow convolutional layer for feature extraction, followed by several Residual Swin Transformer Blocks (RSTBs) built from SwinV2 layers. Feature maps are fused and upsampled using modules like PixelShuffle. Swin2SR also supports multi-scale super-resolution through a unified upsampling branch.
Swin2SR is highly versatile and ranks among the top performers in image restoration tasks, including classical SR, lightweight SR, and compressed input restoration—earning top spots in the AIM 2022 Challenge.
To further boost performance, Team N trained four models using 4-fold cross-validation and built an ensemble by averaging their outputs during inference.
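The averaging step itself is conceptually simple: run each fold's model on the same input and average the outputs. Below is a minimal PyTorch sketch of that idea, assuming four already-trained fold models; this is an illustration, not Team N's actual code.

```python
import torch

def ensemble_super_resolve(models, lr_image):
    """Average the super-resolved outputs of several models.

    models: list of trained super-resolution models (e.g., the four
            cross-validation folds), already moved to the right device
            and set to eval mode.
    lr_image: low-resolution input tensor of shape (1, 3, H, W).
    """
    with torch.no_grad():
        outputs = [m(lr_image) for m in models]   # each is (1, 3, 5H, 5W)
    return torch.mean(torch.stack(outputs), dim=0)
```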
📊 Benchmark Results
To evaluate performance, we used the Structural Similarity Index (SSIM) as the primary metric to select the best 5× super-resolution model. The results below show that Team N’s solution achieved the highest SSIM, outperforming other participants by a narrow but consistent margin.
The second-place team, KagoAI, also used an ensemble method, but based on the RCAN (Residual Channel Attention Network) architecture. Team N’s superior performance can be attributed to a larger and more diverse training dataset, as well as effective use of the Swin2SR architecture.
| 🏅 Position | 🧑‍💻 Team Name | 🏗️ Architecture Used | 📊 SSIM Score |
|---|---|---|---|
| 1st | Team N | Swin2SR (4-fold ensemble) | 0.7835 |
| 2nd | KagoAI | RCAN (ensemble) | |
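For reference, the SSIM metric used above can be computed with scikit-image. The snippet below is an illustrative sketch rather than the competition's exact evaluation script.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_score(sr_image: np.ndarray, hr_image: np.ndarray) -> float:
    """Compute SSIM between a super-resolved and a ground-truth RGB image.

    Both arrays are expected to have shape (H, W, 3) with values in [0, 255].
    """
    return ssim(sr_image, hr_image, channel_axis=-1, data_range=255)
```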
⚙️ Inference Runtime
Below is a comparison of inference times using both CPU and GPU on different image sizes. Each test was conducted both with cold-start weights (first run) and warmed-up weights (cached in memory).
| Device | Cold start, normal input (130×130×3) | Cold start, large input (1024×1024×3) | Warmed up, normal input (130×130×3) | Warmed up, large input (1024×1024×3) |
|---|---|---|---|---|
| CPU (M3 Max) | 10.71 s | 505.90 s | 6.08 s | 489.89 s |
| GPU (H200) | 7.66 s | 17.77 s | 0.27 s | 10.27 s |
The GPU-accelerated inference demonstrates substantial speed improvements, particularly for real-time or large-scale batch processing scenarios.
🖼️ Qualitative improvements (before/after visuals)
Team N’s model successfully super-resolves 5× low-resolution inputs—e.g., from 130×130×3 to 650×650×3—producing visually sharper and more detailed outputs. On the provided evaluation set, the model achieved an average SSIM of 0.7835.
Visual comparisons show significant improvements in edge clarity, texture reconstruction, and fine detail preservation—especially on buildings, roads, and vegetation.

🧭 Getting Started with the Codebase
📁 Repository Structure
The released 5× super-resolution model is part of the Solafune-Tools project—an open-source toolkit for geospatial analysis. Within this repository, the super-resolution module can be found under the osm namespace.
Below is the relevant directory structure:
⚙️ Installation and setup
The minimum Python version required to install solafune-tools and the super-resolution module is 3.10, and we support up to Python 3.13 at the time of writing. The current solafune-tools version is 0.9.2. If you want to run super-resolution smoothly on NVIDIA GPUs, we suggest at least the following hardware and software.
🚀 Recommended Hardware
For inference:
CUDA ≥ 11.8
VRAM ≥ 2GB
For training:
CUDA ≥ 11.8
VRAM ≥ 12GB
Below is a straightforward way to install the repository.
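A typical installation looks like the following. The exact PyPI package name and the GPU setup step are assumptions; check the repository's README for the authoritative command.

```bash
# Requires Python 3.10-3.13
pip install solafune-tools   # assumed PyPI package name; see the repo README

# For GPU inference/training, install a CUDA-enabled PyTorch build, e.g.:
pip install torch --index-url https://download.pytorch.org/whl/cu118
```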
🧑🏫 Example usage
📈 Training Usage
Training the model with the current settings and the current dataset is straightforward: just follow the training command shown below.
You can use your own dataset or the competition dataset; there are also extra datasets that form part of the model training.
When you use the competition dataset, you don't need to download it first. The dataset module will automatically download it from S3 bucket storage and arrange it in the structure the training expects. To train the model, you only need to run this bash script:
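(The entry-point name below is a placeholder rather than the actual script name in the repository; the flags it can take are introduced in the following paragraphs.)

```bash
# Hypothetical entry-point name; check the repository for the actual
# training script. With the competition dataset, no manual download is
# needed -- the dataset module fetches it from the S3 bucket automatically.
python train.py
```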
If you want to test whether training will run at all, you can use the `-debug` flag to check for problems with your environment without waiting for all epochs to complete.
If you want to use multiple GPUs with Distributed Data Parallel (DDP), you can pass `-use_gpus 0,1,2,3...` (or `1/2/4`) together with `-strategy ddp` to choose how many GPUs to use and to enable the DDP strategy. By default, training uses a single GPU without DDP.
If you want to use your own dataset, please follow the dataset structure tree; training will abort if the structure does not match. Once your dataset complies with the expected structure, you can run the script below with the corresponding option set to True to use your dataset. Because training the model involves about four steps, it will take around 10-15 days to train all the models, depending on the size of your main datasets and the type of GPU you use.
If you want to resume a previous run after your first model training, you can use the `--continue-training` flag followed by the directory in which you keep the trained model checkpoints. Please follow the naming format shown in the folder tree, and keep the four-fold arrangement numbered from zero to three.
Here is a CLI example for resuming mid-training:
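(The entry-point name and checkpoint path below are placeholders; the `--continue-training` flag is the one described above.)

```bash
# Hypothetical entry point and checkpoint directory; ./checkpoints is
# assumed to contain the four fold subfolders numbered 0 through 3.
python train.py --continue-training ./checkpoints
```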
Here is a full example to test whether all functions work as intended:
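(Again, the entry-point name is a placeholder; the flags are the ones described above.)

```bash
# -debug runs a shortened pass to verify the environment before a full run;
# -use_gpus and -strategy ddp exercise the multi-GPU code path.
python train.py -debug -use_gpus 0,1 -strategy ddp
```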
🔍 Inference Usage
To run inference by importing the `solafune_tools` Python library, you can use the following tutorial the first time you use the 5× super-resolution model. First, prepare the image you want to enhance. JPG/JPEG, PNG, TIF, and TIFF are the accepted file extensions, but any image already loaded as a NumPy array is also acceptable. The example code below runs the model inference.
🔍 Inference (Python Interface)
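The sketch below shows roughly what this looks like. The import path, class name, and method name are assumptions for illustration only; please refer to the solafune_tools documentation for the actual interface.

```python
import numpy as np
from PIL import Image

# NOTE: the import path and class/method names are hypothetical placeholders;
# check the solafune_tools documentation for the actual interface.
from solafune_tools.super_resolution import SuperResolutionInferencer

# Load the low-resolution image (JPG/JPEG, PNG, TIF/TIFF, or a NumPy array).
lr_array = np.array(Image.open("input_130x130.png"))

# Run 5x super resolution with the released pre-trained weights.
inferencer = SuperResolutionInferencer()   # hypothetical class name
sr_array = inferencer.predict(lr_array)    # e.g., 130x130x3 -> 650x650x3

Image.fromarray(sr_array.astype(np.uint8)).save("output_650x650.png")
```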
💻 Bash/CMD panel Interface
To run inference through the bash/CMD interface, first prepare the image you want to enhance. JPG/JPEG, PNG, TIF, and TIFF are the accepted file extensions. The bash example below runs the model inference.
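(The command and flag names below are placeholders rather than the actual CLI; consult the repository's README or the tool's help output for the real invocation.)

```bash
# Hypothetical CLI name and flags; check the solafune-tools README or
# `--help` output for the actual command.
solafune-tools super-resolution \
    --input input_130x130.png \
    --output output_650x650.png
```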
🌍 Example Real-World Use Cases
🌱 Agriculture: Crop Health Monitoring at Parcel Level
Super-resolved imagery enables finer-grained observation of crop variation within individual fields, particularly when working with open satellite data like Sentinel-2, which originally has a spatial resolution of 10 meters. Applying 5x super-resolution transforms this into ~2 meters or better, allowing clearer delineation of disease spread, irrigation issues, or nutrient deficiencies at the plant or row level.
🌪️ Disaster Monitoring: Damage Assessment After Earthquakes
In the aftermath of major disasters, super-resolved images can help responders detect building collapses, road blockages, and landslide traces, especially when high-resolution commercial imagery isn't immediately available. Super-resolution accelerates insights from freely available imagery such as Sentinel or Landsat.
🗺️ Mapping: Updating Urban Footprints in Rapidly Growing Cities
Super-resolution enhances low-res historical or multi-temporal imagery, helping detect recent urban sprawl, new infrastructure, or informal settlements. It supports updating OpenStreetMap and urban land-use databases more accurately, even in fast-changing regions with limited access to high-res commercial sources.
Future Directions and Contributions
🤝 How the Community Can Contribute
We believe that the advancement of satellite super-resolution depends on an open, collaborative ecosystem. We invite the community to:
Train and share your own models: Experiment with new architectures or training strategies and contribute your models back to the Solafune-Tools ecosystem.
Report bugs or suggest enhancements: Your feedback helps us refine both the tools and the documentation.
Submit pull requests: Whether it’s improving inference speed, supporting new data formats, or extending features, contributions of all sizes are welcome.
Together, we can push the boundaries of satellite imagery analysis and make high-resolution Earth observation more accessible.
🚀 Planned Improvements
We have several exciting updates in the roadmap to make the model more capable and versatile:
Multispectral support: Current models are limited to RGB inputs. In the future, we aim to support multispectral training — including NIR and other spectral bands — to expand use cases in environmental monitoring, agriculture, and more.
8x Super-Resolution: While 5x is a major leap, it still falls short in some scenarios. We plan to explore 8x super-resolution to further enhance image clarity and bring even more value to downstream applications.
On-device optimization: We aim to support lightweight, optimized versions of the model that can run on edge devices or in limited-compute environments.
If you’re interested in contributing or collaborating on these efforts, feel free to reach out or submit a GitHub issue or pull request.
🔚 Conclusion: Try It, Share It, Shape It
With the release of our 5× super-resolution model, you now have access to powerful tools that can significantly enhance your remote sensing workflows—whether you’re analyzing agricultural fields, assessing post-disaster damage, or mapping rapid urban development.
Here’s what’s available today:
✅ A first-place winning model using the advanced Swin2SR architecture
📦 Pre-trained weights for immediate deployment
🛠️ A fully open-source codebase for training, inference, and experimentation
📊 Benchmarked performance with high SSIM scores and efficient runtime
🔄 Support for both CPU and GPU environments
🌍 Real-world use cases across agriculture, disaster response, urban planning, etc.
But this is just the beginning. We’re counting on the community to help push the boundaries further:
👉 Try the model with your own satellite or aerial imagery
👉 Train your own version and explore different datasets or architectures
👉 Contribute back by submitting PRs, reporting issues, or sharing improvements
👉 Give feedback on performance, usability, and real-world results—we’re listening!
Start enhancing your imagery today and help us build the next generation of open geospatial tools.
Check out the repo: Solafune-Tools on GitHub
Let’s super-resolve the Earth—together.
Appendix
🙏 Acknowledgment
This project would not have been possible without the collaborative spirit and dedication of the broader community.
We extend our heartfelt thanks to:
Team N, the first-place winners of the 5x Super-Resolution competition, whose outstanding model serves as the foundation of this open-source release.
All competition participants, whose diverse approaches and contributions helped push the boundaries of satellite super-resolution research.
The Solafune platform and community, for hosting and supporting innovation in the geospatial AI space.
📝 License
The 5x Super-Resolution model is released under the same license as Solafune-Tools: the MIT License.
This permissive license allows users to:
Use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software.
Include it in proprietary projects or commercial applications.
However, the following conditions must be met:
You must include the original MIT License in any distributed copies or substantial portions of the software.
You must retain the original copyright notice.
For full legal details, please refer to the LICENSE file in the repository.