The Indian Pines Site 3 dataset, acquired by AVIRIS (an airborne hyperspectral imaging sensor), can be loaded as a scikit-learn-style Bunch.
```shell
pip install git+https://github.com/helmenov/IndianPines
```
- `Readme.md`: This document itself
- `setup.py`: Script required for installing the package via pip
- `__init__.py`: Constructor for the provided modules
- `dataset.py`: Main module; includes two functions, `load` and `make_dataset`
- `example.py`: Sample code
- `resource/`
    - `IndianPines.rst`: Description of the dataset
    - `LabelsFromTIF.rst`: Description of the original ground truth
    - `LabelsFromMAT.rst`: Description of the GIC ground truth
    - `ReduceWaterAbsorption.rst`: Description of the water-absorption channel reduction
- `recategorize_rules/`
    - `recategorize17to10.csv`: Example `recategorize_rule`, which recategorizes the original 17 categories into 10
```python
import IndianPines as pines
```

See `example.py` for sample usage.
```python
pines.load(
    pca=0,
    include_background=True,
    recategorize_rule=None,
    exclude_WaterAbsorptionChannels=True,
)
```
- `pca`: parameter for PCA (default: `0`)
    - `pca == 0`: no PCA is performed
    - `0 < pca < 1` (float): cumulative contribution ratio to retain
    - `pca >= 1` (int): number of dimensions after compression
    - `pca < 0`: no compression, but the principal components are still analyzed
    - Affects the Bunch shape: if `pca > 0`, ($145 * 145$, $220$) -> ($145 * 145$, #_of_features)
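The `pca` regimes above can be sketched with a plain-NumPy SVD. This is an illustrative re-implementation under stated assumptions; `apply_pca` and its internals are hypothetical, not the package's actual code:

```python
import numpy as np

def apply_pca(X, pca=0):
    """Hypothetical sketch of the `pca` parameter's regimes.

    pca == 0    : return X unchanged (no PCA)
    0 < pca < 1 : keep the fewest PCs whose cumulative contribution
                  (explained-variance) ratio reaches `pca`
    pca >= 1    : keep exactly int(pca) PCs
    pca < 0     : analyze but do not compress (all PCs kept)
    """
    if pca == 0:
        return X
    Xc = X - X.mean(axis=0)                 # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    if 0 < pca < 1:
        ratio = (S ** 2) / np.sum(S ** 2)   # contribution ratio per PC
        k = int(np.searchsorted(np.cumsum(ratio), pca)) + 1
    elif pca >= 1:
        k = int(pca)
    else:                                   # pca < 0: no compression
        k = X.shape[1]
    return Xc @ Vt[:k].T                    # project onto the first k PCs

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
print(apply_pca(X, pca=3).shape)    # (50, 3)
print(apply_pca(X, pca=-1).shape)   # (50, 10)
```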
- `include_background`: whether the background (unregistered) samples are included in the Bunch (default: `True`)
    - Affects the Bunch shape: if `False`, ($145 * 145$, $220$) -> (#_of_samples, $220$)
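A minimal sketch of that shape change, using synthetic arrays rather than the package itself (the background label is assumed to be 0, matching the recategorize table below):

```python
import numpy as np

# Fake flattened spectral cube and labels; 0 = BackGround.
rng = np.random.default_rng(0)
features = rng.random((145 * 145, 220))
target = rng.integers(0, 17, size=145 * 145)

mask = target != 0                # keep only registered (labeled) samples
features_fg = features[mask]      # (#_of_samples, 220)
target_fg = target[mask]
print(features_fg.shape[1])       # 220
```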
- `recategorize_rule`: (default: `None`)
    - The original data is categorized into 16 categories plus background, but some categories contain very few samples. You can therefore prepare a transform table in the following format and pass it through this option.
    - For example, to recategorize into the following 10 new categories

      | index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
      |---|---|---|---|---|---|---|---|---|---|---|
      | name | BackGround | Alfalfa | Corn | Grass | Hay-windrowed | Soybeans | Wheat | Woods | Bldg-Grass-Tree-Drives | Stone-steel towers |
      | color | #ffffff | #ff0088 | #0000ff | #007f7f | #a2b5cd | #da70d6 | #ffa500 | #884428 | #00ff00 | #ffff00 |

      using the following transform rule

      | original | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
      | recategorized | 0 | 1 | 2 | 2 | 2 | 3 | 3 | 0 | 4 | 0 | 5 | 5 | 5 | 6 | 7 | 8 | 9 |

      you should write the following CSV file:

      ```csv
      BackGround,Alfalfa,Corn,Grass,Hay-windrowed,Soybeans,Wheat,Woods,Bldg-Grass-Tree-Drives,Stone-steel towers
      #ffffff,#ff0088,#0000ff,#007f7f,#a2b5cd,#da70d6,#ffa500,#884428,#00ff00,#ffff00
      0,1,2,2,2,3,3,0,4,0,5,5,5,6,7,8,9
      ```
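Applying such a rule amounts to an integer look-up. This is a hypothetical sketch of parsing the three-row CSV and remapping labels; the package's own reader may differ:

```python
import csv
import io
import numpy as np

# The 3-row recategorize CSV: names, hex colors, original->new mapping.
CSV_TEXT = """\
BackGround,Alfalfa,Corn,Grass,Hay-windrowed,Soybeans,Wheat,Woods,Bldg-Grass-Tree-Drives,Stone-steel towers
#ffffff,#ff0088,#0000ff,#007f7f,#a2b5cd,#da70d6,#ffa500,#884428,#00ff00,#ffff00
0,1,2,2,2,3,3,0,4,0,5,5,5,6,7,8,9
"""

names, hexes, rule = list(csv.reader(io.StringIO(CSV_TEXT)))
mapping = np.array([int(v) for v in rule])  # position = original category index

target = np.array([0, 3, 9, 16])            # some original labels
new_target = mapping[target]                # recategorized labels
print(new_target)                           # [0 2 0 9]
```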
- `exclude_WaterAbsorptionChannels`: (default: `True`)
    - Some of the original channels are affected by water absorption. Gualtieri and Cromp [SC99] suggested removing channels [104-108], [150-163], and 220 (20 channels in total) for analysis.
    - [SC99] J.A. Gualtieri and R.F. Cromp, "Support vector machines for hyperspectral remote sensing classification," 27th AIPR Workshop: Advances in Computer-Assisted Recognition, vol. 3584, pp. 221-232, SPIE, 1999.
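A quick sketch of which of the 220 channels survive this exclusion, assuming the channel numbers in the ranges above are 1-based:

```python
# Water-absorption channels per [SC99]: 104-108, 150-163, and 220 (1-based).
water = set(range(104, 109)) | set(range(150, 164)) | {220}
kept = [ch for ch in range(1, 221) if ch not in water]
print(len(water), len(kept))   # 20 200
```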
- `gt_gic`: (default: `True`) ground-truth labels are taken from GIC's MAT-file
    - If `False`, they are taken from the original bundle's TIF-file
Returns a `Bunch` (compatible with scikit-learn) with the following attributes:

- `.features`: ($145 * 145$, #-features) numpy array of raw floats
- `.feature_names`: column names of the features
- `.target`: ($145 * 145$, ) numpy array of integers (indices into `target_names`)
- `.target_names`: (#-categories, ) numpy array of strings; category names corresponding to the indices
- `.coordinates`: ($145 * 145$, 2) numpy array of integers; x-y coordinates (column numbers and row numbers) of each sample
- `.coordinate_names`: (2,) numpy array of strings; coordinate names: `['#columns', '#rows']`
- `.hex_names`: (#-categories,) numpy array of strings; hex-color code (e.g. `#000000` for black) of each category indexed in `target_names`
- `.DESCR`: description of the dataset
- `.filename`: name of the CSV file in which the features and targets are saved

`features`, `target`, and `coordinates` are ordered as `for row in range(image_height) for column in range(image_width)`.
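With that row-major ordering, sample `i` of the flattened arrays sits at pixel `(row, col) = divmod(i, 145)`. A sketch with synthetic coordinates (not the package's arrays):

```python
import numpy as np

# Rebuild the documented ordering: rows outer, columns inner,
# with each entry stored as (column, row) per coordinate_names.
H = W = 145
coordinates = np.array([(col, row) for row in range(H) for col in range(W)])

i = 10_000
col, row = coordinates[i]
assert (row, col) == divmod(i, W)   # flat index <-> pixel position
print(row, col)                     # 68 140
```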
`make_dataset` takes no input and returns no output. It is called from `IndianPines.load` when the original bundles have not yet been downloaded from Purdue University (to guard against redundant downloads). It also generates the CSV-format dataset.
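The download guard can be sketched as a simple existence check before fetching. `ensure_bundle` and `fetch` here are hypothetical stand-ins, not the package's actual code:

```python
import tempfile
from pathlib import Path

def ensure_bundle(path: Path, fetch) -> Path:
    """Fetch the bundle only if a local copy is not already present."""
    if not path.exists():
        path.write_bytes(fetch())   # download happens at most once
    return path

with tempfile.TemporaryDirectory() as d:
    calls = []
    def fetch():
        calls.append(1)             # count real downloads
        return b"fake bundle bytes"
    p = Path(d) / "bundle.zip"
    ensure_bundle(p, fetch)
    ensure_bundle(p, fetch)         # second call hits the cached file
    print(len(calls))               # 1
```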