The Global Historical Climatology Network - Daily (GHCN-Daily) dataset integrates daily climate observations from approximately 30 different data sources.
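Download the by_station data from https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_station/ using the commands below, then decompress the files with find -type f | parallel gunzip.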
wget https://www.ncei.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
awk '{print "https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_station/"$1".csv.gz"}' ghcnd-stations.txt | parallel -j5 wget -nc -q -P by_station/ {}
Info:
ID = 11-character station identification code
YEAR/MONTH/DAY = 8-character date in YYYYMMDD format (e.g. 19860529 = May 29, 1986)
ELEMENT = 4-character indicator of element type
DATA VALUE = 5-character data value for ELEMENT
M-FLAG = 1-character Measurement Flag
Q-FLAG = 1-character Quality Flag
S-FLAG = 1-character Source Flag
OBS-TIME = 4-character time of observation in hour-minute format (e.g. 0700 = 7:00 am)
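The field list above can be turned into a small reader. This is a minimal sketch, assuming each by_station file is a headerless comma-separated file with the eight fields in the order listed; the `Observation` class, the `read_observations` function and the sample rows are hypothetical names and made-up values used only for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterator, Optional, TextIO


@dataclass
class Observation:
    station_id: str          # ID
    day: date                # YEAR/MONTH/DAY
    element: str             # ELEMENT
    value: int               # DATA VALUE
    m_flag: str              # M-FLAG
    q_flag: str              # Q-FLAG
    s_flag: str              # S-FLAG
    obs_time: Optional[str]  # OBS-TIME, None when the field is empty


def read_observations(handle: TextIO) -> Iterator[Observation]:
    """Yield one Observation per row (assumes no header line)."""
    for row in csv.reader(handle):
        if not row:
            continue
        # Pad short rows so that missing trailing fields become empty strings.
        row = (row + [""] * 8)[:8]
        station_id, ymd, element, value, m_flag, q_flag, s_flag, obs_time = row
        yield Observation(
            station_id=station_id,
            day=datetime.strptime(ymd, "%Y%m%d").date(),
            element=element,
            value=int(value),
            m_flag=m_flag,
            q_flag=q_flag,
            s_flag=s_flag,
            obs_time=obs_time or None,
        )


# Made-up sample rows, not real GHCN-Daily records.
sample = io.StringIO(
    "XX000000001,19860529,TMAX,215,,,a,0700\n"
    "XX000000001,19860529,PRCP,0,,,a,\n"
)
observations = list(read_observations(sample))
```

For a real file, pass an open file handle (for example `open(path, newline="")` on one of the decompressed CSVs) instead of the in-memory sample.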
- This is not going to work in the full dataset... See pr.subspace(T=cf.dt('1916-06-03')) vs python field.py.
In the full dataset:
dimensions:
	name_strlen = 11 ;
	timeseries = UNLIMITED ; // (125987 currently)
	obs = UNLIMITED ; // (1122439562 currently)
$ time python field.py 
Processing axis time.
Processing axis lat.
Processing axis lon.
Processing axis alt.
Processing axis station.
[pr: Station: 125987 Lat: 125987, Lon: 125987, Time: 100183 (1750-02-01 00:00:00, 2024-05-17 00:00:00)
]
Subspacing 1916-06-03 00:00:00-2018-06-08 00:00:00 at lat=-29.0.
Done ((23, 37261)).
Subspacing 1916-06-03 00:00:00-2018-06-08 00:00:00 at lat=slice(-29.0, -20.0, None).
Done ((7549, 37261)).
Subspacing 1916-06-03 00:00:00-2018-06-08 00:00:00 at lat=slice(-29.0, -12.0, None),lon=slice(117.0, 170.0, None).
Done ((5367, 37261)).
Subspacing stations ['ASN00001028', 'ASN00003002', 'ASN00006049', 'ASN00007064', 'ASN00004064'].
Done ((5, 100183)).
Subspacing stations ['ASN00001028', 'ASN00003002', 'ASN00006049', 'ASN00007064', 'ASN00004064'] at 1916-06-03 00:00:00-2018-06-08 00:00:00.
Done ((5, 37261)).
real	1m8,879s
user	1m6,967s
sys	0m2,427s
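The selections in the log above (by latitude, by latitude/longitude box, by station ID, and by time range) amount to boolean masks over the station and time axes of a (station, time) array. The following is a plain NumPy sketch of that idea on toy data; it is not how field.py or cf-python implements subspacing, and all array names and values are hypothetical.

```python
import numpy as np

# Toy stand-ins for the station and time axes (made-up values).
station = np.array(["ST000000001", "ST000000002", "ST000000003", "ST000000004"])
lat = np.array([-29.0, -25.5, -15.0, 10.0])
lon = np.array([120.0, 150.0, 165.0, 130.0])
time = np.arange("1916-06-01", "1916-06-11", dtype="datetime64[D]")  # 10 days

# Data laid out as (station, time), matching the shapes printed in the log.
pr = np.arange(station.size * time.size, dtype=float).reshape(station.size, time.size)

# Time range, inclusive at both ends.
in_time = (time >= np.datetime64("1916-06-03")) & (time <= np.datetime64("1916-06-08"))

# Latitude band, and latitude/longitude box.
in_lat = (lat >= -29.0) & (lat <= -20.0)
in_box = (lat >= -29.0) & (lat <= -12.0) & (lon >= 117.0) & (lon <= 170.0)

# Explicit list of station IDs.
by_id = np.isin(station, ["ST000000001", "ST000000003"])

# Apply the station mask first, then the time mask.
lat_subset = pr[in_lat][:, in_time]
box_subset = pr[in_box][:, in_time]
id_subset = pr[by_id]
id_time_subset = pr[by_id][:, in_time]
```

Each result keeps the (station, time) layout, so its shape can be compared directly with the `Done ((n_stations, n_times))` lines in the log.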