In a previous article, we charted and compared NOAA’s FLs.52j “final” dataset with its raw dataset and observed that the former is much steeper than the latter:
So it appears that the raw data is being fudged hotter or cooler based on time. But let’s get a more accurate picture of what exactly is being done.
The approach I took is this: for each month, calculate the average temperature across all measuring stations in the FLs.52j dataset, do the same for the raw dataset, and then subtract each raw average from the corresponding FLs.52j average. This might not be the best way, but hopefully it’s at least a step in the right direction.
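To make the arithmetic concrete, here is a minimal sketch of that approach on invented toy data. The station names, the readings, and the helper names monthly_means and final_minus_raw are all made up for illustration; the real programs further down read the NOAA files instead.

```ruby
# Toy data only: station => { "YYYY-MM" => temperature in degrees C }.
# A nil reading stands for a missing value.
def monthly_means(stations)
  by_month = Hash.new { |h, k| h[k] = [] }
  stations.each_value do |readings|
    readings.each { |month, temp| by_month[month] << temp unless temp.nil? }
  end
  # Average whatever readings exist for each month.
  by_month.transform_values { |temps| temps.sum(0.0) / temps.length }
end

# Subtract the raw average from the final average, month by month.
def final_minus_raw(final_means, raw_means)
  final_means.each_with_object({}) do |(month, final_avg), out|
    raw_avg = raw_means[month]
    out[month] = (final_avg - raw_avg).round(3) if raw_avg
  end
end

raw_stations = {
  "STATION_A" => { "1950-01" => 2.0, "1950-02" => 4.0 },
  "STATION_B" => { "1950-01" => 4.0, "1950-02" => nil },
}
final_stations = {
  "STATION_A" => { "1950-01" => 1.5, "1950-02" => 4.0 },
  "STATION_B" => { "1950-01" => 3.5, "1950-02" => 5.0 },
}

diffs = final_minus_raw(monthly_means(final_stations), monthly_means(raw_stations))
diffs.each { |month, diff| puts "#{month},#{diff}" }
```

One thing the toy data shows: for 1950-02 the two averages are taken over different sets of stations, because one raw reading is missing. With this method, a difference between averages can come from changed values or from a changed set of contributing stations, so it is worth keeping both in mind when reading the result.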
If you need some fresh data, this should get it:
wget 'https://www1.ncdc.noaa.gov/pub/data/ushcn/v2.5/ushcn.tavg.latest.raw.tar.gz'
wget 'https://www1.ncdc.noaa.gov/pub/data/ushcn/v2.5/ushcn.tavg.latest.FLs.52j.tar.gz'
tar zxvf 'ushcn.tavg.latest.FLs.52j.tar.gz'
tar zxvf 'ushcn.tavg.latest.raw.tar.gz'
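A fresh download will probably unpack into a directory with a different date than the one hard-coded in the program below. Here is a small sketch for locating it, assuming the extracted directory is named like ushcn.v2.5.5.YYYYMMDD (inferred from the FINAL_DIR value in this article, not checked against NOAA’s documentation) and that you have moved it under data/. The helper name latest_ushcn_dir is my own.

```ruby
# Assumption: extracted directories are named "ushcn.v2.5.5.YYYYMMDD",
# matching the FINAL_DIR value used in this article.
def latest_ushcn_dir(base = ".")
  Dir.glob(File.join(base, "ushcn.v2.5.5.*"))
     .select { |path| File.directory?(path) }
     .max # zero-padded dates sort correctly as plain strings
end

dir = latest_ushcn_dir("data")
puts(dir ? "Using #{dir}" : "No extracted USHCN directory found under data/")
```

If it finds something, that path is what FINAL_DIR should be set to.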
Let’s use an older program for computing averages and just tweak it not to perform a moving average on the final result:
FINAL_DIR = "data/ushcn.v2.5.5.20220901"
ELEMENT = "tavg"
DATASET = "FLs.52j"
FINAL_LINE_LENGTH = 124
NUM_MONTHS_IN_YEAR = 12

def get_final_filenames
  Dir.glob("#{FINAL_DIR}/*.#{DATASET}.#{ELEMENT}")
end

def chunkify_string(s, size)
  (0 .. (s.length - 1) / size).map { |i| s[i * size,size] }
end
class FinalRecord
  attr_reader :year, :values, :dmflags, :qcflags, :dsflags

  def initialize(attrs)
    @year = attrs.fetch(:year)
    @values = attrs.fetch(:values)
    @dmflags = attrs.fetch(:dmflags)
    @qcflags = attrs.fetch(:qcflags)
    @dsflags = attrs.fetch(:dsflags)
  end

  def to_s
    dmf = @dmflags.map{|e| e.nil? ? "." : e}.join()
    qcf = @qcflags.map{|e| e.nil? ? "." : e}.join()
    dsf = @dsflags.map{|e| e.nil? ? "." : e}.join()
    "f #{@year} #{@values.join(",")} #{dmf} #{qcf} #{dsf}"
  end
end
class Final
  def self.from_file(filename)
    data = File.read(filename)
    Final.from_data(data)
  end

  # Parses one station file: one fixed-width line per year.
  def self.from_data(data)
    id = nil
    records = []
    lines = data.split("\n")
    lines.each do |line|
      if line.length != FINAL_LINE_LENGTH
        raise "line length #{line.length} != #{FINAL_LINE_LENGTH}"
      end
      # Columns 0-10: station id. Every line in a file must carry the same id.
      x = line[0,11]
      if id.nil?
        id = x
      else
        raise "id #{x} != id #{id}" if x != id
      end
      # Columns 12-15: year.
      year = line[12,4].to_i
      values = Array.new(NUM_MONTHS_IN_YEAR)
      dmflags = Array.new(NUM_MONTHS_IN_YEAR)
      qcflags = Array.new(NUM_MONTHS_IN_YEAR)
      dsflags = Array.new(NUM_MONTHS_IN_YEAR)
      # Remainder: twelve 9-character chunks, one per month.
      # Each chunk is a 6-character value followed by three 1-character flags.
      chunks = chunkify_string(line[16..], 9)
      chunks.each_with_index do |chunk, idx|
        value = chunk[0,6].to_i
        dmflag = chunk[6]
        qcflag = chunk[7]
        dsflag = chunk[8]
        # -9999 marks a missing value; a blank flag means "no flag".
        value = nil if value == -9999
        dmflag = nil if dmflag == " "
        qcflag = nil if qcflag == " "
        dsflag = nil if dsflag == " "
        values[idx] = value
        dmflags[idx] = dmflag
        qcflags[idx] = qcflag
        dsflags[idx] = dsflag
      end
      records << FinalRecord.new(
        year:year, values:values, dmflags:dmflags, qcflags:qcflags, dsflags:dsflags
      )
    end
    Final.new(id:id, records:records)
  end

  attr_reader :id, :records

  def initialize(attrs)
    @id = attrs.fetch(:id)
    @records = attrs.fetch(:records)
  end

  def to_s
    "Final #{@id}: #{@records.length} records"
  end
end
if $0 == __FILE__
  # db[year][month] collects that month's values from every station.
  db = {}
  final_filenames = get_final_filenames
  final_filenames.each do |final_filename|
    final = Final.from_file(final_filename)
    final.records.each do |f|
      next if f.year < 1900
      f.values.each_with_index do |v, idx|
        if v
          month = idx + 1
          db[f.year] ||= {}
          db[f.year][month] ||= []
          db[f.year][month] << v
        end
      end
    end
  end

  avgs = []
  db.each_pair do |year, months|
    months.each_pair do |month, values|
      avg = values.sum(0.0) / values.length
      # Values are stored as integers in hundredths of a degree; scale to degrees.
      avgs << ["#{year}-#{"%02d" % month}", avg / 100]
    end
  end

  avgs.each do |e|
    puts e.join(",")
  end
end
Let’s run it to obtain the average temperature per month from the FLs.52j dataset:
ruby compute-average-nosmoothing.rb > fls.csv
The data looks like this:
...
2021-11,7.031912972085386
2021-12,4.389392446633826
2022-01,-1.0754597701149426
2022-02,1.0216912972085386
2022-03,6.881765188834154
2022-04,10.496707717569786
2022-05,16.95471264367816
2022-06,21.67415435139573
2022-07,24.616535303776683
2022-08,23.754507389162562
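For anyone who wants to see the fixed-width slicing in isolation, here is a minimal sketch that builds one synthetic 124-character line and parses it using the same column positions as the program above. The station id and the values are invented, and build_line and parse_line are hypothetical helpers that are not part of the original program.

```ruby
LINE_LENGTH = 124 # 11-char id + 1 separator + 4-char year + 12 chunks of 9
CHUNK_SIZE = 9    # 6-char value + three 1-char flags

# Build a synthetic line with blank flags (illustration only).
def build_line(id, year, values)
  cells = values.map { |v| format("%6d", v) + "   " }
  "#{id} #{year}#{cells.join}"
end

# Slice the line using the same column positions as the program above.
def parse_line(line)
  raise ArgumentError, "unexpected length #{line.length}" unless line.length == LINE_LENGTH
  values = line[16..-1].scan(/.{#{CHUNK_SIZE}}/).map do |chunk|
    raw = chunk[0, 6].to_i
    raw == -9999 ? nil : raw / 100.0 # -9999 means missing; otherwise scale to degrees
  end
  { id: line[0, 11], year: line[12, 4].to_i, values: values }
end

# Invented values in hundredths of a degree, with four missing months.
sample = [-108, 102, 688, 1050, 1695, 2167, 2462, 2375, -9999, -9999, -9999, -9999]
record = parse_line(build_line("USH00000000", 2022, sample))
puts record[:id]
puts record[:year]
puts record[:values].inspect
```

The missing months come back as nil, which is why the averaging loop skips falsy values before collecting them.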
Now, modify the program to work on the raw dataset:
DATASET = "raw"
Run it again, writing the results to another file:
ruby compute-average-nosmoothing.rb > raw.csv
Now, let’s write one more rather boneheaded program to load these two files and subtract corresponding entries:
def moving_average(a, n, precision)
  a.each_cons(n).map { |e| e.reduce(&:+).fdiv(n).round(precision) }
end

if $0 == __FILE__
  flss = File.read("fls.csv").split("\n").map do |line|
    a = line.split(",")
    [a[0], a[1].to_f]
  end

  raws = File.read("raw.csv").split("\n").map do |line|
    a = line.split(",")
    [a[0], a[1].to_f]
  end

  diffs = []
  flss.each do |fls|
    # Find the raw record that has the same `"YYYY-MM"` timestamp, or skip.
    next unless raw = raws.find { |r| r.first == fls.first }
    # Find the difference between the FLs.52j value and the raw value.
    diff = fls[1] - raw[1]
    # Store a record like `["YYYY-MM", 1.23]`
    diffs << [fls[0], diff]
  end

  SMOOTHING_PERIOD = 12
  mavg = moving_average(diffs.map(&:last), SMOOTHING_PERIOD, 2)
  diffs.drop(SMOOTHING_PERIOD-1).map(&:first).zip(mavg).each do |rec|
    puts rec.join(",")
  end
end
Finally, run this and save its output:
ruby compute-final-minus-raw-tavg.rb > final-minus-raw-tavg.csv
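The boneheaded part of that program is the raws.find lookup, which rescans the raw list for every month. As a sketch of an alternative, assuming the same two-column "YYYY-MM,value" files, a hash keyed by month does the join in one pass, and the trailing mean can carry its label along so there is no separate drop-and-zip step. The helper names and the toy numbers here are mine, not taken from the NOAA data.

```ruby
# Parse "YYYY-MM,value" lines into { "YYYY-MM" => Float }.
def parse_csv(text)
  text.each_line.each_with_object({}) do |line, out|
    month, value = line.chomp.split(",")
    out[month] = value.to_f if month && value
  end
end

# Join on the month key; months missing from the raw side are skipped.
def diff_series(final_by_month, raw_by_month)
  final_by_month.each_with_object([]) do |(month, final_avg), out|
    raw_avg = raw_by_month[month]
    out << [month, final_avg - raw_avg] if raw_avg
  end
end

# Trailing mean over `window` entries, labelled with the last month of each window.
def trailing_mean(pairs, window, precision = 2)
  pairs.each_cons(window).map do |group|
    [group.last.first, (group.sum { |_, v| v } / window).round(precision)]
  end
end

final_csv = "2000-01,1.0\n2000-02,2.0\n2000-03,3.0\n"
raw_csv = "2000-01,0.5\n2000-02,2.5\n2000-03,2.0\n2000-04,9.0\n"

diffs = diff_series(parse_csv(final_csv), parse_csv(raw_csv))
smoothed = trailing_mean(diffs, 2)
smoothed.each { |rec| puts rec.join(",") }
```

With the real files you would pass File.read("fls.csv") and File.read("raw.csv") and a window of 12. I would expect it to produce the same kind of series as final-minus-raw-tavg.csv, though I have not compared the two outputs digit for digit.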
Here’s that CSV, for those who want to work with it in their own tools.
Loading this into Google Sheets, we see:
Well, well. That doesn’t look good, does it? Seems like thermometer readings from the past ran hot and needed to be cooled, while readings from the present run cool and need to be warmed? Seems like thermometers were only able to give correct readings in 2006? Could that be true? Could someone please check with Neil deGrasse Tyson?
NOAA is claiming that temperatures are going up at the same time they’re adjusting older temperatures down and newer temperatures up. What is the justification for applying this time-dependent fudge factor? What is the justification for applying this particular fudge function? Where is the code at NOAA that implements this? Let’s have a look at it.