Missing companies in 2020 XBRL data – is there a working rssfeed for Q4? #4480
-
Hi everyone, sorry if this is a basic question, it's my first time working with XBRL files.

I noticed that the XBRL database for 2020 (from the FERC Form 1 filings), available on this page: https://zenodo.org/records/15685585, does not seem to include all the companies present in the original FERC data. To verify this, I downloaded the full 2020 Form 1 dataset directly from the FERC website and filtered only the Q4 filings.

I’m trying to process these XBRL files into SQLite format using ferc-xbrl-extractor, but it requires an rssfeed file. I tried using the rssfeed from 2021, but it didn’t work. From what I can tell, the available rssfeed for 2020 seems to only include the companies listed in the ZIP file from that Zenodo page, not all companies that filed with FERC.

So my questions are:

Is there a working rssfeed file for the full 2020 Q4 dataset?
If not, does that mean it's not possible to process the full 2020 data into SQLite using this method?

Just to clarify: I already have the full 2020 Form 1 data in the older (DBF) format, and I’ve worked with it in Excel. My main goal now is to compare the data structures between the XBRL and legacy versions to avoid mistakes during analysis.

Thanks in advance, and again, apologies if this is a newbie question!
-
Hi @Jeremias74: The DBF data covers 1994-2020. The XBRL data covers 2021-present. IIRC, in 2021 Q1-Q2 they also published DBF data. So if you really need to look at 2 versions of the same data in the two different formats, I would try looking at those quarters, which I think ought to be available in both formats, and in our archives.

I think what you're seeing in the 2020 XBRL data is a handful of revisions / re-filings that only happened after the transition to XBRL, and which have only been made available in the XBRL format. So any XBRL data associated with 2020 or earlier is going to be extremely partial, and only contain that handful of records. If you look at the details of the filings, my guess would be that they'll have some difference in the numbers they're reporting. We haven't yet tried to use the XBRL revisions to update older DBF data, since it's a small number of records, and the formats are quite different.

We do not convert those last DBF quarters from early 2021 into SQLite, since that time period is covered by the XBRL.

FERC also attempted to convert the last 10 years of DBF reporting into XBRL, but it didn't look like a very high fidelity conversion when we dug into it, and we wanted to have the whole time series back to 1994 anyway, so I wouldn't trust that for a decent comparison.

Can you say more about what you want to get out of looking at the raw formats and comparing them? They're both pretty annoying if you're trying to do any kind of bulk analysis.

We convert all the DBF and XBRL data to SQLite and publish it alongside PUDL. And then we clean up and reconcile a subset of the FERC Form 1 tables across all years, and include them in our data releases. See our data access documentation. If you need any FERC Form 1 tables that we haven't integrated into PUDL, you can download the complete converted dataset as 2 SQLite DBs -- one derived from the DBF, and another from the XBRL (they have pretty different structures).

I'd also recommend checking out the PUDL DB to see if the FERC 1 tables you need are there, since it contains all years of data in the same format. You can browse the full PUDL DB through https://viewer.catalyst.coop/ and filter/download subsets of the data. But we haven't yet transferred the complete converted DBF & XBRL derived databases up there. It's high on our list of priorities.
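
Since the two converted SQLite DBs have pretty different structures, a first step is usually just listing what tables and columns each one contains. Here is a minimal sketch using only Python's standard `sqlite3` module. The demo table and its columns are made up for illustration (they are not real FERC table names), and for a real file you would replace `":memory:"` with the path to whichever SQLite DB you downloaded.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables in a SQLite database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def table_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the column names of one table, in declaration order."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows]


# Demo on an in-memory database with a hypothetical table.
# For a real file: conn = sqlite3.connect("path/to/your_download.sqlite")
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_plant_table ("
    "respondent_id INTEGER, report_year INTEGER, "
    "row_number INTEGER, amount REAL)"
)

for table in list_tables(conn):
    print(table, table_columns(conn, table))
```

Running the same two functions against the DBF-derived and the XBRL-derived DB gives a quick side-by-side view of how differently the two are laid out.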
-
Thanks for the detailed reply, it really helps clarify things.

Since around 2015, I’ve been building a custom database to perform benchmarking using different datasets derived from FERC Form 1. My process has been based on extracting data from the legacy DBF files and shaping it into a specific Excel structure with multiple inputs. These inputs rely on the row number, report year, and respondent ID as key identifiers to align and track values consistently across companies and time.

Now I’m trying to extend this setup to include data from 2021 to 2024, which means adapting everything to the XBRL-based datasets. The reason I wanted to compare the Excel files I extract from raw DBF and XBRL formats was to minimize errors during adaptation. I can follow the structure and map each field accurately, but seeing that the reported values still match (or are quite similar) helps me quickly validate that I'm doing things correctly.

Thanks again for pointing out the limitations of the 2020 XBRL data. I'll try to find the XBRL datasets for Q1 and Q2 of 2021 to use them for comparison.
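
To make the validation idea concrete, here is a rough sketch of the kind of check I have in mind, in plain Python. It assumes the values from both sources have already been mapped onto the same (respondent ID, report year, row number) key, which is the hard part and is not shown here. The function name, the tolerances, and the sample numbers are all made up for illustration.

```python
import math

# Key used to align values: (respondent_id, report_year, row_number)
Key = tuple[int, int, int]


def compare_values(
    legacy: dict[Key, float],
    new: dict[Key, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 0.01,
) -> dict[str, list[Key]]:
    """Report keys missing on either side and keys whose values differ."""
    shared = legacy.keys() & new.keys()
    return {
        "missing_in_new": sorted(legacy.keys() - new.keys()),
        "missing_in_legacy": sorted(new.keys() - legacy.keys()),
        "mismatched": sorted(
            key
            for key in shared
            if not math.isclose(
                legacy[key], new[key], rel_tol=rel_tol, abs_tol=abs_tol
            )
        ),
    }


# Made-up sample values standing in for DBF-derived and XBRL-derived data.
dbf_values = {(1, 2021, 10): 100.0, (1, 2021, 11): 250.5, (2, 2021, 10): 75.0}
xbrl_values = {(1, 2021, 10): 100.0, (1, 2021, 11): 251.0, (3, 2021, 10): 40.0}

report = compare_values(dbf_values, xbrl_values)
for category, keys in report.items():
    print(category, keys)
```

An empty "mismatched" list for the overlapping period would suggest the field mapping is right. Anything that shows up there is either a mapping mistake or a genuine revision in the filing.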
-
Thank you so much.
If you really need to do a field-by-field comparison with actual values, and you want it to cover all of the respondents, then I think the 2021 Q1-Q2 data might be your only option.
Those quarters of data should be available in our FERC 1 XBRL derived SQLite DB in a tabular form that's easier to work with in bulk than the XBRL documents, and it sounds like you're already comfortable extracting the DBF data.
You've probably already come across it, but the FERC XBRL Taxonomy Viewer can be very helpful in understanding what the heck is inside the XBRL.