base_providers.py/_get_intensity_benchmarks is badly factored #152
Comments
If I understand correctly, we still want to calculate a temperature score for every geographical region reported by a company, right? If so, what we might want is DataFrames indexed not only by company_id, but by [company_id, region]. I am new to pandas, but it looks like they have a MultiIndex class for such use cases. Please correct me if it doesn't fit our goal. In my "scopes" work, I am investigating ways to calculate scores per company per scope, and experimenting with MultiIndex in this context. If you find this option promising, we could turn our DataFrames into MultiIndex [company_id, region, scope], to support both work packages. What do you think?
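A minimal sketch of what a [company_id, region, scope] MultiIndex could look like in pandas. The company ids, regions, scores, and the column name temperature_score are invented for illustration and are not taken from the project's data model.

```python
import pandas as pd

# Hypothetical sample data: ids, regions, and scores are invented for illustration.
records = [
    ("C1", "Europe", "S1S2", 1.5),
    ("C1", "Europe", "S3", 2.5),
    ("C1", "North America", "S1S2", 1.8),
    ("C2", "Global", "S3", 3.1),
]
df = pd.DataFrame(
    records, columns=["company_id", "region", "scope", "temperature_score"]
)

# Move the three key columns into a MultiIndex and keep the score as a Series.
# Sorting the index keeps label-based lookups efficient.
scores = (
    df.set_index(["company_id", "region", "scope"])["temperature_score"].sort_index()
)

# Every score for one company (remaining levels: region, scope).
c1_scores = scores.loc["C1"]

# Every S3 score across companies and regions (remaining levels: company_id, region).
s3_scores = scores.xs("S3", level="scope")

# One exact (company, region, scope) lookup returns a scalar.
c1_europe_s3 = scores.loc[("C1", "Europe", "S3")]
```

With this layout, per-region and per-scope work can share one structure: selecting on the first level gives everything for a company, and xs slices on any other level without reshaping.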
Kirill, your understanding of MultiIndex is correct, and I was thinking along similar lines. Indeed, benchmarks are a pd.Series that uses sector and region as a MultiIndex (the
Note also that the comment retains a vestigial connection to the concept of a
xref #162
This issue is obsolete. We now keep lots of data in company/scope columns, which preserves their Pint-like nature.
As part of #146 and reviewing @kmarinushkin's PR (initial_load_all_scopes), I traced what would happen when benchmark data for Autos was properly classified as S3 rather than S1S2. While there are a few pitfalls along the way, the big problem is these functions in base_providers.py:
Within these methods, self._EI_benchmarks does contain the full complexity of S1S2, S3, and S1S2S3 benchmark data. However, the per-company company_sector_region_info dataframe has heterogeneous corp data, with Electricity Utilities and Steel having meaningful S1 and S2 benchmark data (but no S3 benchmark data) and Autos having mainly (by a factor of more than 100x) S3 benchmark data. Because it's so small, S1 and S2 data for Autos wouldn't be missed.

From this perspective, we can see that it's not really right to ask for a single scope-based benchmark to be matched against a list of companies when sector is so determinant of what scope data may even be available. One way to solve this is to call into these functions with only those company_sector_region_info companies that are relevant for the given scope.

Any thoughts on where the right place to address these problems might be? And whether Kirill's changes to handle scopes actually point the way to a better factoring and implementation?
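
One way the "only companies relevant for the given scope" idea could be sketched in pandas. This is not the code in base_providers.py: the helper companies_for_scope, the benchmark values, the company ids, and the (sector, region, scope) index layout are all assumptions made for illustration, and the sketch matches sector and region exactly, with no fallback to a broader region.

```python
import pandas as pd

# Hypothetical benchmark data indexed by (sector, region, scope); values are invented.
ei_benchmarks = pd.Series(
    [0.5, 0.6, 1.2, 90.0],
    index=pd.MultiIndex.from_tuples(
        [
            ("Electricity Utilities", "Global", "S1S2"),
            ("Electricity Utilities", "Europe", "S1S2"),
            ("Steel", "Global", "S1S2"),
            ("Autos", "Global", "S3"),
        ],
        names=["sector", "region", "scope"],
    ),
)

# Hypothetical per-company sector/region table, indexed by company_id.
company_sector_region_info = pd.DataFrame(
    {
        "sector": ["Electricity Utilities", "Steel", "Autos"],
        "region": ["Europe", "Global", "Global"],
    },
    index=pd.Index(["C1", "C2", "C3"], name="company_id"),
)


def companies_for_scope(company_info, benchmarks, scope):
    """Keep only companies whose (sector, region) has a benchmark for `scope`.

    Hypothetical helper: exact (sector, region) matching only.
    """
    # (sector, region) pairs that have benchmark data for the requested scope.
    in_scope = benchmarks.index.get_level_values("scope") == scope
    available = benchmarks[in_scope].index.droplevel("scope")
    # Build the same kind of key for each company and test membership.
    keys = pd.MultiIndex.from_frame(company_info[["sector", "region"]])
    return company_info[keys.isin(available)]


s1s2_companies = companies_for_scope(company_sector_region_info, ei_benchmarks, "S1S2")
s3_companies = companies_for_scope(company_sector_region_info, ei_benchmarks, "S3")
```

Filtering before the benchmark lookup means each scope-specific call only sees companies for which a benchmark can exist, so the Autos company is matched against S3 data and the utilities and steel companies against S1S2 data.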