Skip to content

Conversation

@savan-chovatiya
Copy link
Contributor

@savan-chovatiya savan-chovatiya commented Jul 26, 2021

Description of change

TDL-13967: Add ProfitAndLoss report stream

Finalized approach:

  • Fetch records for a minimum of 30 days if the bookmark is used from the state in sync.
    • If the bookmark is more than 30 days away then fetch all records up to the current day.
    • If the bookmark is less than 30 days away then fetch records for the last 30 days.
  • Fetch records from start date to current date if the start date is used in sync.
  • Bookmark is required so consider it as an INCREMENTAL stream.
  • Consider ‘ReportDate’ as a primary key and replication key.
  • Only consider Accrual accounting method for the report as of now.

Manual QA steps

  • Verified that record for every day is emitted from the start date to current date in start date scenario.
  • Verified that a minimum 30-day record is emitted if a bookmark from the state is used.

Risks

Rollback steps

  • revert this branch

@luandy64 luandy64 force-pushed the TDL-13967-add-profit-loss-report-stream branch from 8f786ff to b7b19cd Compare August 26, 2021 15:58
@luandy64
Copy link
Contributor

luandy64 force-pushed the TDL-13967-add-profit-loss-report-stream branch from 8f786ff to b7b19cd

Ignore me. I was trying to reason about the report parser and wrote a test, but did not see that one was written in test_report_parser.py

params["end_date"] = end_tm_str

resp = self.client.get(self.endpoint, params=params)
self.parse_report_metadata(resp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having parse_report_metadata() be recursive is excessive. The response object here is just three keys, Header (which we ignore), Rows, and Columns.

Parsing resp['Columns'] seems like a loop over resp['Columns']['Column'] and filtering the 'Metadata'.

Parsing resp['Rows'] is the only recursive part, so we can repurpose most of the logic in parse_report_metadata() into parse_rows() or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split down parse_report_metadata() into two functions parse_report_columns() and parse_report_rows().

parse_report_columns() will parse resp['Columns'] with a loop over resp['Columns']['Column'] and filters Metadata.

parse_report_rows() is the recursive one which will parse resp['Rows'].

'data': []
}

start_tm_str = strftime(start_dttm)[0:10]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use str(start_dttm.date()) to get date

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated date retrieval as per suggestion.

@savan-chovatiya savan-chovatiya requested review from zachharris1 and removed request for KAllan357 September 23, 2021 10:44
columns = pileOfColumns.get('Column', [])
for column in columns:
metadatas = column.get('MetaData', [])
for md in metadatas:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer instead of using md have more descriptive variable name like metadata

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated md with proper variable name metadata

'''

if isinstance(pileOfRows, list):
for x in pileOfRows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stop using variable names like x. Please provide more descriptive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated x with descriptive name row

self.parse_report_rows(pileOfRows['Summary'])

if 'ColData' in pileOfRows.keys():
d = dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stop using variable names like d. Provide more descriptive variable name. Bad coding practice

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep in mind using descriptive variable names and updated above d with a proper name.

Comment on lines 322 to 325
for x in pileOfRows['ColData'][1:]:
vals.append(x['value'])
d['values'] = vals
self.parsed_metadata['data'].append(d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No x, d variable names. Provide descriptive names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated x,d types of variables with proper descriptive names.

reports = self.day_wise_reports()
if reports:
for report in reports:
yield report
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between yield report and yield from report? I've seen in many places using yield from report

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • yield from reports and yield inside for loops, both return generators with every element of reports.

  • The python generator will flush out after iteration over it and here, reports is a generator, and tap use last report from reports for storing bookmark.

  • If we use yield from reports then no way to retrieve the last report after it so we used yield with for loop here to utilize the loop's local variable report to retrieve the last report.

self.state = singer.write_bookmark(self.state, self.stream_name, 'LastUpdatedTime', strptime_to_utc(report.get('ReportDate')).isoformat())

table_name = 'Vendor'
additional_where = "Active IN (true, false)"

class ReportStream(Stream):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more comments to the code to explain at each section what is being done

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments to the code for better code explanation.

Comment on lines +309 to +313
if 'Rows' in pileOfRows.keys():
self.parse_report_rows(pileOfRows['Rows'])

if 'Row' in pileOfRows.keys():
self.parse_report_rows(pileOfRows['Row'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between Rows and Row here?

Copy link
Contributor Author

@savan-chovatiya savan-chovatiya Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rows is top-level information about profit and loss reports while Row is information about entries of reports. Documentation of the profit and loss report's metadata: documentation.

Copy link
Contributor

@kspeer825 kspeer825 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion.


return record_counts

def dt_to_ts(self, dtime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an exception after this for loop would prevent the test from getting back a None and failing in a confusing way in the case where the datetime format is unaccounted for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants