-
Notifications
You must be signed in to change notification settings - Fork 721
XTP Poland PDF Extrator - Issue 4938 #5190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Nirus2000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello,
Thank you for the pull request! I know importers can be a bit of a pain, which is why we try to adhere to a clear and consistent standard so that changes don't take up unnecessary time later on. If we do it right from the start, everyone will have less stress in the end.
It's a hobby project, and that's exactly where you learn how to implement things cleanly in a collaborative setup. Maybe we'll end up in your projects at some point and get to tidy things up a bit – that would only be fair, right? 😉
It would be great if you could revise a few of the changes. I can take care of the rest later, but it makes it much easier for everyone.
Best regards,
Alex
| .attributes("tickerSymbol", "name") // | ||
| .match("^(\\d+) (\\d+) (?<tickerSymbol>[A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(?<name>.*?),(.*?)\\s+([.,'\\d]+) ([\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") // | ||
|
|
||
| .assign((t, v) -> t.setSecurity(getOrCreateSecurity(v))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the getOrCreateSecurity function, you will see that the security is always created with a currency. This does not need to be created separately with setCurrencyCode(), but we have to define it as an attribute “currency” (see AbstractPDFExtractor.java line 184).
| // @formatter:on | ||
| section -> section // | ||
| .attributes("tickerSymbol", "name") // | ||
| .match("^(\\d+) (\\d+) (?<tickerSymbol>[A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(?<name>.*?),(.*?)\\s+([.,'\\d]+) ([\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standard regular expressions are stored in the Contributing Rules for your reference. You can find them here and feel free to use them, e.g., for the ticker symbol.
https://github.com/portfolio-performance/portfolio/blob/master/CONTRIBUTING.md#regular-expression-reference
| // @formatter:on | ||
| section -> section // | ||
| .attributes("shares") // | ||
| .match("^(\\d+) (\\d+) ([A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(.*?),(.*?)\\s+(?<shares>[.,'\\d]+) ([\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* (?<currency>[A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(?<currency>[A-Z]{3})
I don't think we need this match here. 😄
| section -> section // | ||
| .attributes("date", "time") // | ||
| .match("^(\\d+) (\\d+) ([A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(.*?),(.*?)\\s+([.,'\\d]+) (?<date>[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") // | ||
| .match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*") | |
| .match("^[A-Z]{3}, [A-Z]{3} (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*$") |
In general... don't create RegEx groups where they are not needed.
The comma does not necessarily have to be invalidated.
| section -> section // | ||
| .attributes("date", "time") // | ||
| .match("^(\\d+) (\\d+) ([A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?) (.*?) ([\\.,'\\d]+) (?<date>[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) Prawa \\w+ ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") // | ||
| .match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*") | |
| .match("^[A-Z]{3}\\, [A-Z]{3} (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}.*$") |
| // @formatter:on | ||
| section -> section // | ||
| .attributes("note") // | ||
| .match("^(\\d+) (?<note>\\d+) (?<tickerSymbol>[A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(?<name>.*?),(?<system>.*?)\\s+(?<shares>[.,'\\d]+) (?<date>[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ (?<amount>[\\.,'\\d]+) ([\\.,'\\d]+) .* (?<currency>[A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) (?<totalcost>[\\.,'\\d]+).*$") // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some attributes are too much for this...😄
| .assign((t, v) -> { | ||
| if (v.get("note") != null) | ||
| t.setNote("Order Number: " + v.get("note")); | ||
|
|
||
| }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .assign((t, v) -> { | |
| if (v.get("note") != null) | |
| t.setNote("Order Number: " + v.get("note")); | |
| }), | |
| .assign((t, v) -> t.setNote("Order Number: " +trim(v.get("note"))), |
I think... this is better.
| private <T extends Transaction<?>> void addTaxesSectionsTransaction(T transaction, DocumentType type) | ||
| { | ||
| // not yet implemented. No examples | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only have fees at first, then simply don't add this part.
| private void addAccountStatementTransaction() | ||
| { | ||
| // Not implemented yet. No examples | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed... so delete it
| @Override | ||
| protected long asShares(String value) | ||
| { | ||
| String language = "en"; | ||
| String country = "GB"; | ||
|
|
||
| int lastDot = value.lastIndexOf("."); | ||
| int lastComma = value.lastIndexOf(","); | ||
|
|
||
| // returns the greater of two int values | ||
| if (Math.max(lastDot, lastComma) == lastDot) | ||
| { | ||
| language = "en"; | ||
| country = "US"; | ||
| } | ||
|
|
||
| return ExtractorUtils.convertToNumberLong(value, Values.Share, language, country); | ||
|
|
||
| } | ||
|
|
||
| @Override | ||
| protected long asAmount(String value) | ||
| { | ||
| String language = "en"; | ||
| String country = "GB"; | ||
|
|
||
| return ExtractorUtils.convertToNumberLong(value, Values.Amount, language, country); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see en_US as convertToNumberLong not en_GB
The other point is that you are using the following regEx for the amounts: [\.,'\d]+
However, the “ ' ” is for currencies from Switzerland or Austria... you should correct this.
Renewed commit