Skip to content

Conversation

@arnonm
Copy link
Contributor

@arnonm arnonm commented Dec 4, 2025

Renewed commit

Copy link
Member

@Nirus2000 Nirus2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,

Thank you for the pull request! I know importers can be a bit of a pain, which is why we try to adhere to a clear and consistent standard so that changes don't take up unnecessary time later on. If we do it right from the start, everyone will have less stress in the end.

It's a hobby project, and that's exactly where you learn how to implement things cleanly in a collaborative setup. Maybe we'll end up in your projects at some point and get to tidy things up a bit – that would only be fair, right? 😉

It would be great if you could revise a few of the changes. I can take care of the rest later, but it makes it much easier for everyone.

Best regards,
Alex

.attributes("tickerSymbol", "name") //
.match("^(\\d+) (\\d+) (?<tickerSymbol>[A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(?<name>.*?),(.*?)\\s+([.,'\\d]+) ([\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") //

.assign((t, v) -> t.setSecurity(getOrCreateSecurity(v))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the getOrCreateSecurity function, you will see that the security is always created with a currency. This does not need to be created separately with setCurrencyCode(), but we have to define it as an attribute “currency” (see AbstractPDFExtractor.java line 184).

// @formatter:on
section -> section //
.attributes("tickerSymbol", "name") //
.match("^(\\d+) (\\d+) (?<tickerSymbol>[A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(?<name>.*?),(.*?)\\s+([.,'\\d]+) ([\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard regular expressions are stored in the Contributing Rules for your reference. You can find them here and feel free to use them, e.g., for the ticker symbol.
https://github.com/portfolio-performance/portfolio/blob/master/CONTRIBUTING.md#regular-expression-reference

// @formatter:on
section -> section //
.attributes("shares") //
.match("^(\\d+) (\\d+) ([A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(.*?),(.*?)\\s+(?<shares>[.,'\\d]+) ([\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* (?<currency>[A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(?<currency>[A-Z]{3})
I don't think we need this match here. 😄

section -> section //
.attributes("date", "time") //
.match("^(\\d+) (\\d+) ([A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(.*?),(.*?)\\s+([.,'\\d]+) (?<date>[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) ([\\.,'\\d]+) .* ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") //
.match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*")
.match("^[A-Z]{3}, [A-Z]{3} (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*$")

In general... don't create RegEx groups where they are not needed.
The comma does not necessarily have to be invalidated.

section -> section //
.attributes("date", "time") //
.match("^(\\d+) (\\d+) ([A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?) (.*?) ([\\.,'\\d]+) (?<date>[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ ([\\.,'\\d]+) Prawa \\w+ ([A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+).*$") //
.match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.match("^([A-Z]{3})\\, ([A-Z]{3}) (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}).*")
.match("^[A-Z]{3}\\, [A-Z]{3} (?<time>[\\d]{2}:[\\d]{2}:[\\d]{2}.*$")

// @formatter:on
section -> section //
.attributes("note") //
.match("^(\\d+) (?<note>\\d+) (?<tickerSymbol>[A-Z0-9\\\\._-]{1,10}(?:\\\\.[A-Z]{1,4})?)\\s+(?<name>.*?),(?<system>.*?)\\s+(?<shares>[.,'\\d]+) (?<date>[\\d]{2}\\.[\\d]{2}\\.[\\d]{4}) \\w+ (?<amount>[\\.,'\\d]+) ([\\.,'\\d]+) .* (?<currency>[A-Z]{3}) ([\\.,'\\d]+) ([\\.,'\\d]+) ([\\.,'\\d]+) (?<totalcost>[\\.,'\\d]+).*$") //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some attributes are too much for this...😄

Comment on lines 181 to 185
.assign((t, v) -> {
if (v.get("note") != null)
t.setNote("Order Number: " + v.get("note"));

}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.assign((t, v) -> {
if (v.get("note") != null)
t.setNote("Order Number: " + v.get("note"));
}),
.assign((t, v) -> t.setNote("Order Number: " +trim(v.get("note"))),

I think... this is better.

Comment on lines 213 to 216
private <T extends Transaction<?>> void addTaxesSectionsTransaction(T transaction, DocumentType type)
{
// not yet implemented. No examples
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only have fees at first, then simply don't add this part.

Comment on lines 249 to 252
private void addAccountStatementTransaction()
{
// Not implemented yet. No examples
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed... so delete it

Comment on lines 254 to 282
@Override
protected long asShares(String value)
{
String language = "en";
String country = "GB";

int lastDot = value.lastIndexOf(".");
int lastComma = value.lastIndexOf(",");

// returns the greater of two int values
if (Math.max(lastDot, lastComma) == lastDot)
{
language = "en";
country = "US";
}

return ExtractorUtils.convertToNumberLong(value, Values.Share, language, country);

}

@Override
protected long asAmount(String value)
{
String language = "en";
String country = "GB";

return ExtractorUtils.convertToNumberLong(value, Values.Amount, language, country);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see en_US as convertToNumberLong not en_GB

The other point is that you are using the following regEx for the amounts: [\.,'\d]+
However, the “ ' ” is for currencies from Switzerland or Austria... you should correct this.

@Nirus2000 Nirus2000 added the pdf label Dec 4, 2025
@Nirus2000 Nirus2000 marked this pull request as draft December 4, 2025 14:43
@Nirus2000 Nirus2000 added the blocked Blocked due to external dependencies label Dec 4, 2025
@arnonm arnonm marked this pull request as ready for review December 8, 2025 07:47
@Nirus2000 Nirus2000 marked this pull request as draft December 11, 2025 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocked Blocked due to external dependencies pdf

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants