extracting tables with varying grouping marks (locale issue) #167

AndySibov · 2024-09-11T20:39:56Z

I didn't think there would be a package out there for this, thanks!

I was importing a table where the grouping mark is a dot, with values around 10.000.
As a result, the extract_table function returns a value such as 1.950 (one thousand nine hundred fifty) as the double 1.95.

Ideally, it would be possible to set a locale() import option for grouping marks and similar settings.
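For reference, readr's `locale()` already handles this when the numbers are parsed from text. Below is a minimal sketch. It assumes the table cells can be obtained as character strings before any numeric conversion; this thread does not confirm whether extract_table can return them that way. `cells` is hypothetical sample data.

```r
library(readr)

# Hypothetical cell text from a table that uses "." for thousands and "," for decimals
cells <- c("1.950", "12.500", "100.000", "850", "1.234,5")

# Declare both marks explicitly so "." is treated as a grouping mark, not a decimal mark
dot_grouping <- locale(decimal_mark = ",", grouping_mark = ".")

# parse_number() strips the grouping marks before converting:
# gives 1950, 12500, 100000, 850, 1234.5
values <- parse_number(cells, locale = dot_grouping)
```

Here the grouping mark is removed while the value is still text, so 100.000 comes out as 100000. A fix applied after the value has already become a double cannot do that.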

Below is a function to recover these imported doubles. It doesn't work for values whose decimal part is all zeros: the original value 100.000 is imported as 100.00, i.e. 100, and stays 100.
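The following base R sketch shows why that case cannot be recovered after import. Once the cell text is converted to a double, the trailing zeros are gone, so 100.000 and a genuine 100 become indistinguishable. `cells` is hypothetical sample data, and the 1000 factor assumes thousands grouping.

```r
# Hypothetical cell text from the PDF, where "." is the thousands separator
cells <- c("1.950", "12.500", "100.000", "850")

# A plain numeric conversion reads "." as a decimal mark:
# gives 1.95, 12.5, 100, 850
as_read <- as.numeric(cells)

# Back to text, the dropped zeros are gone: "1.95", "12.5", "100", "850"
as_text <- as.character(as_read)

# Post-hoc rule: a value that still shows a dot was divided by 1000 on import
has_mark <- grepl(".", as_text, fixed = TRUE)
recovered <- ifelse(has_mark, as_read * 1000, as_read)

# recovered: 1950, 12500, 100 (should be 100000), 850
# "100" looks exactly like "850", which never had a grouping mark
```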

```r
library(stringr)

recover_double_grouping_mark <- function(value, grouping_mark = '.', interval = 1000) {

  dbl_as_char <- as.character(value)

  # Vectorized counting of grouping marks for each element in the vector
  # (fixed() matches the mark literally instead of as a regex wildcard)
  dot_count <- str_count(dbl_as_char, fixed(grouping_mark))

  # Each surviving mark means the value was divided by `interval` on import,
  # so scale those elements back up; elements without a mark are unchanged
  adjusted_values <- ifelse(dot_count > 0,
                            value * interval^dot_count,
                            value)

  return(adjusted_values)
}
```

pachadotdev · 2024-10-24T04:14:37Z

> to fix your trouble check this solution click maybe this will solve your problem.

LOL no

I opened this in a container and it shows this

reported and blocked

pachadotdev · 2024-10-24T04:17:54Z


hi @AndySibov

sorry for the late reply, do you have a real link to the PDF?

if there are no links, my email is in my profile description

sorry about the idiot that included a phishing link as an answer

AndySibov assigned pachadotdev Sep 11, 2024
