In our application, we only care about rows in the first sheet (page). Our code looks like this:
try (InputStream delimFileStream = amazonS3.getObject(s3BucketPath, s3Key).getObjectContent();
     RowReader reader = RowReaders.open(delimFileStream, fileName).blockingGet()) {
    if (reader == null) {
        ...stuff...
    }
    Iterable<Row> rows = reader.rows().takeWhile(row -> row.page() == 0).blockingIterable();
    for (Row row : rows) {
        ...logic to act upon rows in first sheet/page...
    }
}
In the attached spreadsheet, there is data in the first sheet, but Tabitha reports every row as being on page 1 (the second sheet) or later. As a result, the loop above iterates zero times, which is not what we expect.
Something to note: after converting this file to XLSX using Microsoft Excel, Tabitha recognizes the page numbers correctly, so perhaps this is an XLS only issue.
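To confirm which page indexes the reader actually reports, a small tally can help. This is only a minimal sketch: `PageTally` and `countByPage` are hypothetical names, and the input is a plain sequence of page indexes rather than Tabitha's `Row` objects. With the reader from the snippet above, something like `reader.rows().map(Row::page).blockingIterable()` would presumably supply that sequence, but that call is an assumption and has not been tested against Tabitha.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper; not part of Tabitha.
class PageTally {
    // Counts how many rows were reported for each page index, sorted by page.
    static Map<Integer, Integer> countByPage(Iterable<Integer> pageIndexes) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int page : pageIndexes) {
            counts.merge(page, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up page indexes standing in for row.page() values.
        List<Integer> reportedPages = List.of(1, 1, 2);
        System.out.println(countByPage(reportedPages)); // prints {1=2, 2=1}
    }
}
```

If page 0 never appears as a key, `takeWhile(row -> row.page() == 0)` stops at the very first row, which matches the zero iterations described above.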
twilco changed the title from "Tabitha incorrectly reporting page number for XLS file" to "Tabitha reporting incorrect page number for rows in XLS file" on Oct 3, 2018
The given XLS file is laid out very strangely, in a way that confuses the parser. I'd almost call this file corrupt, even.
XLS is laid out as an ordered sequence of binary "records" that indicate various changes and hold data. This particular file has the "begin sheet template" and "begin sheet Values" records before any cell records, which I believe to be incorrect. (That said, XLS has no formal spec, and Excel opens it fine...)
Edit: This file is indeed valid, but laid out in a complex manner that will require Tabitha to do a lot more bookkeeping to parse it correctly. Files formatted in this way could cause rows to be emitted out of order or on a sheet different than the one they are actually located in. This is a bug we want to fix.
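One form this bookkeeping could take is sketched below. It is not Tabitha's actual fix, and it is an assumption that it applies to this particular file. It relies on the workbook-level sheet declarations (the BOUNDSHEET records in BIFF8) each carrying the stream offset at which that sheet's records begin, so a record's sheet can be resolved from its position rather than from the order in which "begin sheet" records are encountered. `SheetIndexResolver` and its inputs are hypothetical names, and the offsets are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not Tabitha's implementation.
class SheetIndexResolver {
    // Start offset of each sheet's records -> declared zero-based sheet index.
    private final TreeMap<Long, Integer> indexByOffset = new TreeMap<>();

    // declaredOffsets lists the sheet start offsets in declared sheet order,
    // which may differ from the physical order of the sheets in the stream.
    SheetIndexResolver(List<Long> declaredOffsets) {
        for (int i = 0; i < declaredOffsets.size(); i++) {
            indexByOffset.put(declaredOffsets.get(i), i);
        }
    }

    // Returns the declared index of the sheet whose records contain the
    // given offset, or -1 if the offset precedes every sheet.
    int sheetIndexAt(long recordOffset) {
        Map.Entry<Long, Integer> entry = indexByOffset.floorEntry(recordOffset);
        return entry == null ? -1 : entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up layout: sheet 0 is stored physically after sheet 1.
        SheetIndexResolver resolver = new SheetIndexResolver(List.of(9000L, 4000L));
        System.out.println(resolver.sheetIndexAt(4100L)); // prints 1
        System.out.println(resolver.sheetIndexAt(9500L)); // prints 0
    }
}
```

Resolving the sheet by offset keeps page numbers correct even when sheets are stored out of order, but rows would still arrive in physical order, so a consumer that stops at the first row with a non-zero page would also need the rows buffered or re-sorted by sheet.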
Tabitha appears to be assigning the wrong page number to rows in the XLS file found at this URL: https://s3.amazonaws.com/widen-ingester-dev/tabithaDoesn'tSeeFirstPageSanitized.xls