Commit 4d80ca3

Type Inference for Excel (#95)
1 parent 28ce931 commit 4d80ca3

16 files changed, +704 -208 lines changed

.devcontainer/devcontainer.json

Lines changed: 9 additions & 1 deletion
@@ -11,7 +11,15 @@
         "usernamehw.errorlens",
         "github.copilot",
         "skellock.just"
-      ]
+      ],
+      "mcp": {
+        "servers": {
+          "scautable-metals": {
+            "type": "sse",
+            "url": "http://localhost:49625/sse"
+          }
+        }
+      }
     }
   },
   // Features to add to the dev container. More info: https://containers.dev/features.

.github/copilot-instructions.md

Lines changed: 9 additions & 39 deletions
@@ -14,30 +14,23 @@ Scautable is a Scala 3 project using the mill build tool. It is a lightweight da
 - `./mill --version` -- Verify mill works (may fail in sandboxed environments due to SSL issues)
 - **NEVER CANCEL BUILDS**: Mill compilation takes 2-5 minutes. Tests take 1-3 minutes. ALWAYS set timeout to 10+ minutes.
 - Compile all modules:
-  - `./mill __.compile` -- Compiles all modules (JVM, JS, tests). Takes 3-5 minutes. NEVER CANCEL.
+  - `./mill __.compile` -- Compiles all modules (JVM, JS, tests). Takes 3-5 minutes cold, fast cached. NEVER CANCEL.
 - Compile specific platforms:
   - `./mill scautable.js.compile` -- Compile Scala.js target only
   - `./mill scautable.jvm.compile` -- Compile JVM target only
 - Run tests:
-  - `./mill scautable.test._` -- Run all tests (JVM and JS). Takes 2-3 minutes. NEVER CANCEL.
+  - `./mill scautable.test._` -- Run all tests (JVM and JS). Takes 2-3 minutes cold, fast cached. NEVER CANCEL.
 - Format code:
-  - `./mill __.reformat` -- Format all code using scalafmt
+  - `./mill mill mill.scalalib.scalafmt/` -- Format all code using scalafmt
 - Generate documentation:
-  - `./mill site.siteGen` -- Generate documentation site (takes 1-2 minutes)
-
-## SSL Certificate Issues in Sandboxed Environments
-- Mill may fail with `javax.net.ssl.SSLHandshakeException` errors in certain environments
-- This is a known limitation of mill's dependency resolution in sandboxed/firewalled environments
-- Workarounds attempted: `JAVA_TOOL_OPTIONS`, `COURSIER_*` environment variables (none successful in current environment)
-- **CI Environment**: All commands work correctly in GitHub Actions (see `.github/workflows/ci.yml`)
-- **Development**: If SSL issues occur, note the limitation and continue with file-based exploration
+  - `./mill site.siteGen` -- Generate documentation site (takes 1-2 minutes cold, fast cached)

 ## Validation
 - ALWAYS test CSV functionality after making changes to CSV parsing code
 - Test both JVM and JS targets when making cross-platform changes
 - Run `./mill scautable.test._` to validate all functionality
-- Format code with `./mill __.reformat` before committing
-- ALWAYS check that examples in `examples/` still compile after core changes
+- Format code with `./mill mill.scalalib.scalafmt/` before committing
+- Check that examples in `examples/` still compile after core changes

 ## Project Structure
 ```
@@ -62,40 +55,17 @@ build.mill -- Root build configuration
 3. Add JVM-specific tests in `scautable/test/src-jvm/` if needed
 4. Run `./mill scautable.test._` to validate

-### Working with Mill Resources
-Mill separates compile resources and runtime resources. For CSV files to be available at compile time:
-```scala
-trait ShareCompileResources extends ScalaModule {
-  override def compileResources = super.compileResources() ++ resources()
-}
-```
-
-### Using scala-cli for Development
-For quick iteration and testing:
-```scala
-//> using scala 3.7.2
-//> using dep io.github.quafadas::scautable::{{latest_version}}
-//> using resourceDir ./csvs
-
-import io.github.quafadas.table.*
-
-@main def testCsv =
-  val csv = CSV.resource("test.csv")
-  csv.take(10).toSeq.ptbln
-```
-
 ## Code Guidelines
 - Follow `styleguide.md` for coding conventions
 - Use munit for tests. Cross-platform tests go in `scautable/test/src`
 - JVM-specific tests go in `scautable/test/src-jvm`
 - Use Scala 3 syntax: given/using, extension methods, enum types
-- Prefer compile-time type inference for CSV schemas
 - Use inline methods for performance-critical code

 ## Key Dependencies
-- Scala 3.7.2
-- Mill 1.0.4 (requires Java 21)
-- ScalaJS 1.19.0
+- Scala 3.7.2+
+- Mill 1+ (requires Java 21)
+- ScalaJS 1.19.0+
 - Munit for testing
 - scalatags for HTML generation
 - OS-lib for file operations
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+package io.github.quafadas.scautable
+
+/**
+ * Stub for platform compatibility
+ */
+class ExcelIterator()

scautable/src-jvm/Excel.scala

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+package io.github.quafadas.scautable
+
+import io.github.quafadas.table.TypeInferrer
+
+/** Main Excel API object providing transparent inline methods for reading Excel files
+  */
+object Excel:
+  import ExcelMacros.*
+
+  /** Read Excel file from an absolute path with compile-time type inference
+    *
+    * @param filePath
+    *   Absolute path to the Excel file
+    * @param sheetName
+    *   Name of the Excel sheet to read
+    * @param range
+    *   Optional cell range (e.g., "A1:C10"), empty string reads entire sheet
+    * @param typeInferrer
+    *   Type inference strategy (StringType or FromTuple supported)
+    * @return
+    *   ExcelIterator with inferred types
+    */
+  transparent inline def absolutePath[K](filePath: String, sheetName: String, range: String = "", inline typeInferrer: TypeInferrer = TypeInferrer.StringType) =
+    ${ readExcelAbsolutePath('filePath, 'sheetName, 'range, 'typeInferrer) }
+
+  /** Read Excel file from the classpath with compile-time type inference
+    *
+    * @param filePath
+    *   Path to the Excel file in the classpath
+    * @param sheetName
+    *   Name of the Excel sheet to read
+    * @param range
+    *   Optional cell range (e.g., "A1:C10"), empty string reads entire sheet
+    * @param typeInferrer
+    *   Type inference strategy (StringType or FromTuple supported)
+    * @return
+    *   ExcelIterator with inferred types
+    */
+  transparent inline def resource[K](filePath: String, sheetName: String, range: String = "", inline typeInferrer: TypeInferrer = TypeInferrer.StringType) =
+    ${ readExcelResource('filePath, 'sheetName, 'range, 'typeInferrer) }
+
+  transparent inline def resource[K](filePath: String, sheetName: String, inline typeInferrer: TypeInferrer) =
+    ${ readExcelResource('filePath, 'sheetName, '{""}, 'typeInferrer) }
+
+  transparent inline def resource[K](filePath: String, sheetName: String) =
+    ${ readExcelResource('filePath, 'sheetName, '{""}, '{TypeInferrer.FromAllRows}) }
+
+
+end Excel
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+package io.github.quafadas.scautable
+
+/** Excel-specific decoders that can handle numeric strings like "1.0" These decoders are designed to work with Excel's tendency to format integers as doubles (e.g., "1.0" instead
+  * of "1")
+  */
+object ExcelDecoders:
+
+  /** Decoder for Int that can handle Excel's numeric formatting Attempts to parse as Double first, then converts to Int if it's a whole number Falls back to regular Int parsing if
+    * Double parsing fails
+    */
+  inline given excelIntDecoder: Decoder[Int] with
+    def decode(str: String): Option[Int] =
+      str.toDoubleOption
+        .flatMap { d =>
+          if d.isWhole && d >= Int.MinValue && d <= Int.MaxValue then Some(d.toInt)
+          else None
+        }
+        .orElse(str.toIntOption) // fallback to regular int parsing
+  end excelIntDecoder
+
+  /** Decoder for Long that can handle Excel's numeric formatting Attempts to parse as Double first, then converts to Long if it's a whole number Falls back to regular Long parsing
+    * if Double parsing fails
+    */
+  inline given excelLongDecoder: Decoder[Long] with
+    def decode(str: String): Option[Long] =
+      str.toDoubleOption
+        .flatMap { d =>
+          if d.isWhole && d >= Long.MinValue && d <= Long.MaxValue then Some(d.toLong)
+          else None
+        }
+        .orElse(str.toLongOption) // fallback to regular long parsing
+  end excelLongDecoder
+
+end ExcelDecoders
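
Review note: the decoders in this file parse the cell text as a Double first, accept it only when it is a whole number inside the target range, and otherwise fall back to direct integer parsing. The sketch below restates that rule as two free functions so it can be tried in isolation. `decodeExcelInt` and `decodeExcelLong` are hypothetical names introduced here; they mirror the logic of the `given` instances but are not part of scautable and do not use its `Decoder` trait. One caveat, assuming standard IEEE 754 doubles: a Double cannot represent every Long above 2^53 exactly, so very large values that take the Double path may come back rounded.

```scala
// Hypothetical standalone helpers mirroring the decoder logic in the diff;
// they are NOT part of scautable's API.
def decodeExcelInt(str: String): Option[Int] =
  str.toDoubleOption
    .flatMap { d =>
      // Accept only whole numbers that fit in an Int, e.g. "1.0" becomes 1
      if d.isWhole && d >= Int.MinValue && d <= Int.MaxValue then Some(d.toInt)
      else None
    }
    .orElse(str.toIntOption) // fallback to regular int parsing

def decodeExcelLong(str: String): Option[Long] =
  str.toDoubleOption
    .flatMap { d =>
      // Same rule for Long; the value has already been rounded to a Double here
      if d.isWhole && d >= Long.MinValue && d <= Long.MaxValue then Some(d.toLong)
      else None
    }
    .orElse(str.toLongOption) // fallback to regular long parsing

@main def decoderDemo(): Unit =
  println(decodeExcelInt("1.0"))        // whole number formatted as a double
  println(decodeExcelInt("1.5"))        // not whole, so rejected
  println(decodeExcelInt("3000000000")) // whole, but outside the Int range
  println(decodeExcelLong("9007199254740993")) // 2^53 + 1, goes through Double
```

If exact handling of large Long values matters, trying `str.toLongOption` before the Double path would be one way to avoid the rounding; that is a design suggestion, not something this commit does.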
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+package io.github.quafadas.scautable
+
+/** Exception thrown when there are issues with Excel table structure
+  */
+class BadTableException(message: String) extends Exception(message)
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+package io.github.quafadas.scautable
+
+import java.io.File
+import scala.NamedTuple.*
+import scala.collection.JavaConverters.*
+import org.apache.poi.ss.usermodel.{Row, WorkbookFactory}
+import org.apache.poi.ss.util.CellRangeAddress
+import io.github.quafadas.scautable.BadTableException
+
+/** Iterator for reading Excel files with compile-time type safety
+  *
+  * @param filePath
+  *   Path to the Excel file
+  * @param sheetName
+  *   Name of the Excel sheet to read
+  * @param colRange
+  *   Optional cell range specification (e.g., "A1:C10")
+  * @param decoder
+  *   Row decoder for converting string data to typed tuples
+  * @tparam K
+  *   Tuple type representing column names
+  * @tparam V
+  *   Tuple type representing column value types
+  */
+class ExcelIterator[K <: Tuple, V <: Tuple](filePath: String, sheetName: String, colRange: Option[String])(using decoder: RowDecoder[V]) extends Iterator[NamedTuple[K, V]]:
+
+  type COLUMNS = K
+
+  // Public accessors for compile-time code generation
+  def getFilePath: String = filePath
+  def getSheet: String = sheetName
+  def getColRange: Option[String] = colRange
+
+  /** Parses a cell range string into its components
+    */
+  private def parseRange(range: String): (Int, Int, Int, Int) =
+    val cellRange = CellRangeAddress.valueOf(range)
+    (cellRange.getFirstRow, cellRange.getLastRow, cellRange.getFirstColumn, cellRange.getLastColumn)
+  end parseRange
+
+  /** Validates that headers are unique (no duplicates)
+    */
+  private def validateUniqueHeaders(headers: List[String]): Unit =
+    val headerSet = scala.collection.mutable.Set[String]()
+    headers.foreach { header =>
+      if headerSet.contains(header) then throw new BadTableException(s"Duplicate header found: $header, which will not work.")
+      else headerSet.add(header)
+    }
+  end validateUniqueHeaders
+
+  // Lazy-initialized sheet iterator to avoid opening file until needed
+  private lazy val sheetIterator =
+    val workbook = WorkbookFactory.create(new File(filePath))
+    val sheet = workbook.getSheet(sheetName)
+    sheet.iterator().asScala
+  end sheetIterator
+
+  // Track current row number for error reporting - starts where data begins
+  private var currentRowIndex: Int = colRange match
+    case None => 0
+    case Some(range) if range.nonEmpty =>
+      val (firstRow, _, _, _) = parseRange(range)
+      firstRow
+    case _ => 0
+
+  // Extract headers from the first row or specified range
+  private val headers: List[String] =
+    colRange match
+      case Some(range) if range.nonEmpty =>
+        extractHeadersFromRange(range)
+      case _ =>
+        extractHeadersFromFirstRow()
+
+  private lazy val numCellsPerRow = headers.size
+
+  // Validate headers are unique at initialization
+  validateUniqueHeaders(headers)
+
+  /** Extract headers from a specified cell range This consumes the header row from the sheet iterator
+    */
+  private inline def extractHeadersFromRange(range: String): List[String] =
+    val (firstRow, _, firstCol, lastCol) = parseRange(range)
+    val headerRow = sheetIterator.drop(firstRow).next()
+    val cells =
+      for (i <- firstCol.to(lastCol))
+        yield headerRow.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).toString
+    cells.toList
+  end extractHeadersFromRange
+
+  /** Extract headers from the first row of the sheet This consumes the header row from the sheet iterator
+    */
+  private inline def extractHeadersFromFirstRow(): List[String] =
+    if sheetIterator.hasNext then sheetIterator.next().cellIterator().asScala.toList.map(_.toString)
+    else throw new BadTableException("No headers found in the first row of the sheet, and no range specified.")
+  end extractHeadersFromFirstRow
+
+  /** Extract cell values from a row based on the column range
+    */
+  private inline def extractCellValues(row: org.apache.poi.ss.usermodel.Row): List[String] =
+    colRange match
+      case Some(range) if range.nonEmpty =>
+        val (_, _, firstCol, lastCol) = parseRange(range)
+        val cells =
+          for (i <- firstCol.to(lastCol))
+            yield row.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK).toString
+        cells.toList
+      case _ =>
+        row.cellIterator().asScala.toList.map(_.toString)
+  end extractCellValues
+
+  override def next(): NamedTuple[K, V] =
+    if !hasNext then throw new NoSuchElementException("No more rows")
+    end if
+
+    val row = sheetIterator.next()
+    val cellValues = extractCellValues(row)
+
+    // Validate row has expected number of cells
+    if cellValues.size != headers.size then
+      throw new BadTableException(
+        s"Row $currentRowIndex has ${cellValues.size} cells, but expected ${headers.size} cells. Reading terminated."
+      )
+    end if
+
+    // Decode the row using the provided decoder
+    val decodedTuple = decoder
+      .decodeRow(cellValues)
+      .getOrElse(
+        throw new Exception(s"Failed to decode row $currentRowIndex: $cellValues")
+      )
+
+    currentRowIndex += 1
+    NamedTuple.build[K]()(decodedTuple)
+  end next
+
+  override def hasNext: Boolean =
+    colRange match
+      case Some(range) if range.nonEmpty =>
+        val (_, lastRow, _, _) = parseRange(range)
+        currentRowIndex < lastRow
+      case _ =>
+        sheetIterator.hasNext
+  end hasNext
+
+end ExcelIterator
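
Review note: the range handling in this file relies on Apache POI's `CellRangeAddress.valueOf`, which yields zero-based row and column indices. Below is a rough, dependency-free sketch of that mapping for simple ranges such as "A1:C10". `Bounds`, `colIndex`, `parseSimpleRange`, and `dataRowCount` are hypothetical names introduced for illustration; they are not POI or scautable APIs, and the parser covers only plain upper-case ranges. The point is to show how the iterator's bounds play out: the header sits at `firstRow`, `currentRowIndex` starts there, and the `currentRowIndex < lastRow` check in `hasNext` then allows `lastRow - firstRow` data rows.

```scala
// Hypothetical, dependency-free stand-in for POI's range parsing.
// Handles only simple upper-case ranges like "A1:C10"; indices are zero-based.
final case class Bounds(firstRow: Int, lastRow: Int, firstCol: Int, lastCol: Int)

// Column letters to a zero-based index: "A" is 0, "Z" is 25, "AA" is 26.
def colIndex(letters: String): Int =
  letters.foldLeft(0)((acc, c) => acc * 26 + (c - 'A' + 1)) - 1

def parseSimpleRange(range: String): Option[Bounds] =
  // One cell reference: column letters followed by a 1-based row number
  val Cell = "([A-Z]{1,3})([1-9][0-9]{0,6})".r
  range.split(":") match
    case Array(Cell(c1, r1), Cell(c2, r2)) =>
      Some(Bounds(r1.toInt - 1, r2.toInt - 1, colIndex(c1), colIndex(c2)))
    case _ => None

// Data rows yielded for a range, given that the header occupies firstRow
// and hasNext checks currentRowIndex < lastRow.
def dataRowCount(b: Bounds): Int = b.lastRow - b.firstRow

@main def rangeDemo(): Unit =
  parseSimpleRange("A1:C10").foreach { b =>
    println(b)               // zero-based bounds for the range
    println(dataRowCount(b)) // rows remaining after the header row
  }
```

Under these assumptions, "A1:C10" maps to rows 0 to 9 and columns 0 to 2, so the header is read from row 0 and nine data rows follow.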
