Skip to content

feat: add OpenDataCapture to ReproSchema converter#108

Open
yibeichan wants to merge 7 commits intomainfrom
feat/opendatacapture-converter
Open

feat: add OpenDataCapture to ReproSchema converter#108
yibeichan wants to merge 7 commits intomainfrom
feat/opendatacapture-converter

Conversation

@yibeichan
Copy link
Contributor

  • Add odc_mappings.py with type mapping functions
  • Add odc2reproschema.py converter module
  • Add 'odc2reproschema' CLI command
  • Add tests for type mappings and CLI

Supports conversion of:

  • Simple fields (string, number, boolean, date)
  • Likert scales (expanded to multiple items)
  • Record arrays (prefixed sub-items)
  • Multiple choice fields

yibeichan and others added 2 commits February 24, 2026 18:38
- Add odc_mappings.py with type mapping functions
- Add odc2reproschema.py converter module
- Add 'odc2reproschema' CLI command
- Add tests for type mappings and CLI

Supports conversion of:
- Simple fields (string, number, boolean, date)
- Likert scales (expanded to multiple items)
- Record arrays (prefixed sub-items)
- Multiple choice fields
@gemini-code-assist
Copy link

Summary of Changes

Hello @yibeichan, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the data interoperability capabilities of the system by introducing a robust converter for OpenDataCapture (ODC) instruments to the ReproSchema format. This new feature streamlines the process of integrating ODC-based data collection forms into ReproSchema workflows, ensuring that complex field structures like Likert scales and record arrays are accurately translated. The addition of a dedicated CLI command simplifies execution for users, making the conversion process accessible and efficient.

Highlights

  • New Conversion Utility: Introduced a new module and CLI command to convert OpenDataCapture (ODC) form instruments into the ReproSchema format.
  • Type Mapping Functions: Added dedicated functions for mapping ODC field types (kind, variant) to appropriate ReproSchema input types and XSD value types.
  • Complex Field Handling: Implemented specific logic to correctly convert complex ODC field types, including simple fields (string, number, boolean, date), Likert scales (expanded to multiple items), record arrays (prefixed sub-items), and multiple-choice fields.
  • CLI Integration: Integrated the ODC to ReproSchema conversion as a new command-line interface option, odc2reproschema, allowing users to specify input files, output paths, and instrument names.
  • Comprehensive Testing: Included new unit tests to validate the correctness of the type mapping functions and the functionality of the new CLI command.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • reproschema/cli.py
    • Added a new odc2reproschema command to the CLI for converting ODC instruments to ReproSchema.
  • reproschema/odc2reproschema.py
    • Added a new module containing the core logic for ODC to ReproSchema conversion.
    • Implemented functions for loading ODC instruments, processing options, and converting individual fields.
    • Included specific handling for complex field types such as record arrays and Likert scales.
    • Managed the creation of ReproSchema activity and protocol schemas from ODC data.
  • reproschema/odc_mappings.py
    • Added a new module to define mappings between ODC field kinds/variants and ReproSchema input/value types.
    • Provided utility functions get_odc_input_type, get_odc_value_type, and is_multiple_choice for consistent type translation.
  • reproschema/tests/test_odc2reproschema.py
    • Added new test cases for validating the ODC type mapping functions.
    • Included tests for the odc2reproschema CLI command, verifying successful conversion and output structure.
Activity
  • No specific activity (comments, reviews, or progress updates) was recorded for this pull request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new feature to convert OpenDataCapture (ODC) form instruments into ReproSchema format, along with a CLI command and corresponding tests. The implementation covers various field types, including simple fields, likert scales, record arrays, and multiple-choice fields. The code is well-structured and includes appropriate error handling for file operations and JSON parsing. The type mapping logic is comprehensive, and the CLI integration is seamless. Overall, this is a valuable addition to the project.

However, I've identified a few areas for improvement:

  • Type Conversion Robustness: The process_options function in odc2reproschema.py might silently fail to convert values to their declared XSD type, which could lead to unexpected data types in the output. It would be more robust to explicitly handle such cases.
  • Mapping Fallback Logic: In odc_mappings.py, the fallback logic for get_odc_input_type when a specific variant is not found could be made more explicit to ensure deterministic behavior.
  • Test Coverage for Content: The CLI tests currently verify the existence of generated files but do not assert their content. Adding assertions for the content of the converted ReproSchema files would significantly improve test robustness.
I am having trouble creating individual review comments. Click here to see my feedback.

reproschema/tests/test_odc2reproschema.py (93-104)

high

The current CLI tests only verify the existence of the generated files and directories. To ensure the conversion logic is correct, it would be beneficial to also assert the content of these files. For example, load the generated test_instrument_schema and item1 JSON-LD files and check if key fields like prefLabel, question, inputType, and responseOptions match the expected values based on the test_data fixture.

This would provide much stronger validation of the converter's functionality.

reproschema/odc2reproschema.py (66-75)

medium

The try-except ValueError: pass block for type conversion in process_options can lead to silent failures where a value intended to be an integer or float remains a string if conversion fails. This might result in type mismatches in the generated ReproSchema, which expects strict XSD types.

Consider logging a warning when a value cannot be converted to its declared type, or explicitly ensuring that processed_val is always a string if the conversion fails, to maintain consistency and provide better debugging information.

        processed_val = val
        if "integer" in value_type:
            try:
                processed_val = int(val)
            except ValueError:
                logger.warning("Could not convert value '%s' to integer for type '%s'. Keeping as string.", val, value_type)
                processed_val = str(val)
        elif "decimal" in value_type or "float" in value_type:
            try:
                processed_val = float(val)
            except ValueError:
                logger.warning("Could not convert value '%s' to float/decimal for type '%s'. Keeping as string.", val, value_type)
                processed_val = str(val)

reproschema/odc_mappings.py (80-81)

medium

The fallback if variants: return list(variants.values())[0] is non-deterministic as dictionary iteration order is not guaranteed across Python versions (though it is insertion-ordered in 3.7+). Relying on the 'first' item might lead to unexpected behavior if the dictionary structure or Python version changes. It would be more robust to define an explicit default for each kind if no specific variant or 'None' default is found, or return a generic 'text' type if no better match is available.

        # If no default, return a generic 'text' type as a safe fallback
        if variants:
            logger.debug("No specific variant or 'None' default found for kind '%s', variant '%s'. Returning generic 'text'.", kind, variant)
            return "text"

@yibeichan
Copy link
Contributor Author

@greptileai

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an OpenDataCapture (ODC) → ReproSchema conversion path to the reproschema toolkit, including mapping utilities, a converter module, a new CLI command, and initial tests.

Changes:

  • Added ODC type/input mapping helpers (odc_mappings.py).
  • Added an odc2reproschema converter module that generates activity + protocol outputs.
  • Added odc2reproschema CLI command and a basic CLI+mapping test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 11 comments.

File Description
reproschema/odc_mappings.py Defines ODC→ReproSchema type/input mapping utilities used by the converter.
reproschema/odc2reproschema.py Implements the ODC JSON conversion into ReproSchema activity/protocol output structure.
reproschema/cli.py Registers a new odc2reproschema click command to invoke the converter.
reproschema/tests/test_odc2reproschema.py Adds tests for mapping helpers and a minimal end-to-end CLI conversion run.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +267 to +271
name = instrument_name or internal.get("name") or input_path.stem
title = details.get("title") or name
description = details.get("description") or ""
version = str(internal.get("edition", "1"))

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instrument names are used as directory names and schema ids without normalization. Other converters/protocol creation normalize names (e.g. replace spaces with underscores), and create_protocol_schema will also sanitize the protocol name—leading to mismatched activity vs protocol paths if instrument_name contains spaces. Normalize/sanitize name once (strip + replace spaces, and potentially other filesystem-unsafe chars) and use that consistently for both activity and protocol generation.

Copilot uses AI. Check for mistakes.
Comment on lines +141 to +144
# Handle multiple choice
if is_multiple_choice(kind, variant):
item["responseOptions"]["multipleChoice"] = True

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior: multipleChoice is set when is_multiple_choice(...) returns true, but there’s no test asserting this flag is written for ODC set fields. Add coverage for a set field with options to ensure multipleChoice: true and the generated choices are correct.

Copilot uses AI. Check for mistakes.
return False

kind = kind.lower().strip()
variant = str(variant).lower().strip() if variant else ""
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In is_multiple_choice, the local variant variable is assigned but never used, which will trigger flake8 F841 in this repo. Remove the variant normalization line or incorporate variant into the logic.

Suggested change
variant = str(variant).lower().strip() if variant else ""

Copilot uses AI. Check for mistakes.
import json
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple is imported but never used in this module; with the repo’s flake8 pre-commit hook this will fail CI (F401). Remove the unused import or use it.

Suggested change
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional

Copilot uses AI. Check for mistakes.
from .convertutils import (
create_activity_schema,
create_protocol_schema,
parse_html,
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_html is imported from convertutils but never used, which will fail flake8 (F401). Either remove the import or use parse_html when populating question/description fields (consistent with the other converters).

Suggested change
parse_html,

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +132
item["question"] = {"en": str(label)}

# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(description)}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label is coerced with str(...) when building question. Elsewhere in the codebase (e.g. NBDC/REDCap converters), labels/descriptions are passed through parse_html to preserve text extraction and language tags. Consider using parse_html(label) or {"en": str(label)} here so ODC HTML/multilingual content isn’t lost.

Suggested change
item["question"] = {"en": str(label)}
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(description)}
parsed_label = parse_html(label)
item["question"] = parsed_label or {"en": str(label)}
# Add description/hint
description = field_data.get("description")
if description:
parsed_description = parse_html(description)
item["description"] = parsed_description or {"en": str(description)}

Copilot uses AI. Check for mistakes.
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(description)}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

description is currently stored as plain str(...). For consistency with other converters and to handle HTML/language-tagged content, use parse_html(description) (with a plain-text fallback) when populating the ReproSchema description field.

Suggested change
item["description"] = {"en": str(description)}
try:
parsed_description = parse_html(description)
except Exception:
parsed_description = None
item["description"] = parsed_description or {"en": str(description)}

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +103
if kind == "record-array":
return process_record_array(name, field_data)
if kind == "number-record" and variant == "likert":
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior: record-array fields are expanded via process_record_array(...), but the test suite added in this PR doesn’t cover this conversion path. Add a unit/CLI test fixture that includes a record-array field and asserts the expected prefixed sub-items are generated.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +21
from .odc_mappings import (
ODC_COLUMN_MAP,
get_odc_input_type,
get_odc_value_type,
is_multiple_choice,
)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ODC_COLUMN_MAP is imported but never used, which will fail flake8 (F401). Remove the import, or apply it when translating ODC keys to internal ReproSchema fields.

Copilot uses AI. Check for mistakes.
Comment on lines +97 to +101
kind = field_data.get("kind", "string")
variant = field_data.get("variant")

# Handle special complex fields
if kind == "record-array":
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind/variant are compared for special cases (record-array, likert, dynamic) before any normalization. If the input uses different casing/whitespace (e.g. "Record-Array"), these branches won’t run and the converter will produce incorrect output. Normalize kind/variant (lower/strip) once up front and use the normalized values for all comparisons and mapping calls.

Copilot uses AI. Check for mistakes.
yibeichan and others added 3 commits February 24, 2026 23:57
- Add sample_odc_instrument.json fixture covering all ODC field types
- Add odc_form_schema.json (ODC JSON schema for reference)
- Add TestComprehensiveInstrument class with:
  - test_comprehensive_instrument_conversion: verifies all 15 field types
  - test_activity_schema_order_and_properties: verifies order/addProperties

All 5 tests passing
Changes:
- Remove unused imports: Tuple and ODC_COLUMN_MAP (F401)
- Remove unused variable in is_multiple_choice() (F841)
- Add logging for type conversion failures in process_options()
- Fix non-deterministic fallback in get_odc_input_type() to return 'text'
- Use parse_html() for labels and descriptions in process_field()
- Normalize instrument name with strip() and replace spaces with underscores
- Validate content is dict before iterating, raise ValueError if not
- Normalize kind/variant at top of process_field() with lower/strip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yibeichan
Copy link
Contributor Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new feature to convert OpenDataCapture (ODC) instruments to the ReproSchema format. The changes are well-structured, with separate modules for the core conversion logic, type mappings, and a new CLI command. The implementation covers various ODC field types, including simple fields, Likert scales, and record arrays, and handles them robustly. The addition of comprehensive unit and integration tests is excellent and ensures the converter's correctness and reliability. I have a few suggestions for minor refactoring and code cleanup to improve maintainability and clarity, but overall this is a high-quality contribution.

Comment on lines +67 to +86
if "integer" in value_type:
try:
processed_val = int(val)
except ValueError:
logger.warning(
"Could not convert option value '%s' to integer, "
"using string fallback",
str(val),
)
processed_val = str(val)
elif "decimal" in value_type or "float" in value_type:
try:
processed_val = float(val)
except ValueError:
logger.warning(
"Could not convert option value '%s' to float, "
"using string fallback",
str(val),
)
processed_val = str(val)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is significant code duplication in the try-except blocks for converting option values to int and float. This logic can be refactored into a private helper function to improve code clarity and maintainability. For example:

def _try_convert(value, target_type, type_name):
    try:
        return target_type(value)
    except ValueError:
        logger.warning(
            "Could not convert option value '%s' to %s, "
            "using string fallback",
            str(value),
            type_name,
        )
        return str(value)

# ... in process_options ...
processed_val = val
if "integer" in value_type:
    processed_val = _try_convert(val, int, "integer")
elif "decimal" in value_type or "float" in value_type:
    processed_val = _try_convert(val, float, "float")

Comment on lines +194 to +200
item.setdefault("additionalNotesObj", []).append(
{
"source": "odc",
"column": "record-array-parent",
"value": name,
}
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of item.setdefault("additionalNotesObj", []) is redundant here. The process_field function ensures that any returned item will have an additionalNotesObj key with a list value. You can simplify this to a direct key access.

Suggested change
item.setdefault("additionalNotesObj", []).append(
{
"source": "odc",
"column": "record-array-parent",
"value": name,
}
)
item["additionalNotesObj"].append(
{
"source": "odc",
"column": "record-array-parent",
"value": name,
}
)

Comment on lines +52 to +56
ODC_COLUMN_MAP = {
"label": "question",
"description": "description",
"instructions": "instruction",
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The constant ODC_COLUMN_MAP is defined but appears to be unused throughout the codebase. To improve maintainability and remove dead code, it should be removed.

return "xsd:string"


def is_multiple_choice(kind: str, variant: str = None) -> bool:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variant parameter in the is_multiple_choice function is not used. It should be removed from the function signature to simplify the code.

Suggested change
def is_multiple_choice(kind: str, variant: str = None) -> bool:
def is_multiple_choice(kind: str) -> bool:

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +137 to +142
item["question"] = {"en": str(parse_html(label))}

# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_html can return None if the input is invalid or empty. The current code wraps it with str() which will turn None into "None". After fixing the parse_html usage issue, add a check to ensure parse_html didn't return None before assigning to item["description"].

Suggested change
item["question"] = {"en": str(parse_html(label))}
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
parsed_label = parse_html(label)
if parsed_label is not None:
item["question"] = {"en": str(parsed_label)}
# Add description/hint
description = field_data.get("description")
if description:
parsed_description = parse_html(description)
if parsed_description is not None:
item["description"] = {"en": str(parsed_description)}

Copilot uses AI. Check for mistakes.
)
processed_val = str(val)

choices.append({"name": {"en": str(label)}, "value": processed_val})
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with how question labels are handled throughout the codebase, option labels should use parse_html instead of str(). ODC option labels might contain HTML markup that should be properly parsed. This ensures all user-facing text is processed consistently.

Suggested change
choices.append({"name": {"en": str(label)}, "value": processed_val})
choices.append({"name": {"en": parse_html(str(label))}, "value": processed_val})

Copilot uses AI. Check for mistakes.
raise ValueError(
f"ODC instrument 'content' must be a dictionary, "
f"got {type(content).__name__}"
)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a warning or error if content is empty. An ODC instrument with no fields is unusual and might indicate a malformed input file. Adding a check like if not content: logger.warning("ODC instrument has no content fields") would help users identify potential issues with their input data.

Suggested change
)
)
if not content:
logger.warning("ODC instrument '%s' has no content fields", name)

Copilot uses AI. Check for mistakes.

Returns:
List of ReproSchema item dictionaries
"""
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add validation to check if field_data is a dictionary before processing. If field_data is None or not a dict, the code will fail with an unclear error. Add a check at the start: if not isinstance(field_data, dict): logger.warning("Skipping field '%s': invalid field data", name); return []

Suggested change
"""
"""
if not isinstance(field_data, dict):
logger.warning("Skipping field '%s': invalid field data", name)
return []

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +142
item["question"] = {"en": str(parse_html(label))}

# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect usage of parse_html. The function already returns a dictionary like {"en": "text"}, so wrapping it in str() and then putting it in a dict creates {"en": "{'en': 'text'}"} instead of the correct {"en": "text"}. Remove str() wrapper and assign parse_html result directly.

Suggested change
item["question"] = {"en": str(parse_html(label))}
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
item["question"] = parse_html(label)
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = parse_html(description)

Copilot uses AI. Check for mistakes.
Comment on lines +228 to +234
row_label = row_data.get("label", row_name)

item = {
"category": "reproschema:Item",
"id": prefixed_name,
"prefLabel": {"en": prefixed_name},
"question": {"en": str(row_label)},
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with process_field and other converters, use parse_html for the row_label instead of directly wrapping with str(). Also check for and handle the optional "description" field in row_data, using parse_html. This ensures HTML content is properly parsed and the data structure is consistent.

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +56
# ODC field properties to internal names
ODC_COLUMN_MAP = {
"label": "question",
"description": "description",
"instructions": "instruction",
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ODC_COLUMN_MAP is defined but never used in the codebase. If this mapping is intended for future use or documentation, add a comment explaining its purpose. Otherwise, consider removing it to reduce code clutter.

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +142
item["question"] = {"en": str(parse_html(label))}

# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect usage of parse_html. The function already returns a dictionary like {"en": "text"}, so wrapping it in str() and then putting it in a dict creates {"en": "{'en': 'text'}"} instead of the correct {"en": "text"}. Remove str() wrapper and assign parse_html result directly.

Suggested change
item["question"] = {"en": str(parse_html(label))}
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
item["question"] = parse_html(label)
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = parse_html(description)

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +142
item["question"] = {"en": str(parse_html(label))}

# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_html can return None if the input is invalid or empty. The current code wraps it with str() which will turn None into "None". After fixing the parse_html usage issue, add a check to ensure parse_html didn't return None before assigning to item["question"].

Suggested change
item["question"] = {"en": str(parse_html(label))}
# Add description/hint
description = field_data.get("description")
if description:
item["description"] = {"en": str(parse_html(description))}
parsed_label = parse_html(label)
if parsed_label is not None:
item["question"] = {"en": str(parsed_label)}
# Add description/hint
description = field_data.get("description")
if description:
parsed_description = parse_html(description)
if parsed_description is not None:
item["description"] = {"en": str(parsed_description)}

Copilot uses AI. Check for mistakes.
- Add Features section highlighting ODC converter
- Add Usage section with CLI command examples
- Document output structure with visual tree
- Add ODC Field Type Support table
- Add Special Handling notes (Likert, record arrays, multiple choice)
- Add example showing input/output JSON transformation

Addresses PR #108 review feedback for documentation
- Clarify that --instrument-name defaults to internal.name from ODC file
- Make optional status more explicit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants