Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excel to entities #56

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Excel to entities #56

wants to merge 17 commits into from

Conversation

carlosmada22
Copy link
Collaborator

No description provided.

@carlosmada22 carlosmada22 linked an issue Jan 22, 2025 that may be closed by this pull request
Copy link
Contributor

@JosePizarro3 JosePizarro3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! can you add some testing and some excels where to do this?

Also, I added a few minor comments on minor design this you can decide if accept or just keep doing it with your style.

Then, one open point: some code doing checks on the format is commented out. We would actually like to keep this, right? Is this the issue you were having with the logger?

@@ -38,20 +38,44 @@ def cli():
it is using the value of the `OPENBIS_URL` environment variable.
""",
)
def fill_masterdata(url):
@click.option(
"--path",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe be a bit more specific, something like --excel-file?

import re
from typing import TYPE_CHECKING, Any

# import logger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from bam_masterdata.logger import logger

# logger = logger()


def index_to_excel_column(index):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing and description? I let you decide

return column


def get_last_non_empty_row(sheet, start_index):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing? again, you decide if whether doing it

I think is:

def get_last_non_empty_row(sheet: Worksheet, start_index: int) -> int:

"Section",
"Property label",
"Data type",
"Vocabulary code",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add "Object code" commented out? This is because in the excel import system is missing, but for our Python and other stuff we need this information when the Data type is OBJECT

codes.append(cell.value)
else:
codes.append(None)
# invalid_codes = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that I see this code, this is not dead code, but actually we should have these checks and print the messages to the logger, right?

entity_type = sheet[entity_type_position].value

entity_types = [
"OBJECT_TYPE",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both OBJECT_TYPE and SAMPLE_TYPE here?

)
# return "\n".join(errors)
else:
if entity_type == "SAMPLE_TYPE" or entity_type == "OBJECT_TYPE":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for readability, this and the previous ifs in the terms_to_dict could be part of a class and methods applied to each type. There you can identify common abstracted methods to be applied

# Check if we've reached the end of the sheet or found two consecutive empty rows
if last_non_empty_row is None:
if i == len(sheet_names) - 1: # Check if it's the last sheet
print(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.info(...)

start_time = time.time()

# Define output directory
output_directory = "./artifacts/tmp/" if path else DATAMODEL_DIR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about exporting to os.path.join(DATAMODEL_DIR, "tmp") instead of in artifacts? This way the one creating this temporary files can compare if there are already some python modules under DATAMODEL_DIR

@coveralls
Copy link

coveralls commented Jan 24, 2025

Pull Request Test Coverage Report for Build 12952511664

Details

  • 15 of 369 (4.07%) changed or added relevant lines in 4 files are covered.
  • 144 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-1.9%) to 95.36%

Changes Missing Coverage Covered Lines Changed/Added Lines %
bam_masterdata/openbis/get_entities.py 0 2 0.0%
bam_masterdata/cli/cli.py 2 14 14.29%
bam_masterdata/cli/fill_masterdata.py 2 18 11.11%
bam_masterdata/cli/excel_to_entities.py 11 335 3.28%
Files with Coverage Reduction New Missed Lines %
bam_masterdata/utils/utils.py 5 81.82%
bam_masterdata/openbis/get_entities.py 23 18.87%
bam_masterdata/cli/cli.py 25 32.43%
bam_masterdata/cli/fill_masterdata.py 91 7.85%
Totals Coverage Status
Change from base Build 12907785218: -1.9%
Covered Lines: 16381
Relevant Lines: 17178

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define overwrite and update for excel import-export
3 participants