Implement TPC-H data loader #97

Open
wants to merge 14 commits into
base: main

1ntEgr8 (Contributor) commented Aug 29, 2024

This patch implements a data loader for TPC-H workloads.

The loader is modeled after data/alibaba_loader.py.

The simulator does not correctly handle adding TASK_GRAPH_RELEASE events when the loader mutates the workload. Since the event handler for TASK_GRAPH_RELEASE is only used for logging, I commented out the code that adds the event to the queue and deferred a proper fix.

1ntEgr8 changed the title from "WIP: Implement TPC-H data loader" to "Implement TPC-H data loader" on Sep 23, 2024
1ntEgr8 mentioned this pull request on Nov 4, 2024
Comment on lines +1542 to +1554
# # Add the TaskGraphRelease events into the system.
# for task_graph_name, task_graph in self._workload.task_graphs.items():
#     event = Event(
#         event_type=EventType.TASK_GRAPH_RELEASE,
#         time=task_graph.release_time,
#         task_graph=task_graph_name,
#     )
#     self._event_queue.add_event(event)
#     self._logger.info(
#         "[%s] Added %s to the event queue.",
#         self._simulator_time.to(EventTime.Unit.US).time,
#         event,
#     )
Contributor

This should still exist in the Simulator, right?

Comment on lines +333 to +340
# TODO: make configurable
TPCH_SUBDIR = "100g/"
DECIMA_TPCH_DIR = (
    "/home/dgarg39/erdos-scheduling-simulator/profiles/workload/tpch/decima/"
)
CLOUDLAB_TPCH_DIR = (
    "/home/dgarg39/erdos-scheduling-simulator/profiles/workload/tpch/cloudlab/"
)
Contributor

Can you put these in the flags, please? The hardcoded paths won't work on other machines.
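One way to act on this request, sketched with the standard-library argparse as a stand-in because the project's own flag mechanism is not shown in this excerpt. The flag names `--tpch_profile_dir` and `--tpch_dataset_size`, their defaults, and the way the size subdirectory is joined onto each profile directory are all assumptions made for illustration, not taken from the patch.

```python
import argparse
import os


def parse_tpch_dirs(argv=None):
    """Build the TPC-H profile directories from command-line flags.

    Flag names and defaults are hypothetical, not taken from the PR.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tpch_profile_dir",
        default="profiles/workload/tpch",
        help="Root directory holding the decima/ and cloudlab/ profiles.",
    )
    parser.add_argument(
        "--tpch_dataset_size",
        default="100g",
        help="Dataset-size subdirectory (the PR hardcodes 100g/).",
    )
    args = parser.parse_args(argv)
    # Assumption: the size subdirectory lives under each profile directory.
    return {
        name: os.path.join(args.tpch_profile_dir, name, args.tpch_dataset_size)
        for name in ("decima", "cloudlab")
    }


# Example: point the loader at a machine-specific checkout.
dirs = parse_tpch_dirs(["--tpch_profile_dir", "/data/tpch"])
```

With a relative default, the loader would resolve paths against the working directory instead of one user's home directory, and each machine can override the root on the command line.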

Comment on lines +343 to +366
class SetWithCount(object):
    """
    allow duplication in set
    """

    def __init__(self):
        self.set = {}

    def __contains__(self, item):
        return item in self.set

    def add(self, item):
        if item in self.set:
            self.set[item] += 1
        else:
            self.set[item] = 1

    def clear(self):
        self.set.clear()

    def remove(self, item):
        self.set[item] -= 1
        if self.set[item] == 0:
            del self.set[item]
Contributor

Is this doing anything different from collections.Counter?
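For reference, a small sketch of where the two might differ. `collections.Counter` covers `add`, `clear`, and membership directly, but decrementing a count to zero leaves the key in place, so `item in counter` stays true. `SetWithCount.remove` deletes the key at zero. The helper `remove_one` and the item name `"stage_0"` are hypothetical, introduced only to illustrate the comparison.

```python
from collections import Counter


def remove_one(counts: Counter, item) -> None:
    """Decrement item's count, dropping the key once it reaches zero."""
    if counts[item] <= 1:
        # Counter.__delitem__ does not raise for a missing key, unlike
        # SetWithCount.remove, which raises KeyError in that case.
        del counts[item]
    else:
        counts[item] -= 1


counts = Counter()
counts["stage_0"] += 1  # equivalent of SetWithCount.add
counts["stage_0"] += 1
remove_one(counts, "stage_0")
still_present = "stage_0" in counts  # one copy is left
remove_one(counts, "stage_0")
gone = "stage_0" not in counts  # key dropped at zero

# A plain decrement keeps the key around with a count of 0.
naive = Counter({"stage_0": 1})
naive["stage_0"] -= 1
naive_keeps_key = "stage_0" in naive
```

So a `Counter` plus a one-function wrapper for removal would likely reproduce the class's behavior, provided callers rely on membership tests going false when the last copy is removed.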


2 participants