
Memory leak during newEpisode in data pre-processing #16

Open
GengzeZhou opened this issue Oct 26, 2023 · 3 comments

Comments

@GengzeZhou

Hi Shizhe,

Thanks for your great work. I have observed a memory leak when calling the newEpisode function in MatterSim while running the data pre-processing code:

import math

import numpy as np
import torch
from PIL import Image

# build_simulator, build_feature_extractor and VIEWPOINT_SIZE come from
# the repo's pre-processing script.
def process_features(proc_id, out_queue, scanvp_list, args):
    print('start proc_id: %d' % proc_id)

    # Set up the simulator
    sim = build_simulator(args.connectivity_dir, args.scan_dir)

    # Set up PyTorch CNN model
    torch.set_grad_enabled(False)
    model, img_transforms, device = build_feature_extractor(args.model_name, args.checkpoint_file)

    for scan_id, viewpoint_id in scanvp_list:
        # Loop all discretized views from this location
        images = []
        for ix in range(VIEWPOINT_SIZE):
            if ix == 0:
                sim.newEpisode([scan_id], [viewpoint_id], [0], [math.radians(-30)])
            elif ix % 12 == 0:
                sim.makeAction([0], [1.0], [1.0])
            else:
                sim.makeAction([0], [1.0], [0])
            state = sim.getState()[0]
            assert state.viewIndex == ix

            image = np.array(state.rgb, copy=True) # in BGR channel
            image = Image.fromarray(image[:, :, ::-1]) #cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            images.append(image)

        images = torch.stack([img_transforms(image).to(device) for image in images], 0)
        fts, logits = [], []
        for k in range(0, len(images), args.batch_size):
            b_fts = model.forward_features(images[k: k+args.batch_size])
            b_logits = model.head(b_fts)
            b_fts = b_fts.data.cpu().numpy()
            b_logits = b_logits.data.cpu().numpy()
            fts.append(b_fts)
            logits.append(b_logits)
        fts = np.concatenate(fts, 0)
        logits = np.concatenate(logits, 0)

        out_queue.put((scan_id, viewpoint_id, fts, logits))

    out_queue.put(None)

My machine's memory (64 GB) is gradually consumed as new viewpoints are loaded, and previously allocated memory is never released. The same issue was raised in the Matterport3D simulator's official repo, but no solution has been provided yet.
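For reference, this is how I measure the growth inside each worker — a minimal stdlib-only sketch (ru_maxrss is reported in KiB on Linux and in bytes on macOS):

```python
import resource
import sys

def peak_rss_mb():
    # Peak resident set size of the current process, in MiB.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss /= 1024  # macOS reports bytes, not KiB
    return rss / 1024.0

# e.g. logged once per (scan_id, viewpoint_id) iteration:
#   print('proc %d: peak RSS %.1f MiB' % (proc_id, peak_rss_mb()))
```

Logging this per iteration shows RSS climbing monotonically across episodes instead of plateauing.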

The issue is not resolved even if I rebuild the simulator and manually trigger garbage collection inside the for loop:

    # transform, ln_vision, visual_encoder and device are defined in the
    # enclosing scope; gc is imported at module level.
    for scan_id, viewpoint_id in scanvp_list:
        # Rebuild the simulator from scratch on every iteration
        sim = build_simulator(args.connectivity_dir, args.scan_dir)

        # Loop all discretized views from this location
        images = []
        for ix in range(VIEWPOINT_SIZE):
            if ix == 0:
                sim.newEpisode([scan_id], [viewpoint_id], [0], [math.radians(-30)])
            elif ix % 12 == 0:
                sim.makeAction([0], [1.0], [1.0])
            else:
                sim.makeAction([0], [1.0], [0])
            state = sim.getState()[0]
            assert state.viewIndex == ix

            image = np.array(state.rgb, copy=True) # in BGR channel
            image = Image.fromarray(image[:, :, ::-1]) #cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            images.append(image)

        images = torch.stack([transform(image).to(device) for image in images], 0)
        fts = []
        for k in range(0, len(images), args.batch_size):
            with torch.cuda.amp.autocast(dtype=torch.float16):
                b_fts = ln_vision(visual_encoder(images[k: k+args.batch_size]))
            b_fts = b_fts.data.cpu().numpy()
            fts.append(b_fts)
        fts = np.concatenate(fts, 0)

        # free memory
        del sim
        gc.collect()

Therefore I believe the cause is a memory leak inside MatterSim itself. Do you have any suggestions on this issue?
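As a stopgap, I am experimenting with recycling the worker processes so the OS reclaims whatever the native library leaks. A minimal sketch of the pattern (the per-task function below is hypothetical; in the real script it would build the simulator and extract one viewpoint's features):

```python
import multiprocessing as mp
import os

def extract_one(scanvp):
    # Hypothetical per-task body: in the real script this would build
    # the simulator, render the viewpoint and return its features.
    # Whatever native memory leaks here dies with the worker process.
    scan_id, viewpoint_id = scanvp
    return (scan_id, viewpoint_id, os.getpid())

def run_recycled(scanvp_list, num_workers=4):
    # maxtasksperchild=1 makes every worker exit after a single task,
    # so the OS reclaims any leaked memory before the next task runs.
    ctx = mp.get_context("fork")  # fork keeps this runnable as a script
    with ctx.Pool(processes=num_workers, maxtasksperchild=1) as pool:
        return pool.map(extract_one, scanvp_list, chunksize=1)
```

The process-respawn overhead is noticeable, so it only masks the leak rather than fixing it.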

@goodstudent9
Don't render images from the simulator: set the rendering variable ("Render***", sorry, I forget the exact spelling) to false.
If you do this and only fetch angle and connectivity information from the sim, there is no memory leak.
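Roughly like this — the setter names below follow the Matterport3DSimulator README from memory (I believe setRenderingEnabled is the flag I meant; treat the exact spellings as assumptions):

```python
import math

try:
    import MatterSim  # Matterport3D simulator Python bindings
except ImportError:   # not installed in every environment
    MatterSim = None

def build_geometry_only_sim(connectivity_dir, scan_dir):
    # Simulator that exposes poses and connectivity but never renders RGB.
    if MatterSim is None:
        raise RuntimeError("MatterSim bindings are not available")
    sim = MatterSim.Simulator()
    sim.setNavGraphPath(connectivity_dir)
    sim.setDatasetPath(scan_dir)
    sim.setRenderingEnabled(False)  # the key switch: skip image rendering
    sim.setDiscretizedViewingAngles(True)
    sim.setCameraVFOV(math.radians(60))
    sim.setBatchSize(1)
    sim.initialize()
    return sim
```

With rendering disabled, getState() still returns headings and navigable viewpoints, which is enough for navigation logic that uses precomputed features.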

@jj023721

jj023721 commented Oct 17, 2024 via email

@GengzeZhou
Author

@goodstudent9 Thanks for your reply. The point here is that I want to render RGB images at arbitrary resolutions during navigation, and also when saving visual features, which is where the memory leak is observed.
However, your answer suggests the leak lives in the simulator's image-rendering path. This makes sense, because all transformer-based VLN methods (DUET, HAMT, RecBERT, BEVBERT) preload visual features in their code, so they avoid this problem during training.
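For reference, the preloading pattern those methods rely on can be sketched roughly as follows (the one-.npy-per-scan layout is hypothetical; mmap_mode='r' keeps the arrays on disk until individual rows are read, so resident memory stays bounded):

```python
import numpy as np

def save_scan_features(path, fts):
    # Offline step: write the (num_viewpoints x feature_dim) array once.
    np.save(path, fts)

def load_scan_features(path):
    # Online step: memory-map instead of rendering; the OS pages rows
    # in on demand and can evict them again under memory pressure.
    return np.load(path, mmap_mode="r")

# Usage sketch (hypothetical layout):
#   fts = load_scan_features("features/<scan_id>.npy")
#   view_ft = np.asarray(fts[view_index])  # copies one row into RAM
```

This avoids touching the simulator's rendering path at train time entirely.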
