Can’t download the original MS-Celeb-1M dataset? #1

LiuJoffrey · 2019-06-07T15:02:36Z

Hi, I would like to download the original dataset, but I can’t find the download link on the website. Could you please provide the original dataset file or a download link?

Thank you so much

ruochunjin · 2019-06-09T22:27:16Z

It is a pity that we have lost the original image data due to our carelessness in data preservation over the last two years. The cleaned file list here is what we have now.
We actually did not know that Microsoft Research had taken down the original data until we saw these issues.

jjsjunior · 2019-08-26T17:26:44Z

Hi Joffrey,
Did you manage to find a download link for MS-Celeb-1M?
Thanks in advance.

youthM · 2019-10-14T12:06:02Z

Hi, did you find a download link for MS-Celeb-1M?

ha1990-12 · 2019-11-14T08:48:51Z

https://academictorrents.com/details/9e67eb7cc23c9417f39778a8e06cca5e26196a97/tech&hit=1&filelist=1

ibarrond · 2021-04-05T19:14:08Z

Hi, how do you process the tsv you get from this torrent? I'm not sure what each column contains or how to process it.

ketan-b · 2021-05-20T09:11:38Z

This should extract the images from the .tsv file:

import argparse
import base64
import csv
import os
# import magic # Detect image type from buffer contents (disabled, all are jpg)

parser = argparse.ArgumentParser()
parser.add_argument('--croppedTSV', type=str, required=True)  # path to the input .tsv file
parser.add_argument('--outputDir', type=str, default='raw')  # images are written under this directory
args = parser.parse_args()

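# Columns used, going by the variable names: row[0] is the MID, row[1] the image
# search rank, row[4] the face ID, and the last column the base64-encoded image.
with open(args.croppedTSV, 'r', newline='') as tsvF:
    reader = csv.reader(tsvF, delimiter='\t')
    for i, row in enumerate(reader, start=1):
        MID, imgSearchRank, faceID = row[0], row[1], row[4]
        data = base64.b64decode(row[-1])

        # Images are grouped into one directory per MID.
        saveDir = os.path.join(args.outputDir, MID)
        savePath = os.path.join(saveDir, "{}-{}.jpg".format(imgSearchRank, faceID))

        # assert(magic.from_buffer(data) == 'JPEG image data, JFIF standard 1.01')

        os.makedirs(saveDir, exist_ok=True)
        with open(savePath, 'wb') as f:
            f.write(data)

        # Report progress every 1000 images.
        if i % 1000 == 0:
            print("Extracted {} images.".format(i))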
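
For anyone still unsure what each column holds, here is a minimal sketch for peeking at the first few rows before running a full extraction. `preview_rows` and `is_jpeg` are hypothetical helper names, not part of the script above, and the sample row in the demo is made up purely for illustration (it is not taken from the real file). Raising the csv field size limit is a precaution on my side, in case a base64 field is longer than the module's default limit.

```python
import base64
import csv
import io


def preview_rows(tsv_file, max_rows=3, max_chars=40):
    """Return a truncated preview of every column for the first few rows."""
    # Precaution: long base64 fields may exceed the csv module's default limit.
    csv.field_size_limit(2**31 - 1)
    reader = csv.reader(tsv_file, delimiter='\t')
    previews = []
    for row_index, row in enumerate(reader):
        if row_index >= max_rows:
            break
        previews.append([col[:max_chars] for col in row])
    return previews


def is_jpeg(data):
    """Check for the JPEG start-of-image marker (bytes FF D8 FF)."""
    return data[:3] == b'\xff\xd8\xff'


if __name__ == "__main__":
    # Made-up row: placeholder values, with a tiny fake JPEG header as the payload.
    payload = base64.b64encode(b'\xff\xd8\xff\xe0' + b'\x00' * 16).decode('ascii')
    sample = io.StringIO("\t".join(["m.x", "0", "c2", "c3", "FaceId-0", "c5", payload]) + "\n")
    for row in preview_rows(sample):
        for index, column in enumerate(row):
            print(index, repr(column))
    print(is_jpeg(base64.b64decode(payload)))
```

To try it on the real file, open the .tsv with `open(path, 'r', newline='')` and pass the file object to `preview_rows`. If `is_jpeg` returns False for a decoded last column, the `.jpg` extension used by the script would be worth double-checking.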

ruochunjin added the "good first issue" label Nov 22, 2019

ruochunjin pinned this issue Nov 22, 2019

ruochunjin unpinned this issue Apr 6, 2021
