Public datasets like Duke University’s DukeMTMC are often used to train, test, and fine-tune machine learning algorithms that make their way into production, sometimes with controversial results. It’s an open secret that biases in these datasets could negatively impact the predictions made by an algorithm, for example causing a facial recognition system to misidentify a person. But a recent study coauthored by researchers at Princeton reveals that computer vision datasets, particularly those containing images of people, present a range of ethical problems.
This interesting piece reflects on the impact of flawed datasets. Left unchecked, flawed training sets of data and images can skew the performance of artificial intelligence, and if we are not careful, “smart” algorithms can replicate biased historical behaviour. We need an approach that recognises the consequences of machine learning when we don’t pay attention to the ethics of the information used to train it. It is high time for an ethics framework for machine learning that balances community benefit and harm, as well as justice considerations. We have included links to 16 related items.
DukeMTMC, LFW, and MS-Celeb-1M contain up to millions of images curated to train object- and people-recognition algorithms. DukeMTMC draws from surveillance footage captured on Duke University’s campus in 2014, while LFW contains photos of faces scraped from Yahoo News articles. MS-Celeb-1M, meanwhile, which was released by Microsoft in 2016, comprises facial photos of roughly 100,000 different people.