Case Study on Data Collection: Harvesting Personalities Online

Bryan

2019/01/27

    Raicu wrote a case study for Santa Clara University discussing the data analytics firm Cambridge Analytica, which provided marketing services for various political campaigns (2016). The case study reveals that Cambridge Analytica inferred the personality type of nearly all of the registered voters in the U.S. (Raicu, 2016). The study attempts to answer how Cambridge Analytica determined the personality type of the 190 million people registered to vote across the U.S. (Raicu, 2016). Raicu learned that Cambridge Analytica assessed voter personalities through online questionnaires delivered as ads and obtained roughly 15% participation in the surveys; the remaining voters’ personalities were estimated through statistical methods, based on surveyed people who shared similar information taken from other sources (2016). The “similar information” is assumed to have come from data brokers, and the firm is said to hold a large number of data points on U.S. citizens (Raicu, 2016). Politicians used Cambridge Analytica’s services to target ad campaigns at the American people during the elections. The remainder of this paper looks into more of the details of this case and addresses some ethical questions.

    Cambridge Analytica’s method of assigning personalities to individual voters is not in itself a breach of ethics, provided the new data point is used strictly one time for its purpose of marketing ads. However, if that personality label is permanently attached to the other identifiable voter data, it becomes a privacy and ethical issue. The firm added a data variable to the dataset of voters, a method that data science calls feature engineering. According to Bali, Sarkar, Lantz, & Lesmeister, feature engineering is using knowledge of the domain to make new data variables that can enhance the machine learning algorithm (2016). Cambridge Analytica used domain knowledge from social science to add new categorical variables to the data. According to Talbot, the five new personality categories are openness, conscientiousness, extroversion, agreeableness, and neuroticism (2016). The case study said the firm used the surveyed population to classify the personalities of the rest of the voter population (Raicu, 2016). The potential ethical issues are how Cambridge Analytica obtained the personality data and how it categorized other voters, who had no idea they had been labeled by personality type in addition to their demographics and the other 5000 data points the case study mentions (Raicu, 2016).
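
    To make the idea of classifying unsurveyed voters from a surveyed sample concrete, the following is a minimal sketch in Python. It is not Cambridge Analytica’s actual method, which the sources do not disclose; it assumes a simple k-nearest-neighbor vote, and every name and value in it (the two features, the voter records, the function names) is hypothetical and invented for illustration.

```python
import math
from collections import Counter

# Hypothetical surveyed voters: (age, online_hours_per_day) -> dominant trait.
# All values are invented for illustration; none come from the case study.
SURVEYED = [
    ((23, 6.0), "openness"),
    ((25, 5.5), "openness"),
    ((31, 5.0), "openness"),
    ((58, 1.0), "conscientiousness"),
    ((62, 0.5), "conscientiousness"),
    ((55, 1.5), "conscientiousness"),
]


def predict_trait(features, surveyed=SURVEYED, k=3):
    """Label a voter by majority vote of the k nearest surveyed voters.

    Distances are plain Euclidean on unscaled features, which is a
    simplification; a real pipeline would normally standardize them first.
    """
    nearest = sorted(surveyed, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def add_personality_feature(voters):
    """Feature engineering step: return copies of the records with a new 'trait' field."""
    return [
        {**v, "trait": predict_trait((v["age"], v["online_hours"]))}
        for v in voters
    ]


# Hypothetical voters who never answered a questionnaire.
unsurveyed = [
    {"voter_id": "A", "age": 27, "online_hours": 5.8},
    {"voter_id": "B", "age": 60, "online_hours": 0.8},
]

labeled = add_personality_feature(unsurveyed)
for row in labeled:
    print(row["voter_id"], row["trait"])
```

    In this sketch the two unsurveyed voters receive a personality label without ever being asked a question, which is the step that raises the consent concern discussed in this paper.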

    The personality tests administered to voters online should have included an easily found privacy statement saying who was recording the information, how long it would be stored, and whether it would be used for any other purposes. At a minimum, the statement should have disclosed that the personality results are matched to the individual’s public records and may later be shared with business partners. According to Beckett, Acxiom, Experian, Epsilon, Equifax, Facebook, and Datalogix are all data collecting companies, and they have been knowingly collecting and selling people’s personal information (2014). The fact that these companies operate is more alarming than what Cambridge Analytica did to market politicians.

    Cambridge Analytica’s privacy policy has a section about how the firm uses an individual’s information. It says the information is used to improve website functionality, gain insight into the behavior of target audiences, and contact individuals directly or indirectly for marketing and research purposes, and that the firm will contact individuals with updates, messages, notices, and anything required by law (Wayback Machine, 2018a). The privacy policy also discusses third-party access to individuals’ information. The disclosure says the recipients of personal data may be clients (political campaigns, expenditure groups, non-profits, and commercial entities), service providers (digital marketing firms, mail vendors, call centers, research partners, data processors, legal counsel), and relevant third parties (law, legal, national security) (Wayback Machine, 2017). Taken as a whole, the privacy policy appears to disclose fully what Cambridge Analytica does with an individual’s data. The main concern is who exactly these third parties are and how they are exempted from an individual’s right to privacy. It seems as though the third parties are free to use an individual’s information without any binding privacy code.

    Cambridge Analytica published a case study on its website, retrieved here through the Wayback Machine because the website no longer exists; the business was dissolved after more recent legal outcomes. In the case study for a global publisher, they specifically say they measure market subscribers “lookalikes, personality for each group, compare and contrast the two groups, provide example psychologic-based communication guidance for the current target” (Wayback Machine, 2018a, para. 1). Also, on the page describing data-driven services for political clients, Cambridge Analytica says they will collect valuable information on voters, opposition, and trends (Wayback Machine, 2018b). Data sources are combined and voter behavior is outlined so political campaigns can persuade people to vote a certain way (Wayback Machine, 2018b). They also state that they segment voters into distinct audiences with AI and predictive analytics that use the “behavioral conditioning” of each audience to create a forecast of future behavior (Wayback Machine, 2018b, para. 4). They disclose which platforms they use, such as Facebook, Twitter, connected TV devices, tablets, mobile, etc. (Wayback Machine, 2018b, para. 5). If individuals had been given Cambridge Analytica’s privacy policy with the option to opt out, the current situation would be different. However, the involvement of all of these other “parties,” including politicians, social media, and even the “connected TVs,” is extremely alarming. Cambridge Analytica states that they extrapolate individual personalities where data was missing. What remains questionable is what happens to this information once it is merged into clients’ datasets, since the firm states that “data sources are combined to provide a rich, holistic view of voter behavior” (Wayback Machine, 2018b, para. 3). According to Velasquez, Andre, Shanks, Meyer, & Meyer, the fairness ethics approach asks, “does it treat everyone equally?” (2015, para. 4). Cambridge Analytica’s policies also conflict with the “Rights Approach,” under which individuals have a right to privacy, a right to what is agreed, and the right to the truth (Velasquez et al., 2015, para. 3). The privacy policy makes no mention of determining personalities from surveys, let alone of notifying the individuals whose types were decided by an algorithm. That information was then distributed to various third parties, who never contacted the individuals to tell them that their information was being used for a social experiment and that they were being persuaded with targeted advertising.

    Velasquez, Andre, Shanks, Meyer, & Meyer discuss moral rights: “The Rights Approach states people are not objects for manipulation,” and people should be able to make choices (2015). A person has the right to be treated the way they choose, and Velasquez et al. give the rights to truth, privacy, agreement, and physical safety as examples of moral rights that respect an individual’s choices and fair treatment (2015). Taking a person’s data, selling it, merging it with other data (collected unethically), and then targeting the individual with unsolicited ads is a violation of “The Rights Approach.” Was the information later sold, and who has access to the data: the individual’s employer, banks, credit card companies, or retailers where the individual holds personal accounts? Selling the combined data from a political campaign could lead to discrimination or other issues.

    Cambridge Analytica had a privacy policy and also stated that they are subject to the regulatory enforcement powers of the U.S. Federal Trade Commission under their Privacy Shield framework statement (Wayback Machine, 2017). Their web pages say what they do with the data they collect, but they do not explain how the predictive analytics classifies individuals into personality groups, although we could deduce this by looking at machine learning classification algorithms. The most glaring issue is that our government used this data for political purposes, and our personal information is out there being used by “third parties” who get access because they are business partners. We have a right to the truth, a right to privacy, a right not to be injured, and a right to what is agreed (Velasquez et al., 2015). There is a need to fix the violation of our privacy rights by those that hold, store, and transfer our data, but it is hard to say who will fix it, because the very politicians that used persuasion also write the laws and regulations.

References

Bali, R., Sarkar, D., Lantz, B., & Lesmeister, C. (2016). R: Unleash machine learning techniques (pp. 279-313). Birmingham, UK: Packt Publishing Ltd.

Beckett, L. (June 13, 2014). Everything we know about what data brokers know about you. Retrieved from: https://www.propublica.org/article/everything-we-know-about-what-data-brokers-know-about-you

Lapowsky, I. (January 25, 2019). One man’s obsessive fight to reclaim his Cambridge Analytica data. Retrieved from: https://www.wired.com/story/one-mans-obsessive-fight-to-reclaim-his-cambridge-analytica-data/

Raicu, I. (May 9, 2016). Data collection: “Harvesting” personalities online – an ethics case study. Retrieved from: https://www.scu.edu/ethics/focus-areas/internet-ethics/resources/data-collection-harvesting-personalities-online/

Talbot, D. (April 15, 2016). How political candidates know if you’re neurotic. Retrieved from: https://www.technologyreview.com/s/601214/how-political-candidates-know-if-youre-neurotic/

Velasquez, M., Andre, C., Shanks, T., Meyer, S., & Meyer, M. (August 1, 2015). Thinking ethically. Retrieved from: https://www.scu.edu/ethics/ethics-resources/ethical-decision-making/thinking-ethically/

Wayback Machine (April 28, 2017). Cambridge Analytica – Privacy policy. Retrieved from: https://web.archive.org/web/20180321032317/https://ca-commercial.com/privacypolicy

Wayback Machine (2018a). Cambridge Analytica – Global periodical publisher. Retrieved from: https://web.archive.org/web/20180321032301/https://ca-commercial.com/casestudies/

Wayback Machine (2018b). Cambridge Analytica – Data driven services. Retrieved from: https://web.archive.org/web/20180321032248/https://ca-political.com/services