We are happy to introduce Susan Mani, analyst, data lover and blogger, to the Jigsaw team of writers. Susan has a strong background in analyzing marketing strategies, and has extensive experience working with Fortune 500 companies across industries, translating business requirements or problem areas into analytical language. In this, her first post, she talks to us about the recent controversy at Facebook and tells us to be cautious of the dangers of posting personal information where it can be easily accessed by others.
Ever since the Facebook story broke, the news of the unregulated experiments has evoked two diametrically opposite reactions: outrage at the perceived manipulation, and complete indifference. In my opinion, you can take whichever position you think is reasonable, but what is key is that you do so in an informed manner.
The whole incident is shrouded in mystery, with many specifics still unclear. A former employee of Facebook has spoken out about certain experiments that were designed to manipulate users into coming back. The algorithm allegedly tailored the content of each person's news feed so as to keep that individual coming back, i.e. fostering addictive behavior of sorts. The exact factors considered and how the experiments were administered remain largely unknown, but the revelation has led to heated debates on the ethics of such activities and where lines have to be drawn.
I once saw a post which said that a friend was going to Toys R Us to shop. How can this possibly be used in a meaningful way? Facebook has a team of data scientists who have created a database with extensive information about each of their users. This would include personal data like age, gender and education, as well as psychographics, i.e. attitudes, interests, likes etc. At times, posts would have to be converted from unstructured text into structured data; at other times, the data would already be in the desired format and require minimal processing. When the ‘Toys R Us’ post is processed, the variable created could be ‘Bought Toys’ or a similar variable capturing shopping habits. Alternatively, it could be linked with data on family size. Nothing is set in stone, and the variable can be modified based on the usage or the objective of the analysis at hand.
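To make the idea of turning a post into a variable concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how Facebook actually processes posts: the keyword lists, the variable names (`bought_toys`, `has_young_children`) and the function `extract_flags` are all my own assumptions, and a real system would use far more sophisticated text processing than simple keyword matching.

```python
# Hypothetical mapping from structured variables to trigger phrases.
# Both the variable names and the phrases are invented for illustration.
KEYWORD_FLAGS = {
    "bought_toys": ["toys r us", "toy store"],
    "has_young_children": ["toys r us", "daycare", "kindergarten"],
}


def extract_flags(post: str) -> dict:
    """Convert free text into yes/no variables via simple phrase matching."""
    text = post.lower()
    return {
        flag: any(phrase in text for phrase in phrases)
        for flag, phrases in KEYWORD_FLAGS.items()
    }


row = extract_flags("Going to Toys R Us to shop!")
print(row)
```

Note that the same phrase feeds two different variables here, which mirrors the point above: one post can be read as a signal of shopping habits, of family size, or of both, depending on the objective of the analysis.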
A database holding such parameters for all of its users globally would be a rich data source for any predictive technique. While not much is known about the exact nature of the controversial analysis, new-age companies like Facebook and Google are known to use experiments to gauge user reactions to innovations. At the simplest level, this involves administering the innovation to a limited set of subjects, referred to (in technical parlance) as a test group. The analysis then compares their behavior to that of a different set of subjects (called a control group) who are similar in all respects to the test group but have not been exposed to the innovation. Provided all else remains constant, the difference in behavior can be treated as a proxy for what would occur if the innovation were rolled out to the wider population. This is a simplistic description of the complex process that probably took place in the Facebook research team.
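The test versus control comparison can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions: the metric (visits per week), the numbers and the function name `estimated_lift` are made up, and a real analysis would also check whether the two groups are truly comparable and whether the difference is statistically significant.

```python
from statistics import mean


def estimated_lift(test_values, control_values):
    """Difference in average behavior between the test and control groups."""
    return mean(test_values) - mean(control_values)


# Invented figures: weekly visits per user in each group.
test_group = [5, 7, 6, 8]       # users shown the innovation
control_group = [4, 6, 5, 5]    # similar users not shown it

lift = estimated_lift(test_group, control_group)
print(f"Estimated lift in weekly visits: {lift}")
```

If the two groups really are alike in every other respect, this difference is the proxy described above for what a wider rollout might do.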
Completeness of data has long been a huge problem for modelers. Very simply put, it is the inability to capture all the factors influencing the subject of analysis. The advent of social media has mitigated this problem, driven by the simple fact that users volunteer their data and analysts no longer have to find creative means of covering the information gap. In addition, the quality of the data is no longer suspect, thanks to the nature of the data generation process. This greatly enhances the accuracy of the analysis, and it underscores why users have to be cognizant of the dangers of posting personal information where it can be accessed by others. While Facebook has been singled out, similar problems exist with other organizations. None of these services are free, and your being active on social media presents a sizeable revenue generation opportunity. Data is described as the new oil, and your activity on social media provides a lot of information about you, which can be collated with similar information about other users to generate valuable insights for marketers and other interested parties.
While this data is a goldmine for an analyst like me, there have to be laws that prevent a gradual decline into anarchy. This problem is not going to disappear unless a comprehensive strategy to tackle it is developed. The exercise is bound to be complex, but regulatory oversight is long overdue.
Susan Mani- All for the love of Data
The opinions expressed here are my own and do not reflect those of my employer.