Big Data in Education: Security and Privacy Issues


Amid the hype around Big Data it is easy to forget questions of security and privacy. These are sensitive issues in education because they involve minors’ data, and in many countries data laws are being updated, or need to be, to address the problems posed by data mining.

While Educational Data Management (“EDM”) companies point out that educational apps assisted by data analysis allow children to be treated individually and taught at their own pace, many question whether such techniques are necessary as news of data breaches continues to make headlines. Indeed, some point out that individualized learning styles were part of pedagogical approaches long before mass data analysis techniques were developed.

McKinsey reports that data created in the EDM industry has the potential to produce up to $1,180bn globally per annum[1], with consumer products the only sector where data has greater potential economic value. The question is whether it is possible to take on board improvements from EDM while minimizing the amount of sensitive information put at risk and without compromising teachers’ roles as educators and innovators.

Data Privacy Legislation

In most countries data protection legislation was put in place as floppy disks were coming into vogue and did not foresee a world where, for instance, metadata can be created on individuals for marketing purposes. As a result, many governments have been putting legislation in place in recent years to deal with these new challenges.

In the US, amendments have been made to improve the data security provisions of the Family Educational Rights and Privacy Act (FERPA, 1974). According to advocacy group the Data Quality Campaign, 182 data-privacy bills were introduced in 46 states in 2015 alone, with 15 of those states passing 28 laws[2]. The proposed laws tackle problems arising from data mining and from third parties using student data for marketing. They also add strict requirements on security, encryption, regular reporting of data risk, and how and when breaches must be reported.

Meanwhile, the European Union General Data Protection Regulation (GDPR) was adopted in April 2016 and entered into force in May 2016, putting in place strict provisions in relation to issues caused by data mining. According to the online journal Internet Policy Review[3], the implications for education are generally positive: although school student data is not singled out for protection, the new legislation explicitly protects children aged 18 or under and their data. However, the London-based NGO Privacy International claims that amendments to the GDPR were inserted on a “copy and paste” basis directly from lobby papers produced by major US-based IT companies[4].

Data Risk Management

All this new legislation depends heavily upon strong risk management, people’s awareness of their rights and adequate training to educate staff on risks. Consultancy firm Booz Allen Hamilton[5] claims that many firms simply ensure compliance with legal requirements rather than putting effective risk management in place. Key risk factors to consider include sufficient budgeting and resources, and ensuring that communication within organisations and with data vendors runs smoothly.

With third-party vendors the costs of breaches are a lot more difficult to estimate than when data is kept in house, and therefore more difficult to insure[6]. Booz Allen Hamilton also claims that while firms typically risk-assess third parties correctly when they appoint them, many do not re-assess or monitor vendors thereafter to ensure data security is maintained. Other areas for improvement include due diligence on agreements with third parties, understanding what information third parties have access to and keeping up to date with information on cyber threats.
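One way to limit what information a third party has access to is to strip direct identifiers from a record before it leaves the school. The Python sketch below is a minimal illustration under stated assumptions: the field names (student_id, name, email), the helper names and the secret key are hypothetical, and key storage is out of scope. It replaces the student ID with a keyed hash (HMAC-SHA256 from the standard library), which is pseudonymization rather than full anonymization, since the remaining fields could still identify a pupil in a small class.

```python
import hashlib
import hmac

def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a student ID using a keyed hash.

    The same ID and key always give the same pseudonym, so records can
    still be linked, but the ID cannot be recovered without the key.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability

def strip_identifiers(record, secret_key, direct_identifiers=("name", "email")):
    """Copy a record, dropping direct identifiers and the raw student ID."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in direct_identifiers and key != "student_id"
    }
    cleaned["pseudonym"] = pseudonymize(record["student_id"], secret_key)
    return cleaned

# Hypothetical record and key, for illustration only.
record = {
    "student_id": "12345",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "reading_score": 71,
}
shared = strip_identifiers(record, secret_key=b"replace-with-a-real-secret")
print(sorted(shared))  # prints ['pseudonym', 'reading_score']
```

The key would stay with the school, so a vendor receiving the cleaned record could track one pupil’s progress over time without learning who that pupil is.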

But among all these considerations the most important may be very basic ones, such as ensuring that any use of devices outside of the office adheres to strict security measures and that personal use of devices is limited. Indeed, according to the UK’s Information Commissioner’s Office (“ICO”), the largest cause of data incidents in education is theft or loss of unencrypted devices[7].

Effects of Data Breaches

When risk management fails, the main questions are what data breaches typically cost, whether security is working and whether it is cost effective. According to an IBM-sponsored 2015 study by the Ponemon Institute[8], educational data breaches cost on average $300 per lost or stolen record. However, according to an analysis[9] comparing the Ponemon Institute’s numbers with statistics from the Identity Theft Resource Center, firms are “spending too much on security that isn’t working”.

The ramifications of data breaches can be more severe than just financial losses. The reputation of an institution or body can be ruined and, if leaked data contains sensitive information about pupils’ learning difficulties, it could be used for cyberbullying. However, higher education tends to be the most vulnerable segment, as in most countries universities are not centralized and the information held on students is often of a more sensitive nature, such as financial details or social security numbers. In the US, higher education accounts for 35% of all educational data breaches[10].

In the European Union such breaches will have significant ramifications as new laws come into force. For instance, when the University of Greenwich reported a breach in February 2016[11], the BBC reported that the maximum fine for such a breach would rise from £500,000 to €10m. Irrespective of whether the UK exits the European Union, such changes are needed as, according to the UK’s ICO, only the Health and Local Government sectors have seen more data security incidents in the UK than education[7].

Getting these measures to work requires good training of staff, and the Data Quality Campaign claims that not enough is being done, particularly to inform employees of the added risks posed by home use and online portals[2]. Indeed, the UK’s ICO[12] recommends that data privacy “should be taught in schools”, while even awareness among adults appears to be limited. According to a study by Harvard Business Review[13], most adults, while aware that companies collect data on them, were uninformed about the types of data that become publicly available simply by going online, with only 27% realizing they share their social network friends list and 25% realizing they share their location.

How Data Mining Can be Used in Education

Many detractors ask whether all of this data analysis is necessary and point out that schools have always used data to make decisions without the need for data mining or creating vast amounts of metadata. However, with the growth of Massive Open Online Courses (“MOOC”s) and educational apps, and with the explosion of internet use and its effect on education in general, it is neither possible nor prudent to ignore EDM.

For instance, with MOOCs one might not have tutors or the ability to meet educators in person, as some of these courses can have thousands of students across the globe. EDM is therefore essential to track progress and offer individualized learning programs. Traditional resources can also be combined with EDM: the UK’s Open University, for example, offers a range of webinars, online PDFs and exercises in addition to access to individual tutors who can ensure students are progressing.

Even in a traditional learning environment, the influence of the internet and electronic records means a lot of data about how we learn is being created. Data analysis can be used to find ways to improve students’ performance, to spot when students are at risk of dropping out and to detect unusual patterns that may indicate cheating in exams. It can even help with practical matters such as better organisation of schools’ cleaning rotas, tracking the use of facilities and managing timetabling and bus routes so that children are refreshed and achieving their potential.
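As a rough illustration of spotting an unusual pattern, the Python sketch below flags students whose change in score between two tests sits far above the class average. The function name, the student IDs, the scores and the threshold of two standard deviations are all assumptions made for this example, not taken from any real EDM product. A flag of this kind is only a prompt for a teacher to take a closer look, not evidence of cheating.

```python
from statistics import mean, stdev

def flag_unusual_gains(gains, threshold=2.0):
    """Return the IDs of students whose score gain is more than
    `threshold` sample standard deviations above the class mean gain."""
    values = list(gains.values())
    if len(values) < 3:
        return []  # too few students to estimate the spread
    average = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # everyone changed by the same amount
    return sorted(
        student for student, gain in gains.items()
        if (gain - average) / spread > threshold
    )

# Hypothetical change in score between two tests for a class of ten.
gains = {
    "s01": 2, "s02": -1, "s03": 3, "s04": 0, "s05": 1,
    "s06": 2, "s07": -2, "s08": 1, "s09": 0, "s10": 30,
}
print(flag_unusual_gains(gains))  # prints ['s10']
```

A simple rule like this is sensitive to class size and to the outlier itself inflating the spread, so a real system would likely need something more robust.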

The patterns revealed by data mining are not easy to spot through observation or hunches. At North Middle School in Wisconsin[14], administrators noticed a spike in the number of students being disciplined for misbehavior. They were able to trace it to the point when parents had asked for the removal of incentives “like movies or ice-skating or sledding trips for good behavior” because they felt these took away from learning time. By reinstating these incentives and introducing other new rewards, the school saw improvements in behavior and was able to quantify them.

Problems with Using EDM Software

While lawmakers in the US and the European Union seem keen to prevent data misuse, it is still possible that what is legally held on file about students becomes extensive and that the school transcript hangs over students’ heads in ways it did not in the past, depending on what data may be kept on students for a long period of time. The research evidence base on data use in education is limited at the moment. As noted by the National Education Association in the US, while there is some good education research available, at least in the US there are a lot of partisan papers where researchers have manipulated data towards results that support their own political viewpoints[15].

When schools use EDM software, problems can arise because many educators will not be as familiar as data analysts with the ways patterns can be misidentified, such as confusing correlation with causation or pareidolia, i.e. finding patterns where none exist, or with how to deal with such issues so that EDM software is used properly. Conversely, most analysts who create EDM software may never have taught children and could therefore leave out of their algorithms some intangible factors that influence teaching. It is also easy to forget simply to ask students what they think is not working; without any need for extensive data analytics, students may be able to identify better ways of teaching.
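The risk of finding patterns where none exist can be shown with a small simulation. The Python sketch below is illustrative only: the class size, the number of candidate variables, the seed and the function names are assumptions. It generates test scores and a set of unrelated variables as pure random noise, then reports the strongest correlation found between any variable and the scores.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)

def strongest_chance_correlation(n_students=20, n_variables=200, seed=0):
    """Correlate random 'scores' with many random, unrelated variables
    and return the correlation with the largest absolute value."""
    rng = random.Random(seed)
    scores = [rng.gauss(0, 1) for _ in range(n_students)]
    best = 0.0
    for _ in range(n_variables):
        variable = [rng.gauss(0, 1) for _ in range(n_students)]
        r = pearson(variable, scores)
        if abs(r) > abs(best):
            best = r
    return best

print(round(strongest_chance_correlation(), 2))
```

Because every column here is noise, any sizeable correlation the script reports is spurious. Screening many variables against a small class makes such chance findings likely, which is why a pattern surfaced by software still needs a teacher’s judgment before anyone acts on it.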

How this is dealt with will depend upon how schools in various systems implement such solutions. Schools could use data analytics tools as a black box, but a more sensible approach would be to use them alongside teachers’ existing observations and methods.

Google, Apple and Microsoft dominate the market for in-class educational devices, with Futuresource Consulting reporting that, as of Q3 2015, over 50% of US classrooms using in-class devices use Google devices, with the main competitors being Apple (24%) and Microsoft (24%)[16]. The presence of such firms in classrooms raises concerns that data could be passed on to third parties for the purposes of data mining and marketing, especially as Google has admitted in the past to data mining student emails[17].

Indeed, it is possible to monitor all of a child’s movements throughout the day using chips in their bus ticket and cashless lunch transactions, while school walls anonymously show their test scores[18]. As one would expect, parents are finding such measures intrusive. In particular, the EDM company InBloom withdrew its services from the New York State area due to public outcry over their use to facilitate data tracking[19].

Furthermore, a digital news website claims that colleges are using data mining to track applicants[20] to see if their application is a “fallback”, particularly if the college is not an Ivy League university. It is claimed that they do this by looking at any available information on the student, including social media and visits to the college website. While this would appear to add pressure, students may be reassured to learn that “Student’s demonstrated interest” is only one factor under consideration, and colleges say metrics such as a student’s grades, curriculum strength, admissions tests and essays take precedence.


Irrespective of whether or not local schools use EDM technologies, the challenges of the information age cannot be ignored. Therefore continual updating of data privacy laws, prosecution of offenders, educating the public and transparency are a must.

Ideally schools and authorities should assess the use of EDM rather than diving in and using it straight away. EDM has its uses in spotting subtle trends that will not always be caught by a hunch, and if it is to be adopted, data anonymization and security must be top priorities.

To that end, efforts should be made to bridge the gap between the role of educators and that of the analysts creating EDM software, so that data analysis is used in conjunction with traditional teaching rather than replacing it, and so that software vendors communicate constantly with their users to give the best possible performance.


  1. Open data: Unlocking innovation and performance with liquid information.
  2. Why K-12 Data-Privacy Training Needs to Improve.
  3. Regulating “big data education” in Europe: lessons learned from the US.
  4. MEPs copy-pasting amendments from US lobbyists.
  5. Many firms not getting to grips with third-party data security risk.
  6. Ashley Madison hack illustrates why third party cyber-liability is as uninsurable as IPR theft.
  7. Data security incident trends.
  8. 2015 Cost of Data Breach Study: Global Analysis.
  9. The cost of data security: Are cybersecurity investments worth it?
  10. Higher Education in the Hit List for Data Breaches.
  11. Students hit by University of Greenwich data breach.
  12. Data privacy ‘should be taught in schools’.
  13. Customer Data: Designing for Transparency and Trust.
  14. Some Schools Embrace Demands for Education Data.
  15. Awareness of Education Research Methods.
  16. Google’s Chromebooks make up half of US classroom devices sold.
  17. Google admits data mining student emails in its free education apps.
  18. A day in the life of a data mined kid.
  19. Privacy Fears Over Student Data Tracking Lead to InBloom’s Shutdown.
  20. Colleges are spying on prospective students by quietly tracking them across the internet.

Authored by:
Liam Murray

Liam Murray is a data driven individual with a passion for Mathematics, Machine Learning, Data Mining and Business Analytics. Most recently, Liam has focused on Big Data Analytics – leveraging Hadoop and statistically driven languages such as R and Python to solve complex business problems. Previously, Liam spent more than six years within the finance industry working on power, renewables & PFI infrastructure sector projects with a focus on the financing of projects as well as the ongoing monitoring of existing assets. As a result, Liam has an acute awareness of the needs & challenges associated with supporting the advanced analytics requirements of an organization.
