70% of GitHub is duplicate code – Study

A new study has found that around 70% of the code on GitHub is duplicated, The Register reported.

The researchers originally set out to measure how much files change between different clones of a project. Instead, they found such a high rate of file-level duplication that they changed direction.

The study, conducted by an international team of eight researchers led by the University of California, Irvine, found that of 428 million files on GitHub, only 85 million are unique.

According to the report, these findings have significant implications for research that relies on GitHub data, because such research needs to account for this duplication.
