Meet.37: Research Data; Policy, Practice & Platforms

Wednesday, November 23rd at 7pm at Campfire CoWorking Space in Kennedy Town
4/F Cheung Hing Industrial Building, 12P Smithfield, Kennedy Town

Research data: the government data people forget about.
Meet.37 is on research data policy and practice in Hong Kong. Research data is not just relevant to ivory tower academics: like government data, it is taxpayer funded and benefits society. In most of the developed world, sharing of academic research data via research networks and repositories is increasingly being mandated. Hong Kong has been far behind in the matter, but despite this lack of leadership from government (sound familiar?) the individual universities are taking matters into their own hands and are now building platforms for sharing academic papers and data. Much of this is summarized in a paper some of ODHK have recently put together (a pre-print version is available on SocArXiv).

Our special guest this month is David Palmer, who recently presented this work in Beijing and has had a long history in Hong Kong as a Research Data & Records Development Librarian. He has worked at The University of Hong Kong Libraries (HKUL) since 1990, as Systems Librarian, Technical Services Support Team Leader, and Scholarly Communications Team Leader. He is a founding member of the Hong Kong Open Access Committee, and was instrumental in having HKU become a signatory to the Berlin Declaration on Open Access in November 2009. He has led many path-breaking projects, such as making HKU the first university in Asia to have all of its thesis collection (25,000) online in full text, making it the first institution worldwide to do an institutional upload of publication data for each researcher into the ResearcherID database, and creating author profiles in The Hub for each of HKU’s authors.

Setting the scene, Scott and Waltraut from ODHK will present some of the findings of their “open science” policy paper, before David presents a rare concrete victory for open in Hong Kong: the first CRIS (Current Research Information System) data portal here, built upon the HKU Scholars Hub institutional repository. We will then end with a Q&A on what needs to happen next, and how these lessons can be applied to the wider open data ecosystem in Hong Kong.

Program for Meet.37:
Scott Edmunds (ODHK/GigaScience): Open science policies and practices in HK, introducing the ODHK case study of these practices.

Waltraut Ritter (ODHK/Knowledge Dialogues): Innovation potential of open research data.

David Palmer (HKU): From IR to CRIS. Open e-Research from the HKU Scholars Hub.

Q&A: What needs to happen next for Hong Kong research data, and what lessons can be learned for the wider open data ecosystem?

Please come with questions and participate in the Q&A at the end.

23rd November 2016, 7.00pm

NOTE: Do not follow Google Maps! Only 5 minutes from Kennedy Town MTR exit A, go UPHILL
Thank you to Campfire Collaborative Space for hosting us.

Sign up on our Facebook event.


Hong Kong versus Zika: come fight the “data gap” at #ZikaHackathon

Open Data versus the Mosquito
The current global panic about Zika can be boiled down to a “data gap” issue: gaps in understanding why it has started spreading so rapidly now, a gulf in fathoming its effects on pregnant women (evidence linking Zika and microcephaly is still only spatio-temporal rather than causal), and gaps in sharing the research data and clinical specimens that will enable the global research community to keep one step ahead of the spread of the virus. As with Ebola, there has been much frustration at many key players not sharing these materials, despite the fact that in a life-and-death situation wild speculation and panic fill the vacuum, and closed data risks lives.

All this makes the Zika crisis a perfect opportunity to harness the benefits and showcase the utility of open approaches, in particular open and collaborative efforts using Open Data and Open Source hardware. An international group of makers / hackers / scientists / citizen scientists is trying to develop innovative measures against Zika, and Open Data Hong Kong has teamed up with MakerBay to join these efforts. Join us at the Zika hackathon on the 16th February at MakerBay in Yau Tong (see their event page). We’ll be linking up via the global Google Hangout with other Zika hackathon participants in Brazil, Australia, Singapore, and beyond, then discussing and pitching projects where we can contribute from here in Hong Kong. We will approach this from both our data hacking and hardware hacking perspectives, looking at where these different strands of “open” can be combined to produce crowdsourced data collection tools and apps, to see if citizens can do better than the supposed experts in filling in these data gaps.

Singapore 1: HK 0 for data driven approaches
The “Asian tiger mosquito” Aedes albopictus, which is among 60 types of mosquito that can carry the virus if it bites an infected person, is endemic to Hong Kong. The warmer year-round weather and more extreme rainfall patterns we are currently seeing will make the city even more favourable for mosquitoes from the Aedes genus, sparking warnings from local health officials to eliminate breeding areas. On top of the threat of Zika, we already have sporadic dengue outbreaks from these vectors, and the Hong Kong government currently has an Oviposition Trap (Ovitrap) screening program to detect the presence of adult mosquitoes. With only 52 locations across Hong Kong selected for vector surveillance, and the mosquitoes having a range of roughly 200m, more than 98% of Hong Kong is currently not covered, and there is a need for much more data collection and better presentation (the FEHD currently publishes not very helpful PDFs). The more dynamic data-driven approaches to dengue reporting used in Singapore, Kaggle competitions for West Nile Virus modelling, and Spanish efforts at crowdsourcing tiger mosquito spotting (with no Hong Kong data collected to date) show a few approaches we could follow here.
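For anyone who wants to sanity-check the coverage figure, here is a rough back-of-the-envelope sketch in Python. It uses the 52 locations and the roughly 200m range quoted above; the land area of about 1,100 km² is our own assumption and is not a figure from the ovitrap program. The function name and the simplification of treating each location as a single non-overlapping circle are also just illustrative choices, so read the result as an upper bound on coverage rather than an official number.

```python
import math

# Figures quoted in the text above.
TRAP_LOCATIONS = 52       # ovitrap surveillance locations
MOSQUITO_RANGE_M = 200.0  # approximate range of an Aedes mosquito, in metres

# Assumption (not from the text): Hong Kong's land area is roughly 1,100 km^2.
HK_LAND_AREA_KM2 = 1100.0


def covered_fraction(locations: int, range_m: float, total_area_km2: float) -> float:
    """Upper bound on the fraction of the territory within range of a location.

    Each location is treated as one circle of radius range_m, and circles
    are assumed not to overlap, so real coverage can only be lower.
    """
    radius_km = range_m / 1000.0
    area_per_location_km2 = math.pi * radius_km ** 2
    return min(1.0, locations * area_per_location_km2 / total_area_km2)


if __name__ == "__main__":
    covered = covered_fraction(TRAP_LOCATIONS, MOSQUITO_RANGE_M, HK_LAND_AREA_KM2)
    print(f"Covered: at most {covered:.2%}")
    print(f"Not covered: at least {1 - covered:.2%}")
```

Under these assumptions the covered share comes out well below 2% of the land area, which is consistent with the “more than 98% not covered” figure. Changing the assumed area or range by a reasonable margin does not alter that conclusion.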

Are you interested in getting involved and using your creativity to develop innovative technologies that help us understand Zika and prevent it from spreading? Let’s meet up! The event will be co-hosted by Scott from ODHK and Ajoy, Jacky and Nicolas from MakerBay, and efforts will be longitudinal, following the ongoing international hackathon efforts.

Tuesday, February 16th 2016, 6:00pm

Location: MakerBay, 16 Sze Shan Street, C1 Yau Tong Industrial Building Block 2, Yau Tong, Kowloon
See this on Google maps.
See this event on Facebook.

UPDATE 23/2/16: MakerBay have now posted a write-up of this event, and an archived livestream is available. Thanks to everyone who attended, and keep following to see how the pitched projects develop.

Hacking the Human Genome

Taking Open Data to the Final Frontier: The Human Genome

How big is your data? Stephens ZD et al. (2015) Big Data: Astronomical or Genomical? PLoS Biol doi:10.1371/journal.pbio.1002195

Genomics (DNA sequencing) is a Big Data science and is predicted by some to soon exceed the demands of all other Big Data domains such as astronomy, streaming video, and social media. Hong Kong is at the forefront of this genomic revolution, with local researchers making key breakthroughs in circulating-DNA-based prenatal testing and cancer diagnostics (also predicted to become a multi-billion dollar industry), and the city hosting the world’s largest sequencing centre in Tai Po (BGI Hong Kong). As we move towards “precision medicine”, all of us as patients will increasingly need to make informed decisions based on how medicines, treatments and lifestyle choices interact with our genetic background. Despite that, genomic literacy and understanding of the cutting edge work in this rapidly growing field among the Hong Kong public is very poor, with little local awareness of what it entails and how it will soon be impacting all of their lives. In an era of “direct to consumer” DNA sequencing pioneered by companies such as 23andMe, millions of people now have access to their genome-scale data. Due to perceived ethical issues there can be legal restrictions on what people can do with it, with many in the healthcare industry feeling people should not be trusted with access to their own data.

Countering this, a growing number of people are taking matters into their own hands, carrying out genome blogging and citizen-led genealogy/ancestry work (e.g. this PLOS paper). A new generation of tools and platforms such as OpenSNP and Promethease are democratising access, citizens are crowdfunding their own projects, and genomic apps are even appearing on the market. Just this week the new DNA.Land genomic data sharing portal launched, and over 5,000 people have posted their genomic data in the first few days. For interested potential “genome hackers” we have a number of people at the forefront of this open genomics revolution presenting at this meetup, including Fiona Nielsen of DNAdigest and Bastian Greshake of OpenSNP. For a preview of what to expect see these previous events from DNAdigest and this interview with Bastian. We’ll cover the tools and resources any non-biologist hacker can get started with (R, Python, Bioconductor, and the databases where you can find data). Demonstrating that the personal genomics era is already here, we’ll also have a prize draw so lucky participants can get their alcohol metabolism genes sequenced and presented through a fun new genomic app not yet on the market.

The event will attempt to address questions such as:

What questions can you ask of your genetic data?
How much can you do as a citizen scientist, and what activities are reserved for academic researchers?

Sign up to this event via the Eventbrite link and please submit any questions or suggestions for topics related to “Hacking the human genome”. For more experienced genomics experts, the meet follows an all-day workshop on “How do I find human genomics data to power my research?”. The event is hosted at MakerBay in Yau Tong, and we’d like to thank Fiona and Cesar for their help and support in setting the event up.

Monday, October 26th, 7:30pm

Location: MakerBay, 16 Sze Shan Street, C1 Yau Tong Industrial Building Block 2, Yau Tong, Kowloon
See this on Google maps.
See this event on Facebook.

UPDATE 28/10/15: the great folks at MakerBay did a live stream and we can see the archived video version here.

Oped in SCMP: Open Data to Fix Academic Fraud

Last week the Open Science Working Group of ODHK had an Oped in the South China Morning Post (SCMP) discussing the issue of fighting academic fraud through the use of Open Data. This is a particularly topical issue at the moment, with recent scandals implicating many academics in Mainland China in large-scale peer-review fraud covered in the Washington Post. With kind permission of SCMP we are posting an updated and extended version of the piece here and, being good Open Data purists, including links to much of the source material discussed.

The scandal of scientific impact
The idealized view of science as the curiosity driven pursuit of knowledge to understand and improve the world around us has been tarnished by recent news of systematic fraud and mass retraction of research papers from the Chinese academic system, and allegations of attempts to game the peer-review system on an industrial scale. With much of our R&D funded through government, we all hope our tax dollars are spent as wisely as possible, and around the world research funders have developed methods of assessing the quality of their funded researchers’ work. One of the most widely used metrics to assess researchers is the Journal Impact Factor (JIF), a (proprietary, closed access) service run by Thomson Reuters that ranks the academic journals that scientists publish in to get credit. While many countries have tried to broaden their assessment systems to take into account a more balanced view of a researcher’s impact, in China the number of publications in JIF ranked journals is currently the only activity that researchers are judged by, and huge amounts of money are changing hands (often hundreds of thousands of RMB in payment for a single publication in the top ranked journals) through this system.

This biased focus on one metric above all others has directly led to large scale gaming of the system and a black market of plagiarism, invented research and fake journals. Following previous exposés of an “academic bazaar” system where authorship on highly ranked papers can be bought, Scientific American in December uncovered a wider and more systematic network of Chinese “paper mills” producing ghostwritten papers and grant applications to order, linked to hacking the peer review system that is supposed to protect the quality and integrity of research. The first major fall-out from this occurred last month, with the publisher BioMed Central (BMC) retracting 43 papers for peer review fraud, the biggest mass-retraction carried out for this reason to date, and one that increased the number of papers retracted for this reason by over a quarter. Many other major publishers have been implicated, with PLOS, the publisher of the world’s largest journal, also issuing a statement that they are investigating linked submissions. It takes a great amount of time and effort employing Chinese speaking editorial teams to investigate and contact all of the researchers and institutions implicated, and BMC should be applauded for doing this and fixing the scientific record so quickly [COI declaration: Scott Edmunds is an ex-employee of BMC, and he and Rob Davidson are collaborating with them through GigaScience Journal].

To get an idea of the types of research uncovered and implicated, it is possible to see the papers retracted last month, and Retraction Watch has covered the story in detail. The Committee on Publication Ethics has also issued a statement. Guillaume Filion in his blog has done some sterling detective work providing insight on the types of papers written by these “paper mills”, and on the companies offering “guaranteed publication in JIF journal” that are still advertising their services. The likely production-line explosion of medical meta-analysis publications coming from China has been well known for a number of years, but looking at the list of publications retracted by BMC in March shows a worrying introduction of many other research types such as network analysis.

As in J. B. Priestley’s famous morality tale, An Inspector Calls, any evil comes from the actions or inactions of everyone. On top of the need for better policing by publishers, funders and research institutions, there need to be fundamental changes to how we carry out research. Without a robust response and fundamental changes to their academic incentive systems there could be long term consequences for Chinese science, with the danger that this loss of trust will lead to fewer opportunities to collaborate with institutions abroad, and potentially build such skepticism that people will stop using research from China.

While we are rightly proud of Hong Kong’s highly regarded and ranked university system (with three universities ranked in the world top 50), we are not immune to the same pressures. While funders in Europe have moved away from using citation based metrics such as JIF in their research assessments, the Hong Kong University Grants Committee states in their Research Assessment Exercise guidelines that they may informally use it. In practice some of the universities do pay bonuses related to the impact factor of the journals their researchers publish in, leading to the same temptations and skewed incentive systems that have led to these corrupt practices in China. Fortunately, from looking at the list of retracted papers, no Hong Kong based researchers were implicated on this occasion. With our local institutions increasing their ties across the Pearl River through new joint research institutes and hospitals, and these scandals likely to run and run, keeping our universities unblemished will be a challenge.

Can We Fix it? Yes We Can!
If the impact factor system is so problematic, what are the alternatives? Different fields have different types of outputs, but there are factors that should obviously be taken into account, like quality of teaching and the numbers of students passing on to do bigger and better things. Impact can be about changing policy, producing open software or data that other research can build upon, or stimulating public interest and engagement through coverage in the media. Many of these measures can also be subject to gaming, but a broader range of “alternative metrics” should be harder to manipulate. China is overtaking the US to become the biggest producer of published research, but ranks only ninth in citations, so there obviously needs to be a better focus on quality rather than quantity.

The present lack of research data sharing has led to what is being called a ‘reproducibility crisis’, partly fuelled by fraudulent activity but very often just from simple error. This has led to some people estimating that as much as 85% of research resources (funding, man-hours etc) are wasted. Science is often lauded as being a worthy investment for any government because the return to the economy is more than that put in. What benefits could be gained if there was an 85% improvement on that return? How many more startups and innovative technologies could be produced if research was actually reusable?

There is a growing movement from funders across the world to encourage and enforce data management and access, and we at Open Data Hong Kong are cataloguing the policies and experiences of Hong Kong’s research institutions. Sadly, at this stage we seem to be far behind other countries, currently ranking 58th in the global Open Data Index (having just fallen from 54th earlier in the year). One of the main benefits of open data is transparency, which would have made the current peer review scandal much harder to carry out. It is encouraging that the Hong Kong Government is already promoting release of public sector data through the newly launched Data.Gov.HK portal, but it is clear that our research data needs to be treated the same way. ODHK is the first organization in Hong Kong (and 555th overall) to sign the San Francisco Declaration on Research Assessment, which is trying to eliminate the use of journal-based metrics. To help change the skewed incentive systems we would encourage others to join us by signing it.

Scott Edmunds, Rob Davidson and Waltraut Ritter; Open Data Hong Kong.
Naubahar Sharif; HKUST.

The published version of the Oped is available from SCMP.