Post on 13-Apr-2017
transcript
Reflections on Life Science Data Infrastructures in Canada
Rome, March 20, 2017
B.F. Francis Ouellette
CSO / VP Scientific Affairs, Génome Québec
Montréal, QC, Canada
francis@genomequebec.com
Disclaimers
• I am an employee of Génome Québec, which is part of the Genome Canada family.
• I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.
• I am a big proponent of Open Access, Open Source, Open Data and Open Courseware.
• I am on the SAB of many NIH-funded projects (SGD, Galaxy, GenomeSpace, H3ABionet, and HMP2), in addition to Elixir.
In memory of Anna Tramontano
https://goo.gl/UMBW8f
Outline of my Reflections
• The Parameters: “Made in Canada”
• Building a National Bioinformatics Strategy for Canada
• The Cancer Genome Collaboratory
• Lessons Learned
Funding Landscape in Canada
• 4.5 time-zones, 35 million people, bilingual country, 10 provinces, 3 territories https://en.wikipedia.org/wiki/Canada
• Tri-Council:
• CIHR http://www.cihr-irsc.gc.ca/
• NSERC http://www.nserc-crsng.gc.ca/
• SSHRC http://www.sshrc-crsh.gc.ca/
• Genome Canada (GBC-GAlta-GP-OG-GQ-GAtl) https://www.genomecanada.ca/
• CFI / MSI https://www.innovation.ca/awards/major-science-initiatives-fund
• Compute Canada https://www.computecanada.ca/
• CANARIE https://www.canarie.ca/language/
• Network Centres of Excellence http://www.nce-rce.gc.ca/
• Many provincial funding bodies, with budgets more or less proportional to their population size.
… and there is also lots of DATA to integrate
• ICGC is in the 10-15 PB scale
• Healthcare is as well
• Biology is more complex than particle physics
Why a strategy?
• Genome Canada
• CIHR
• NSERC
• SSHRC
The stakeholders need it
• Life science and bioinformatics research communities
• Public and private funding bodies
• Infrastructure providers
• HPC
• Ultra high-speed digital networks
• Alliances working to coordinate research data
• The Canadian population
Why a strategy?
• Genome Canada
• CIHR
• NSERC
• SSHRC
• CFI
• Compute Canada
• CANARIE
• Universities
• Private sector
http://www.elixir-europe.org/
18 years in the making
1999 1st Canadian Bioinformatics Workshop delivered
2000 Genome Canada is created
2001 GC & CIHR host 1st Bioinformatics strategic workshop
2003 Completion of the first human genome project
2011 GC & CIHR host 2nd Bioinformatics strategic workshop
2014 GC & CIHR put together a strategic plan working group
2016 Strategic plan working group delivers final document
2017 Plan is integrated into some of the partners’ plans, and made public
[Figure: development of the Strategic Plan. A small expert steering committee (SESC) alternated SESC + community workshops, a SESC + stakeholder workshop, a larger community consult, and a stakeholder & community consult; the SESC and a writer then produced the Strategic Plan. Diagram labels: "2015", "Finished document".]
Strategic Objective 1: Networking and Coordination
Networking and Coordination
Step 1: Organization of a Canadian B/CB meeting.
Step 2: Creation of a Canadian B/CB Society
Step 3: Position the B/CB community for future funding opportunities
http://www.nce-rce.gc.ca/
Strategic Objective 2: Strengthening and Sustaining the B/CB Research Enterprise
Strengthening and Sustaining the B/CB Research Enterprise
Step 1: Organization of a workshop to bring together the B/CB researchers funded in the ongoing B/CB-focused initiatives
Step 2: Development of a five-year coordinated plan among funding agencies and infrastructure providers
Step 3: Development of activities/opportunities to support the integration of B/CB professionals and hardware providers into large-scale life sciences projects generating big data and requiring significant data storage and analysis.
Strategic Objective 3: Building Capacity
Building Capacity: Connect, Coordinate and Train
Step 1: The launch of innovative new graduate and postdoctoral training programs
Step 2: Creation of new training opportunities and salary awards embedded in ongoing large-scale projects
Step 3: Development and promotion of new opportunities for undergraduates
Step 4: Support the bioinformatics.ca series.
Bioinformatics.ca workshop content
http://bioinformatics-ca.github.io/
https://goo.gl/CGu13q
Cancer Genome Collaboratory
• Making a sustainable infrastructure for cancer genome research.
• A place to compute and collaborate on human cancer genome data in a secure way.
https://www.cancercollaboratory.org/
Cancer Genome Collaboratory
https://goo.gl/nLlVKf
Cancer Genome Collaboratory
https://www.cancercollaboratory.org/about-collaboratory
ICGC
• ICGC to collect DNA, RNA, methylomes and clinical data from 50 different tumour types.
• 500 tumour/normals per tumour type
• 25,000 (50,000) genomes
• 1:10 (whole genome: exome)
• SNV, CNV, SV, germline
https://dcc.icgc.org/
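The ICGC targets on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes that "25,000 (50,000) genomes" means 25,000 tumour/normal pairs, each contributing one tumour genome and one matched normal genome. The slide itself does not spell this out.

```python
# Figures taken from the slide: 50 tumour types, 500 tumour/normal pairs each.
TUMOUR_TYPES = 50
PAIRS_PER_TYPE = 500

# Assumption (not stated on the slide): each pair yields two genomes,
# one from the tumour and one from the matched normal sample.
GENOMES_PER_PAIR = 2


def icgc_targets(tumour_types=TUMOUR_TYPES, pairs_per_type=PAIRS_PER_TYPE):
    """Return (tumour/normal pairs, total genomes) for the given targets."""
    pairs = tumour_types * pairs_per_type
    genomes = pairs * GENOMES_PER_PAIR
    return pairs, genomes


pairs, genomes = icgc_targets()
print(f"{pairs:,} tumour/normal pairs, {genomes:,} genomes")
```

Under that reading, the two numbers on the slide are the same target counted per donor pair and per sequenced genome.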
Deliverables for PCAWG will include:
• 1st PANCANCER analysis on > 2,800 cancer tumours from a WGS perspective
• RNA, SSM, CNV, Methylation analysis & germline
• Published (executable) pipelines
• Docker / Dockstore
• Multiple cloud access to data
• Multiple portal access to data
• Many papers (being written & submitted now!)
PCAWG
Cancer Genome Collaboratory
http://dockstore.org
PCAWG pipelines on Dockstore
Lessons Learned (1/2)
• Be patient
• You need to publish your “stuff” (“to make publicly or generally known” https://goo.gl/SgSV6R).
• Publish your tools, SOPs, workflows, pipelines.
• Virtualization of services, tools and resources
• Shared APIs
• Good infrastructure is critical, but good data even more so.
Lessons Learned (2/2)
• Important to establish great tools and databases, but even more important to maintain them long term.
• Lack of funding in Canada for maintenance of a resource (database) and the maintenance of a tool (service).
• Training is critical, and you cannot have enough of it. We all need to do it (every country, every language).
• Long term support
• Do all this, and then tweet about it!
Acknowledgements
B/CB Advisory Committee
Gary Bader, U of Toronto
Robert Beiko, Dalhousie U.
Guillaume Bourque, McGill U.
Fiona Brinkman, SFU
Michael Brudno, U of Toronto
Liz Conibear, UBC
Bill Crosby, U Windsor
Mark Dietrich, Compute Canada
Francis Ouellette, OICR
Peter Wilenius, CANARIE
Cancer Genome Collaboratory
Lincoln Stein, U of Toronto
Guillaume Bourque, McGill U.
Paul Boutros, U of Toronto
Khaled el Emam, U of Ottawa
Vincent Ferretti, OICR
Bartha Knoppers, McGill U.
Francis Ouellette, U of Toronto
Cenk Sahinalp, SFU
Sohrab Shah, UBC
https://goo.gl/3wsGui