New sequencing tools (RNA-seq) for the analysis of complex traits
Juan F. Medrano
Dept. of Animal Science, University of California, Davis
INIA, Madrid, Spain, October 14, 2011
[Figure: levels of analysis: DNA/genes (structural genomics); mRNA (poly(A) tail, AAAA) and transcriptome (annotation, quantification); proteins; oligosaccharides; metabolism]
J.F. Medrano / U.C. Davis
Topics
- Transcriptome analysis with RNA sequencing
- Milk transcriptome at different stages of lactation
  - Early lactation: milk proteins
  - Late lactation: proteolytic enzymes
- Complex traits / validation of regulators
  - Milk oligosaccharides
  - Citrate content in milk
  - Nutrigenomics: a study in zebrafish
RNA sequencing procedure
Sample collection → mRNA extraction → RNA preparation (multiplex indexing, adapter ligation) and sequencing: ~220 M reads

Millions of reads per tissue and lane
Tissue            Lane 1   Lane 2
Brainstem         20.6     20.9
Cerebral cortex   17.1     20.7
Hypothalamus      17.3     18.7
Gonadal fat       15.5     14.1
Pituitary         17.7     19.7
Liver             14.4     16.3
Total reads       102.6    110.4
Mapping sequencing reads to exons
Reads are assembled to a reference genome (Morozova et al. 2009).
Software used:
Gene expression is measured by counting sequence reads:
RPKM value = reads per kilobase of exon per million mapped reads
RNA-Seq analysis workflow
I. Sequence analysis
- Importing sequence reads and QC
II. RNA-Seq analysis
- Assembly to a reference genome; de novo assembly
- Transcriptome quantification (RPKM)
- SNP detection: SNP discovery and allelic differential expression
- DIP detection: deletion and insertion analysis
- New transcript discovery using unmapped reads
- Exon/gene discovery in annotated gene regions; splice variants; gene structure
III. Experimental comparison
- Compare multiple samples
- Transformation and normalization
- Statistical analysis
- Functional annotation, Blast2GO
- Pathway analysis, IPA
RNA-seq vs. microarray (mammary tissue)
Gene       RPKM       RNA-seq reads   Affymetrix expression value

Highly expressed genes (~180 genes); dynamic range: RPKM 817–174,686, Affy 12.5–14.3
CSN2       174,686    1,351,852       14.31
LGB        129,059      737,858       14.11
CSN3        44,255      271,151       14.24
LALBA       34,007      177,313       14.08
CSN1S1      32,345      277,713       13.12
GLYCAM1     22,015      102,009       13.92
CSN1S2      14,670      120,333       13.83
MFGE8        4,398       42,588       12.55
FASN         2,332      130,664       12.67
LTF            817       14,570       13.33

Medium expressed genes (~6,026); dynamic range: RPKM 86–425, Affy 8.5–11.1
AGPAT6         425        7,269        9.47
MUC1           411        4,510       10.32
SLC29A1        253        4,480        8.49
CIDEA          185        5,352       11.12
PTGDS          147          877       10.86
TSTA3          125        1,356        8.66
FOLR1          113        1,571        9.66
BANF1          108        1,400        8.90
VAT1            93        1,852        9.56
DAP             86        1,522       10.62

Low expressed genes (~11,024); dynamic range: RPKM 1.10–1.18, Affy 3.01–5.41
MST1          1.18           21        3.20
FGD1          1.17           29        4.10
PTGS1         1.16           25        5.41
MORC4         1.16           19        3.56
TOR1AIP2      1.14           18        3.01
CHRND         1.14           17        4.55
ARID3B        1.14           20        3.78
TMEM59L       1.13           13        3.58
RAMP1         1.12           15        4.17
FUT1          1.10           19        3.21

RPKM: reads per kilobase of exon length per million reads
~40% of the variance in protein level is explained by mRNA levels. Most of this 40% is due to differences in transcription rate (Nature 473:337-42, 2011).
Milk transcriptome at different stages of lactation
Experimental comparison
~18,000 of 26,000 genes are expressed
~9,000 genes are ubiquitously expressed at all stages

                                              D15    D90    D250
Highly expressed genes (>500 RPKM)            86     140    150
Share of all reads from the top 10 genes      61%    11%    19%

IPA (Ingenuity Pathway Analysis), D15 vs. D250: milk components (casein/whey proteins, GLYCAM1-mucin), antiapoptotic, immune system, proteolytic enzymes
[Figure: expression in RPKM (0–250,000) of LGB, CSN2, CSN1S1, LALBA, CSN3, GLYCAM1 and CSN1S2 at day 15, day 90 and day 250.] Gene expression pattern of the highly expressed genes at day 15, which represent 61% of all sequence reads.
Protein in cow milk remains fairly constant: protein 3.13 ± 0.2%, casein 2.38 ± 0.21%.
IPA categories: milk components (casein/whey proteins, GLYCAM1-mucin), apoptotic, immune system, proteolytic enzymes
RNA-Seq analysis
Proteolytic enzymes in milk: plasmin (alkaline serum protease) and cathepsins (lysosomal proteases)
Role:
- Mammary development
- Microbial interactions
- Effect on fermented products and cheese
- Sensory quality of milk
- Potential nutraceuticals
[Figure: gene expression in RPKM (0–7,000) at day 15, day 90 and day 250 for CTSB, CTSD, CTSZ, CTSH, CTSS, CTSC, CTSK, CTSA, CTSF, CTSW, CTSL2 and CTSO.]
SNP discovery in 14 Holstein cows: 107,639 SNP in coding regions
SNP validation with dbSNP
Criteria:
• Quality score
• >10 reads
• Min 2 reads/SNP
• No SNP on read ends
(Canovas et al., Mamm Genome 2010)
[Figure: validated vs. unique SNP by chromosome (1–30); y-axis 0–100.]
Milk oligosaccharide structures
[Figure: structures of lacto-N-tetraose ([M+Na]+ = 732.3), lacto-N-neohexaose ([M+Na]+ = 1097.4), an isomeric fucosylated lacto-N-hexaose ([M+Na]+ = 1243.4) and difucosyllacto-N-hexaose ([M+Na]+ = 1389.5); legend: sialic acid, glucosamine.] (Zivkovic AM, Barile D, Adv Nutr 2011)
128 genes from 10 functional oligosaccharide-metabolism categories in mammals
↓
502 SNP in coding regions, directly genotyped by RNA-seq
↓
Genotyping array
↓
Association study
(Wickramasinghe et al., PLoS ONE 2011)
Non-synonymous SNP in glycosylation-related genes that showed a damaging effect on the encoded protein (PolyPhen analysis)
- Pathway analysis
- SNP detection
- SNP selection (Canovas et al., Mamm Genome 2010)
- Marker-trait association studies (association analysis)
- Target validation
- Definition of regulators
Example: genes responsible for variation in the citrate content of cow milk (130–160 mg/100 ml).
Citrate in milk:
• Involved in Ca and P balance
• Heat stability
• Aids in protein coagulation, flavor and aroma
• Provides protein stability
• Primary buffer in milk
Pathway of fatty acid synthesis in ruminant mammary tissue
[Figure: pathway diagram including the NADP/NADPH step.] Numbers in parentheses correspond to average expression values (RPKM) measured by RNA-seq in milk samples.
Zebrafish muscle tissue response to a plant protein diet
♂ n = 440; the lowest 5% (average weight = 52 mg) and highest 5% (average weight = 228 mg) were sampled
For each group: muscle from 8 males, RNA pooled (4 fish/pool), 2 RNA-seq libraries
17,227 expressed genes; 54 and 70 differentially expressed genes
Low growth fish: protein synthesis, cellular morphology, skeletal and muscle system development, and tissue morphology.
High growth fish: lipid metabolism, vitamin and mineral metabolism, and oxidation reduction.
Population fish (24 families); RNA-seq of the 5% extremes
- Four low growth fish per family (N = 96)
- Four high growth fish per family (N = 96)
- Parents (48 fish)
165 SNP / 240 samples (96 + 96 + 48)
ID                  Gene    SNP   Minor allele   Minor allele frequency   p-value    FDR        Slope          Amino acids
ENSDARG000000       N       A/T   T              0.129                    1.60E-05   0.001233   -110.1670988   Synonymous
ENSDARG000000       A       T/C   T              0.200                    0.0033     0.172945   12.7210075     Synonymous
ENSDARG000000       P       T/A   A              0.132                    0.0050     0.195037   39.98901273    Synonymous
ENSDARG000000       C       A/C   C              0.031                    0.0056     0.17339    -134.1560644   Ile500Leu
ENSDARG00000045864  Tmod1   G/C   C              0.223                    0.0061     0.158305   61.38335784    Ser141Thr
Conclusions
• The RNA-seq analytical workflow applied to complex traits is a robust tool for increasing biological knowledge of these traits, through:
- precise quantification of gene expression levels, with a high correlation to protein levels;
- discovery of new transcripts;
- identification of new SNP and other variants through complete genotyping of the organism's exome;
- identification of other organisms present in the biological material.
• Combining RNA-seq-based pathway analysis and SNP identification with association studies is an experimental approach for defining key regulatory modules of complex traits.
Acknowledgements
Medrano Lab: Alma Islas, Gonzalo Rincon, Saumya Wickramasinghe, Pilar Ulloa, Angela Canovas (IRTA, Spain)
UC Davis Genome Center
Collaborators: Carlito Lebrilla (UC Davis), Bruce German (UC Davis), Rafael Jimenez-Flores (Cal Poly, SLO), Armand Sanchez (UAB)
Funding
Sewall Wright 1939, Genetic Principles Governing the Rate of Progress of Livestock Breeding, JAS 1939
“As a starting point suppose that we were given a reasonably complete map of all of the chromosomes, showing the location of all important genes affecting the character in question as well as of convenient marker genes. What could we do with it?”