Lesson 23 Introduction to Bioinformatics

Vocabulary
Important Sentences
Questions and Answers
Problems

Bioinformatics is the intersection of molecular biology and computer science. For software developers, it's a fascinating and challenging area in which to work. In this article, I want to introduce this exciting field and set the scene for the articles that will follow.

1 What is Bioinformatics?

When molecular biologists started to generate DNA sequence data 27 years ago, it was natural that computer scientists and mathematicians would take a keen interest. Here in the messy, wet, analog world of biology was digital information: a linear string of four chemical groups encoding the entire blueprints for the protein machinery of the living cell. How could you not be interested in cracking that code? [1]

This field of study gained a real identity, and the name bioinformatics, in the mid-1980s, as DNA sequencing became a fundamental tool for molecular biology and sequence data started to appear in significant volume. Right from the start, three concepts emerged that remain central to bioinformatics today.

The first is data representation. The DNA in the human genome is
not neatly arranged in the pristine double helix we all recognize. It is coated with proteins that bind to specific sequences, which untwist the helix to allow gene expression and wind it up into tightly packed supercoils. Far from being a static archive of blueprints, DNA is a complex, dynamic, three-dimensional molecule. And yet we represent all of this as a simple string of the characters A, C, G and T (Fig. 1).

Fig. 1 DNA representation by characters

This is a remarkable abstraction. Most of the processes involving genes that we know about have been discovered using this grossly simplified representation of reality. It is the perfect representation for computer analysis, and without it we could never have approached a project on the scale of the human genome. [2]

Second is the concept of similarity. Evolution has operated on every sequence that we see today. It conserves genes that encode important proteins and sequences that are involved in gene regulation. Sequences that encode useful functions are transferred, like code modules, from one organism to another. Because of evolution, similar sequences have similar functions. Algorithms for comparing sequences and finding similar regions are at the heart of bioinformatics. At many different levels, they are used to find genes, determine their functions, study their regulation, and assess how they, and entire genomes, have evolved over time.

Third is the reality that bioinformatics is not a theoretical science; it is driven by the data, which in turn is driven
by the needs of biology. Relatively few researchers have the luxury to develop algorithms and theories in the traditional academic sense. Most people are fully consumed by the day-to-day management and analysis of data.

We have a lot of data. The introduction of automated DNA sequencing in the early 1990s created what was, at the time, a torrent of sequence data. But it was the Human Genome Project, with its massive automation, production lines, and money, that really opened the floodgates in the past few years. [3] Compare the rate of growth of sequence data in GenBank, the NIH sequence database, to Moore's Law, that well-known measure of technical advancement, and you will appreciate the challenge facing biology (Fig. 2).

Fig. 2 Comparing the rate of growth of sequence data in GenBank to Moore's Law

And those are just the sequences! Microarray technologies, able to measure the expression of thousands of genes in a single experiment, have developed over the past decade and now produce huge amounts of data. New techniques for looking at genetic variations in large human populations, and for identifying interactions between sets of proteins in cells, are pouring data onto file servers around the world. Bioinformatics is charged with managing and making sense of all of the data, keeping pace with both data production and technology development. There's plenty of work to go around. [4]

2 What are the Hot Topics in Bioinformatics?

2.1 Comparative Genomics

This has the highest profile, thanks to the Human Genome Project.
It has been, and still is, the focus of a huge amount of work. The first “tier” of genome sequences (human, rat, mouse, and fruit fly) is now complete, and the big sequencing labs are moving on to organisms like the chimpanzee, rhesus macaque, cow, chicken, and sea urchin.

Fig. 3 Edited screenshot taken from the UC Santa Cruz Genome Browser (genome.ucsc.edu)

Why this huge effort to sequence the entire contents of the zoo? Comparative genomics: the same approach to biology used by Charles Darwin, but based on sequences instead of the beaks of finches. By comparing the genomes of related species, we can learn a tremendous amount about how genomes are organized and how major evolutionary changes take place. At the level of individual genes, we can uncover novel mechanisms for regulation that were hidden when we just had one sequence to work with. Similarity is everything!

2.2 Single Nucleotide Polymorphisms (SNPs)

Another avenue that opens up once we have a “reference” human genome is the study of sequence differences between individuals, the fine details of what makes you different from me. It turns out that the genome is full of single nucleotide differences, called polymorphisms, or SNPs for short. Most of these have no
direct impact on anything. But their distribution throughout the genome, their frequency in the human population, and their patterns of inheritance make them extremely useful markers for differences that we do care about. By measuring sets of SNPs in thousands of individuals and correlating them with the incidence of a disease, we can identify which regions of the genome are involved and eventually pinpoint the genes themselves. The combination of these molecular assays with large clinical studies of populations generates huge amounts of data and a whole new set of challenges for bioinformatics.

2.3 Microarrays

Microarray technologies show us which genes are turned on in different cell types in different circumstances (Fig. 4). In response to infection, for example, certain cell types will express sets of genes and synthesize certain proteins that respond to the stress. Messenger RNA (mRNA) is like a photocopy of a blueprint that is used in the shop to build a specific type of protein. In a microarray, we can attach sequences from a range of genes to a glass slide in a series of dots, and then bind the mRNA extracted from a population of cells and measure how much binds to each dot. That gives us a snapshot of which genes are being expressed at any given time. Compare the patterns for mRNA from, for example, normal breast tissue and from a breast tumor, and you can identify proteins that are only present in the tumor. Those proteins are potential targets for cancer treatments, vaccines, and other therapeutics.

Fig. 4 Section of a microarray image, courtesy of Eric Jeffery, Corixa Corporation

2.4 Systems Biology

The genome gives us all of the genes in an organism, and microarrays tell us which subset is expressed in a particular biological process. Now the bottleneck in understanding biology is shifting to the world of proteins and the interactions between them. The traditional approach of dissecting out individual interactions with the help of mutations and inhibitors just doesn't scale. That is where systems biology comes in with a slew of novel technologies aimed at seeing the big picture of everything going on in a cell.

New advances in mass spectrometry have allowed this established chemical analysis technology to identify the components of complex mixtures of proteins. Inventive chemical labeling techniques provide insight into the transient interactions between different proteins in the cell. This bundle of new technologies is called proteomics. The integration of all of these results with gene expression data and the collective knowledge of cell biology, contained in the scientific literature, becomes another huge challenge. This is leading to exciting work in textual analysis, pathway modeling, and network visualization.

2.5 Structural Biology

While our abstraction of the DNA sequence works remarkably well, in the world of proteins the nuances of three-dimensional structure are everything. Structural biologists determine the structure of proteins using X-ray crystallography and nuclear magnetic resonance, a slew of heavy numerical methods, and a lot of computing. This is a huge field in its own right that predates bioinformatics by several decades. It focuses on the details of structure, the dynamics of molecular motion, and the specific interactions with drugs and other proteins (Fig. 5). Bioinformatics, with its focus on huge volumes of data, has often had an uneasy interface with structural biology; “quantity versus quality” some might say, but that distinction is becoming ever more blurred as all of these data sources become more integrated.

Fig. 5 Nitrogenase structure 1CP2 displayed in
MacPyMol ()

2.6 Software in Bioinformatics

Two main factors have shaped the current landscape of bioinformatics software. As already mentioned, the field has been driven by the massive amount of data and the research projects that generate it. As a result, most people in bioinformatics work on very focused projects and few have the luxury to sit back and write the ideal program for gene prediction, for example.

In addition, the technologies used in the lab, and the data they produce, have evolved very rapidly. That has made it very difficult to commit a lot of resources and time to specific pieces of software. The lifespan of a software project is often quite short and the lead time before deployment is minimal. Being able to understand the essence of a problem and hack up a quick solution that gets the job done are critical skills for a good bioinformatics developer.

A classic example is the genome assembler written by Jim Kent at UC Santa Cruz. Excellent software already existed for assembling the fragments of data produced by sequencing instruments into large blocks, but it could not handle the scale of the task that the Human Genome Project had created. Rather than try to modify existing code, it
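made sense for Kent to start from scratch and build something, in very short order, that was tailored to the task at hand. More than a quick hack, but a lot less than a complete, polished product, Jim's software assembled the human genome.

Refined, mature software packages usually emerge from research groups with a direct bioinformatics focus, as opposed to those playing a support role in, say, a genome center. Of all of the software out there, the “killer app” in bioinformatics has to be BLAST, the suite of sequence comparison tools from NCBI, the National Center for Biotechnology Information at the NIH. The BLAST team built a very fast sequence-comparison engine that could search the entire contents of GenBank in seconds. Over the years, they have improved performance and extended their algorithms, but have always retained their focus on what they do well. As a result, every molecular biologist who has ever looked at a sequence has used the NCBI BLAST server.

Vocabulary

1. bioinformatics n. the science of biological information.
2. DNA (deoxyribonucleic acid) n. the main chemical component of chromosomes. DNA is a molecule that contains the genetic instructions guiding an organism's development and the functioning of its life processes; it is often likened to a “blueprint” or a “recipe”. DNA segments that carry genetic information are called genes; of the other DNA sequences, some act directly through their own structure and some take part in regulating the expression of genetic information.
3. genome n. the full set of genes; the set of chromosomes.
4. pristine adj. in its original state, undamaged; fresh and pure; primitive, ancient.
5. pristine double helix: the double helix in its original, undisturbed form.
6. profile n. cross-section, side view, shape, outline.
7. chimpanzee n. a great ape.
8. rhesus macaque n. a species of monkey.
9. sea urchin n. a spiny marine animal.
10. finch n. a small bird of the finch family.
11. pinpoint n. a very small area; a point of light. vt. to locate or describe precisely. adj. extremely precise.
12. assay n. a laboratory test, an assay of metal; analysis, trial, test report, sample, specimen. vt. to test, examine, analyze and evaluate.
13. tumor n. an abnormal growth of tissue.
14. vaccine adj. relating to cowpox, to vaccination, or to vaccines. (In the text it is used as a noun: a preparation given to produce immunity.)
15. therapeutics n. the study of treatment; therapies.
16. dissect v. to dissect (an animal, plant, etc.); to cut open and study closely; to cut into pieces.
17. spectrometry n. the measurement of spectra.
18. proteomics: the large-scale study of proteins, particularly their structures and functions. This new word, which appeared in 1997, combines “protein” and “genome”.
19. resonance n. resonance, echo, reverberation; mesomerism; resonant vibration; a resonance particle, that is, an extremely short-lived unstable elementary particle.

Important Sentences

[1] Here in the messy, wet, analog world of biology was digital information: a linear string of four chemical groups encoding the entire blueprints for the protein machinery of the living cell. How could you not be interested in cracking that code?
Paraphrase: In this messy, wet, analog world of biology there is, of all things, digital information: a linear string of four chemical groups that encodes the blueprint for the protein machinery of the living cell. How could you fail to be interested in cracking these codes?

[2] This is a remarkable abstraction. Most of the processes involving genes that we know about have been discovered using this grossly simplified representation of reality. It is the perfect representation for computer analysis, and without it we could never have approached a project on the scale of the human genome.
Paraphrase: This is a very good abstraction. Most of the gene-related processes discovered so far were found using this simplified representation. It gives computer analysis a perfect representation; without it, we would never have undertaken a project as large as the human genome.

[3] We have a lot of data. The introduction of automated DNA sequencing in the early 1990s created what was, at the time, a torrent of sequence data. But it was the Human Genome Project, with its massive automation, production lines, and money, that really opened the floodgates in the past few years.
Paraphrase: We have a large amount of data...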
Questions and Answers

(1) According to the text, which of the following statements is wrong? ( )
A. Bioinformatics is a cross-discipline of molecular biology and computer science.
B. Bioinformatics is a cross-discipline of molecular biology and software.
C. Bioinformatics is a fascinating area for software developers.
D. Bioinformatics is a challenging area for software developers.

(2) The three concepts that emerged and remain central to bioinformatics today exclude ( ).
A. data representation
B. the concept of similarity
C. bioinformatics is not a theoretical science
D. algorithms for comparing sequences

(3) The reasons why algorithms for comparing sequences and finding similar regions are at the heart of bioinformatics do not include ( ).
A. algorithms are used to find genes
B. algorithms are used to study the regulation of genes
C. algorithms are used to assess the evolution of entire genomes over time
D. algorithms are used to determine entire genomes

(4) According to the section Software in Bioinformatics, which of the following statements is wrong? ( )
A. The two main factors that have shaped the current landscape of bioinformatics software are that the field has been driven by the massive amount of data and by the research projects that generate it.
B. A lot of people have written the ideal program for gene prediction.
C. The lifespan of a software project is often quite short and the lead time before deployment is minimal.
D. The BLAST team built a very fast sequence-comparison engine that could search the entire contents of GenBank in seconds.

Problems

1. What is bioinformatics, and what are its object and research methods?
2. Do you think that you will do some research work on bioinformatics sometime later?
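
Supplementary sketch for Section 1. The lesson says that DNA is represented as a simple string of the characters A, C, G and T, and that algorithms for comparing sequences and finding similar regions are at the heart of bioinformatics. The short Python sketch below illustrates both ideas under simplifying assumptions: similarity is scored as the fraction of identical positions, and the best match is found by sliding the query along the target without gaps. The function names (validate_dna, identity, best_ungapped_hit) and the example sequences are invented for illustration and do not come from the lesson; this naive scan is far simpler than tools such as BLAST.

```python
from __future__ import annotations


def validate_dna(seq: str) -> str:
    """Return the sequence upper-cased; raise if it contains non-ACGT characters."""
    seq = seq.upper()
    bad = set(seq) - set("ACGT")
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return seq


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_ungapped_hit(query: str, target: str) -> tuple[int, float]:
    """Slide the query along the target (no gaps allowed).

    Returns (offset, identity) for the best-scoring window; on ties the
    leftmost window wins.
    """
    query, target = validate_dna(query), validate_dna(target)
    if len(query) > len(target):
        raise ValueError("query is longer than target")
    best_offset, best_score = 0, -1.0
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        score = identity(query, window)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    print(best_ungapped_hit("ACGT", "TTACGTTT"))  # (2, 1.0): exact match at offset 2
    print(best_ungapped_hit("ACGA", "TTACGTTT"))  # (2, 0.75): one mismatch
```

The scan compares the query against every window of the target, so its cost grows with the product of the two lengths. That is fine for a classroom example, but it hints at why searching a database the size of GenBank calls for much faster, specialized methods.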