Bundle URIs are locations where Git can download one or more bundles in order to bootstrap the object database in advance of fetching the remaining objects from a remote.
One goal is to speed up clones and fetches for users with poor network connectivity to the origin server. Another benefit is to allow heavy users, such as CI build farms, to use local resources for the majority of Git data and thereby reduce the load on the origin server.
To enable the bundle URI feature, users can specify a bundle URI using command-line options or the origin server can advertise one or more URIs via a protocol v2 capability.
The bundle URI standard aims to be flexible enough to satisfy multiple workloads. The bundle provider and the Git client have several choices in how they create and consume bundle URIs.
Each repository is different and every Git server has different needs. Hopefully the bundle URI feature is flexible enough to satisfy all needs. If not, then the feature can be extended through its versioning mechanism.
To provide a server-side implementation of bundle servers, no other parts of the Git protocol are required. This allows server maintainers to use static content solutions such as CDNs in order to serve the bundle files.
At the current scope of the bundle URI feature, all URIs are expected to be HTTP(S) URLs where content is downloaded to a local file using a GET request to that URL. The server could include authentication requirements to those requests with the aim of triggering the configured credential helper for secure access. (Future extensions could use "file://" URIs or SSH URIs.)
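The download rule above can be sketched in Python using only the standard library. This is a minimal illustration, not Git's implementation: the function name download_bundle_uri is hypothetical, and authentication and credential-helper handling are deliberately left out.

```python
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen

# Only HTTP(S) is in scope; file:// and SSH URIs are possible future extensions.
ALLOWED_SCHEMES = {"http", "https"}


def download_bundle_uri(uri: str, dest_path: str) -> None:
    """Download the content at `uri` into the local file `dest_path`."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported bundle URI scheme: {scheme!r}")
    # urlopen() sends a GET request when no request body is supplied.
    # Authentication and credential helpers are not modeled in this sketch.
    with urlopen(uri) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```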
Any other data provided by the server is considered erroneous.
The Git server can advertise bundle URIs using a set of key=value pairs. A bundle URI can also serve a plain-text file in the Git config format containing these same key=value pairs. In both cases, we consider this to be a bundle list. The pairs specify information about the bundles that the client can use to make decisions for which bundles to download and which to ignore.
A few keys focus on properties of the list itself.
(Required) This value provides a version number for the bundle list. If a future Git change enables a feature that needs the Git client to react to a new key in the bundle list file, then this version will increment. The only current version number is 1, and if any other value is specified then Git will fail to use this file.
If this string-valued key exists, then the bundle list is designed to work well with incremental git fetch commands. The heuristic signals that there are additional keys available for each bundle that help determine which subset of bundles the client should download. The only heuristic currently planned is creationToken.
The remaining keys include an
This string value represents an object filter that should also appear in the header of this bundle. The server uses this value to distinguish between kinds of bundles, from which the client can choose those that match its object filter.
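A client choosing bundles by filter might do something like the following Python sketch. It assumes (hypothetically) that filters are compared as exact strings and that a bundle without a filter is represented by None; a real client may normalize filter specifications before comparing them.

```python
from typing import Dict, List, Optional


def bundles_matching_filter(
    bundle_filters: Dict[str, Optional[str]], client_filter: Optional[str]
) -> List[str]:
    """Return the ids of bundles whose advertised filter equals the client's.

    `bundle_filters` maps a bundle id to its filter string, or to None when the
    bundle advertises no filter (i.e. it contains all objects).
    """
    return [bid for bid, f in bundle_filters.items() if f == client_filter]
```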
This value is a nonnegative 64-bit integer used for sorting the bundle list. This is used to download a subset of bundles during a fetch when bundle.heuristic=creationToken.
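Sorting by this token, with a check of the nonnegative 64-bit range, could look like this minimal sketch. The dictionary input (bundle id to token) and the function name are assumptions for illustration.

```python
from typing import Dict, List

MAX_U64 = 2**64 - 1  # creationToken is a nonnegative 64-bit integer


def sort_by_creation_token(tokens: Dict[str, int], descending: bool = False) -> List[str]:
    """Return bundle ids ordered by creationToken, rejecting out-of-range values."""
    for bid, tok in tokens.items():
        if not 0 <= tok <= MAX_U64:
            raise ValueError(f"creationToken out of range for {bid!r}: {tok}")
    return sorted(tokens, key=tokens.__getitem__, reverse=descending)
```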
This string value advertises a real-world location from which the bundle URI is served. This can be used to present the user with an option for which bundle URI to use, or simply as an informative indicator of which bundle URI was selected by Git. This is only valuable when bundle.mode is any.
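To illustrate how these list-level and per-bundle keys fit together, here is a hedged Python sketch of a parser for advertised key=value lines. Only bundle.mode, bundle.heuristic and creationToken are named in this text; the remaining key names (bundle.version, bundle.<id>.uri, bundle.<id>.filter, bundle.<id>.creationToken, bundle.<id>.location) and all type and function names are assumptions. Reading the same pairs from a config-format file is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Bundle:
    id: str
    uri: Optional[str] = None
    filter: Optional[str] = None
    creation_token: Optional[int] = None
    location: Optional[str] = None


@dataclass
class BundleList:
    version: Optional[int] = None
    mode: Optional[str] = None
    heuristic: Optional[str] = None
    bundles: Dict[str, Bundle] = field(default_factory=dict)


def parse_bundle_list(pairs: Iterable[str]) -> BundleList:
    """Parse advertised 'key=value' lines into a BundleList."""
    result = BundleList()
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed pair: {pair!r}")
        section, _, rest = key.partition(".")
        if section.lower() != "bundle":
            continue  # keys outside bundle.* are ignored here
        # As in Git config, the middle part (the bundle id) is case-sensitive,
        # while section and variable names are compared case-insensitively.
        bundle_id, _, name = rest.rpartition(".")
        name = name.lower()
        if not bundle_id:
            # List-level keys: bundle.version, bundle.mode, bundle.heuristic.
            if name == "version":
                result.version = int(value)
            elif name == "mode":
                result.mode = value
            elif name == "heuristic":
                result.heuristic = value
            continue
        bundle = result.bundles.setdefault(bundle_id, Bundle(bundle_id))
        if name == "uri":
            bundle.uri = value
        elif name == "filter":
            bundle.filter = value
        elif name == "creationtoken":
            bundle.creation_token = int(value)
        elif name == "location":
            bundle.location = value
    if result.version != 1:
        # 1 is the only defined version; any other value makes the list unusable.
        raise ValueError(f"unsupported bundle list version: {result.version}")
    return result
```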
Here is an example bundle list using the Git config format:
If a user knows a bundle URI for the repository they are cloning, then they can specify that URI manually through a command-line option. However, a Git host may want to advertise bundle URIs during the clone operation, helping users unaware of the feature.
The only thing required for this feature is that the server can advertise one or more bundle URIs. This advertisement takes the form of a new protocol v2 capability specifically for discovering bundle URIs.
The client could choose an arbitrary bundle URI as an option or select the URI with best performance by some exploratory checks. It is up to the bundle provider to decide if having multiple URIs is preferable to a single URI that is geodistributed through server-side infrastructure.
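One possible form of such an exploratory check is sketched below: probe each candidate and keep the fastest. The probe callable is a hypothetical stand-in (for example, timing a small request to each URI); the sketch only shows the selection logic, and the function name is an assumption.

```python
from typing import Callable, Iterable, Optional


def choose_fastest_uri(
    uris: Iterable[str], probe: Callable[[str], float]
) -> Optional[str]:
    """Return the URI with the smallest probe time, skipping unreachable ones."""
    best_uri: Optional[str] = None
    best_time = float("inf")
    for uri in uris:
        try:
            elapsed = probe(uri)  # e.g. seconds to fetch the first few bytes
        except OSError:
            continue  # a failed probe just removes that candidate
        if elapsed < best_time:
            best_uri, best_time = uri, elapsed
    return best_uri
```

Catching OSError also covers urllib's URLError, which subclasses it.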
The primary need for bundle URIs is to speed up clones. The Git client will interact with bundle URIs according to the following flow:
Note that during a clone we expect that all bundles will be required, and heuristics such as bundle.
If a given bundle URI is a bundle list with a bundle.heuristic value, then the client can choose to store that URI as its chosen bundle URI. The client can then navigate directly to that URI during later git fetch calls.
When downloading bundle URIs, the client can choose to inspect the initial content before committing to downloading the entire content. This may provide enough information to determine if the URI is a bundle list or a bundle. In the case of a bundle, the client may inspect the bundle header to determine that all advertised tips are already in the client repository and cancel the remaining download.
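The inspection step could be sketched as follows, assuming the header layout of Git's v2 and v3 bundle formats: a signature line, optional @capability lines (v3 only), prerequisite lines starting with "-", ref lines of the form "<oid> <refname>", and a blank line before the packfile. The function names are hypothetical; the sketch reports when the initial bytes do not yet contain the whole header.

```python
from typing import Set, Tuple

SIGNATURES = (b"# v2 git bundle", b"# v3 git bundle")


def parse_bundle_header(data: bytes) -> Tuple[Set[str], Set[str]]:
    """Parse a bundle header from the initial bytes of a download.

    Returns (prerequisite OIDs, tip OIDs). Raises ValueError if the content
    does not start with a bundle signature (it may be a bundle list instead)
    or if the header is not complete within `data`.
    """
    header, sep, _pack = data.partition(b"\n\n")  # a blank line ends the header
    lines = header.split(b"\n")
    if lines[0] not in SIGNATURES:
        raise ValueError("not a bundle; try parsing as a bundle list")
    if not sep:
        raise ValueError("header incomplete; download more bytes")
    prerequisites: Set[str] = set()
    tips: Set[str] = set()
    for line in lines[1:]:
        if line.startswith(b"@"):
            continue  # v3 capability line, e.g. @filter=blob:none
        if line.startswith(b"-"):
            # "-<oid> [comment]": the receiver must already have this commit.
            prerequisites.add(line[1:].split(b" ", 1)[0].decode("ascii"))
        else:
            # "<oid> <refname>": a tip advertised by the bundle.
            tips.add(line.split(b" ", 1)[0].decode("ascii"))
    return prerequisites, tips


def all_tips_present(data: bytes, local_oids: Set[str]) -> bool:
    """True if every advertised tip is already local, so the download can stop."""
    _prereqs, tips = parse_bundle_header(data)
    return tips <= local_oids
```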
When the client fetches new data, it can decide to fetch from bundle servers before fetching from the origin remote. This could be done via a command-line option, but a config value, such as the one specified during the clone, is likely more useful.
The fetch operation follows the same procedure to download bundles from a bundle list (although we do not want to use parallel downloads here). We expect that the process will end when all prerequisite commit OIDs in a thin bundle are already in the object database.
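A sequential version of that stopping rule might look like the following sketch. It assumes the bundles are ordered by decreasing creationToken and that each bundle's prerequisites are tips of older bundles in the list, as in the daily-bundle organization described later in this document; the download callable and function name are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple


def bundles_to_unbundle(
    ids_newest_first: Sequence[str],
    local_oids: Set[str],
    download: Callable[[str], Tuple[Set[str], Set[str]]],
) -> Optional[List[str]]:
    """Download bundles newest-first until one's prerequisites are all local.

    `download` fetches one bundle and returns (prerequisites, tips). Bundles are
    fetched one at a time, never in parallel. The result lists the downloaded
    ids oldest-first, the order in which they would be unbundled, or None if
    no bundle in the list connects to the local object database.
    """
    downloaded: List[str] = []
    for bundle_id in ids_newest_first:
        prerequisites, _tips = download(bundle_id)
        downloaded.append(bundle_id)
        if prerequisites <= local_oids:
            # Older bundles supply the prerequisites of newer ones, so this
            # bundle and everything downloaded before it can be applied in order.
            return downloaded[::-1]
    return None
```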
When using the creationToken heuristic, the client can avoid downloading any bundles if their creation tokens are not larger than the stored creation token. After fetching new bundles, Git updates this local creation token.
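The token comparison and update can be sketched as below. Where the local creation token is stored is not modeled, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def bundles_newer_than(tokens: Dict[str, int], stored_token: int) -> Tuple[List[str], int]:
    """Select bundles whose creationToken exceeds the stored token.

    Returns the selected ids (increasing token order) and the value the local
    creation token would be updated to after they are fetched.
    """
    fresh = {bid: tok for bid, tok in tokens.items() if tok > stored_token}
    new_token = max(fresh.values(), default=stored_token)
    return sorted(fresh, key=fresh.__getitem__), new_token
```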
If the bundle provider does not provide a heuristic, then the client should attempt to inspect the bundle headers before downloading the full bundle data in case the bundle tips already exist in the client repository.
If the Git client discovers something unexpected while downloading information according to a bundle URI or the bundle list found at that location, then Git can ignore that data and continue as if it were not given a bundle URI. The remote Git server is the ultimate source of truth, not the bundle URI.
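A minimal sketch of this fallback behavior, with hypothetical callables standing in for the bundle URI step and the normal fetch from the origin:

```python
import logging
from typing import Callable


def fetch_with_bundle_fallback(
    use_bundle_uri: Callable[[], None], fetch_from_origin: Callable[[], None]
) -> bool:
    """Try the bundle URI first, then always fetch from the origin.

    Returns True if the bundle data was used, False if it was ignored.
    """
    used = True
    try:
        use_bundle_uri()
    except Exception as err:  # any unexpected condition: ignore the bundle data
        logging.warning("ignoring bundle URI: %s", err)
        used = False
    fetch_from_origin()  # the origin server remains the source of truth
    return used
```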
Here are a few example error conditions:
There are also situations that could be seen as wasteful, but are not error conditions:
The bundle URI feature is intentionally designed to be flexible to different ways a bundle provider wants to organize the object data. However, it can be helpful to have a complete organization model described here so providers can start from that base.
This example organization is a simplified model of what is used by the GVFS Cache Servers (see the section near the end of this document), which have been beneficial in speeding up clones and fetches for very large repositories, although they rely on extra software outside of Git.
The bundle server runs regularly-scheduled updates for the bundle list, such as once a day. During this task, the server fetches the latest contents from the origin server and generates a bundle containing the objects reachable from the latest origin refs, but not contained in a previously-computed bundle. This bundle is added to the list, with care that the creationToken is strictly greater than the previous maximum creationToken.
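The two server-side requirements in this paragraph, a strictly increasing creationToken and a bundle that excludes previously bundled objects, could be sketched like this. Using the Unix timestamp as the token is an assumption (the text only requires strict increase), and the hypothetical bundle_create_argv only builds a git bundle create command line with ^<oid> exclusions for the previous bundle's tips; it does not run Git.

```python
import time
from typing import Iterable, List, Optional


def next_creation_token(existing: Iterable[int], now: Optional[int] = None) -> int:
    """Return a creationToken strictly greater than every existing token.

    Prefers the current Unix time so tokens roughly track creation time, but
    falls back to previous maximum + 1 if the clock is not ahead of it.
    """
    if now is None:
        now = int(time.time())
    previous_max = max(existing, default=-1)
    return max(now, previous_max + 1)


def bundle_create_argv(path: str, refs: List[str], previous_tips: List[str]) -> List[str]:
    """Build a `git bundle create` command for objects not in earlier bundles."""
    # "^<oid>" excludes everything reachable from the previous bundle's tips.
    return ["git", "bundle", "create", path, *refs, *("^" + oid for oid in previous_tips)]
```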
An example bundle list is provided here, although it only has two daily bundles and not a full list of 30:
This data organization has two main goals. First, initial clones of the repository become faster by downloading precomputed object data from a closer source. Second, git fetch commands can be faster, especially if the client has not fetched for a few days. However, if a client does not fetch for 30 days, then the bundle list organization would cause redownloading a large amount of object data.
One way to make this organization more useful to users who fetch frequently is to have more frequent bundle creation. For example, bundles could be created every hour, and then once a day those "hourly" bundles could be merged into a "daily" bundle. The daily bundles are merged into the oldest bundle after 30 days.
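The daily maintenance step of this hourly/daily scheme could be planned with a sketch like the one below. The (id, kind, created_at) record layout, the "base" name for the oldest bundle, and the function name are hypothetical, and the sketch does not address how merged bundles receive their creationToken values.

```python
from typing import List, Tuple

DAY = 24 * 60 * 60


def plan_daily_merge(
    bundles: List[Tuple[str, str, int]], now: int, retention_days: int = 30
) -> Tuple[List[str], List[str]]:
    """Plan one daily maintenance run over (id, kind, created_at) records.

    kind is "hourly", "daily" or "base". Returns the hourly bundle ids to merge
    into a new daily bundle and the daily bundle ids old enough to be merged
    into the oldest ("base") bundle.
    """
    hourly = [bid for bid, kind, _t in bundles if kind == "hourly"]
    expired = [
        bid for bid, kind, t in bundles
        if kind == "daily" and now - t >= retention_days * DAY
    ]
    return hourly, expired
```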
It is recommended that this bundle strategy is repeated with the blob:none filter if clients of this repository are expecting to use blobless partial clones. This list of blobless bundles stays in the same list as the full bundles, but uses the bundle.
This design document is being submitted on its own as an aspirational document, with the goal of implementing all of the mentioned client features over the course of several patch series. Here is a potential outline for submitting these features:
As these features are reviewed, this plan might be updated. We also expect that new designs will be discovered and implemented as this feature matures and becomes used in real-world scenarios.
The Git protocol already has a capability where the Git server can list a set of URLs along with the packfile response when serving a client request. The client is then expected to download the packfiles at those locations in order to have a complete understanding of the response.
This mechanism is used by the Gerrit server (implemented with JGit) and has been effective at reducing CPU load and improving user performance for clones.
A major downside to this mechanism is that the origin server needs to know exactly what is in those packfiles, and the packfiles need to be available to the user for some time after the server has responded. This coupling between the origin and the packfile data is difficult to manage.
Further, this implementation is extremely hard to make work with fetches.
The endpoint that VFS for Git is famous for is the GET /gvfs/objects/{oid} endpoint, which allows downloading an object on-demand. This is a critical piece of the filesystem virtualization of that product.
However, a more subtle need is the GET /gvfs/prefetch?lastPackTimestamp=
The cache server computes these "prefetch" packfiles using the following strategy:
When a user runs gvfs clone or scalar clone against a repo with cache servers, the client requests all prefetch packfiles, which is at most 24 + 30 + 1 packfiles containing only commits and trees. The client then follows with a request to the origin server for the references, and attempts to check out that tip reference. (There is an extra endpoint that helps get all reachable trees from a given commit, in case that commit was not already in a prefetch packfile.)
During a git fetch, a hook requests the prefetch endpoint using the most-recent timestamp from a previously-downloaded prefetch packfile. Only the packfiles with later timestamps are downloaded. Most users fetch hourly, so they get at most one hourly prefetch pack. Users whose machines have been off or otherwise have not fetched in over 30 days might redownload all prefetch packfiles. This is rare.
It is important to note that the clients always contact the origin server for the refs advertisement, so the refs are frequently "ahead" of the prefetched pack data. The missing objects are downloaded on-demand using the GET /gvfs/objects/{oid} requests, when needed by a command such as git checkout or git log. Some Git optimizations disable checks that would cause these on-demand downloads to be too aggressive.