1) Local Information: This category encompasses information within the current working context, including defined function signatures and variables. It also covers the relative file paths of the module within the project, such as the module’s Fully Qualified Names (FQNs), like unstructured.documents.html. Local information is crucial to ensuring the generated code utilizes accurate variables and operates harmoniously with the other functions in the project.
3) Third-Party-Library Information: This category includes third-party libraries pre-installed in the code repository’s environment. These libraries can be accessed directly through import statements, enhancing functionality without extra dependencies.
Our experiments are driven by two primary objectives: 1) To investigate the effectiveness of three distinct types of code repository information in assisting the model with code generation for specific code repositories. 2) To assess whether our framework can efficiently extract and utilize these three types of information. We crawl and filter 29 representative Python code repositories from PyPI, assembling 13,784 functions. To ensure a diverse benchmark dataset, we conduct a sampling process, selecting representative functions from each repository to total 383 functions. We subsequently label these benchmarks and the model outputs generated under different configurations to evaluate the performance of $A^{3}$-CodGen.
The main contributions of this paper are as follows.
Furthermore, since the model is unaware of the existence of the “check text for bullet characters” function in the global module, it defines a new function is_bullet_text (highlighted in blue) in the generated code. A large part of this code duplicates the function is_bullet_text in the global module unstructured/partition/text_type.py, resulting in code redundancy. It also raises additional concerns about the correctness of the newly generated function.
These knowledge gaps expose the limitations of existing code generation approaches and highlight the need for more context-aware methods that can effectively interact with the code repository environment. In summary, a lack of knowledge about local and global modules can result in models generating code that lacks functional constraints or contains redundancies. A lack of knowledge about third-party libraries in the code repository can lead models to use arbitrary third-party libraries, so the generated code may fail to run. As a result, generating code in specific modules of the code repository is more than just meeting functional requirements. To generate code as human developers do, the approach must fully understand and effectively utilize this diverse contextual knowledge to ensure that newly introduced code coordinates seamlessly with the existing code repository.
Above all, in order to enable LLMs to fully utilize information spanning the code repository environment for code generation, it is essential to provide the model with three types of knowledge: local modules, global modules, and third-party library information.
In the next section, we comprehensively describe how to extract, structure, and utilize local-aware, global-aware, and third-party-library-aware information from code repositories.
For a code repository, we can effortlessly obtain all code files in the entire repository by traversing all the directories within the project. Then, for each file, we extract all included functions, gathering both basic code information and semantic code knowledge. This process snowballs to extract all essential repository information.
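The traversal-and-extraction step described above can be sketched as follows. This is a minimal illustration built on Python's standard ast module, not the authors' implementation: the function name extract_functions and the record fields are hypothetical, and only basic code information (module FQN, function name, argument names, docstring) is collected, leaving the semantic summaries aside.

```python
import ast
from pathlib import Path


def extract_functions(repo_root):
    """Walk a repository and collect basic information about every function.

    Hypothetical helper: the record layout is an assumption made for
    illustration, not the schema used in the paper.
    """
    records = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        # Derive the module FQN from the relative path,
        # e.g. pkg/sub/mod.py -> pkg.sub.mod
        rel = path.relative_to(repo_root).with_suffix("")
        module_fqn = ".".join(rel.parts)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                records.append({
                    "module": module_fqn,
                    "name": node.name,
                    # positional argument names only
                    "args": [a.arg for a in node.args.args],
                    "docstring": ast.get_docstring(node),
                })
    return records
```

Calling extract_functions on a repository root returns one record per function or method, which could then be passed on to a summarization or embedding step.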
3) Vector Representations. In the previous two steps, we explicitly selected information. To ensure efficient retrieval of global functions later on, having an effective code representation that faithfully captures function semantics is essential. Hence, we employ a dual-pronged approach, conducting separate embeddings for both the source code and the function summaries. This results in the creation of a code vector and a summary vector, respectively.
Within the user’s working context, referred to as the local module, four types of local-module information can be encountered:
In addition to local module knowledge, global module knowledge also plays a significant role in code generation. Instead of letting the model implement complicated logic steps from scratch, we expect the model to directly call functions if they are already present in other code files. To help models realize that global functions can be reused, we retrieve global functions in the code repository and organize them in a structured way. The global module knowledge mining process is divided into three phases: what-if code generation, global function retrieval, and information unification.
In this section, we first outline the selection of foundation models, the dataset, and evaluation metrics for assessing our framework. We then introduce five Research Questions (RQs):
Given the comments and the definition of a function, the corresponding source code for that function serves as the standard answer to be generated. Simultaneously, we extract call information for the function to determine the extent of code reuse within it, because a higher extent of function reuse indicates that the framework learns better from the three types of awareness.
To comprehensively evaluate whether the three types of awareness are effective, we employ the following metrics:
To validate the call relationships between the extracted functions, we recruit two developers with over five years of Python programming experience. Each function in the benchmark dataset is independently labeled by both developers. They evaluate whether the function utilizes local functions within the current code file, assigning a score of 1 if it does and 0 if it does not. Similarly, they assess whether the function utilizes global functions and third-party libraries, assigning scores of 1 or 0 accordingly. These scores contribute to calculating reuse awareness. For functions involving reuse, they identify all reused functions/libraries to calculate reuse correctness.
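To illustrate how such binary labels could be turned into precision, recall, F1, and accuracy figures, the sketch below compares the labels of the reference functions with the labels of the generated code. The pairing of reference and generated labels and the name reuse_metrics are assumptions made for illustration; the exact aggregation is not spelled out in this section.

```python
def reuse_metrics(reference, predicted):
    """Precision, recall, F1 and accuracy for binary reuse labels.

    reference[i] is 1 if the ground-truth function reuses (e.g. a local
    function), predicted[i] is 1 if the generated function does.
    Hypothetical helper, assuming one label pair per benchmark function.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

The same function can be applied separately to the local, global, and third-party-library labels to obtain one set of scores per reuse type.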
Local information aids LLMs in code generation by offering awareness of local contextual details. We extract and organize four types of Local-Aware Knowledge from the local module: local functions, class instance attributes, local module FQNs, and module variables. These knowledge types offer diverse perspectives on the context of the local module. Our goal is to assess whether local-aware knowledge genuinely enhances LLMs’ ability to reuse local functions in code generation. Additionally, we aim to identify the optimal configuration, determining whether all four knowledge categories should be available to the model’s awareness or only specific ones.
We run these five different configurations on the benchmark to obtain code generation results.
When given knowledge of Local Functions, the model achieves a precision of 0.667, recall of 0.528, F1 score of 0.589, and accuracy of 0.796. This is because Local Function knowledge allows the model to directly invoke these functions. Upon additionally providing knowledge of Class Instance Attributes, the model’s precision, recall, and F1 score improve by 7.94%, 19.70%, and 14.26%, respectively, compared to providing only knowledge of Local Functions. This improvement is attributed to the provision of Class Instance Attributes, which enables the model to better understand how local functions interact with class variables in the project, leading to more accurate application of local functions.
However, when we introduce the Module Variables, the model’s performance drastically declines, particularly with an 11.87% drop in precision. This may be due to the increased information load resulting from knowledge of Module Variables, which impacts performance. Nevertheless, when we subsequently add knowledge of Local Module FQN, the model’s performance is restored and approaches that of the configuration with Local Functions and Class Instance Attributes. This suggests the existence of subtle interactions within the knowledge configuration. When the model receives only one or two key knowledge types, its performance may be superior to when it receives more information. However, as the volume of knowledge increases to a certain extent, the model’s capabilities strengthen, resulting in improved performance.
We ultimately choose the Local-Aware Knowledge setup that includes knowledge of Local Functions and Class Instance Attributes. This configuration serves as the default for the upcoming experiments.
To further evaluate the effectiveness of our What-if Code Generator Unit, we conduct an experiment using the benchmark, focusing on test cases that call global functions (47 cases in total). We compare two retrieval methods:
To further assess the impact of the number of global functions presented to the model, we compute the average number of retrieved functions, denoted as Avg. Retrieved Functions. We find that providing the model with around 8 functions yields the best performance in terms of code reuse, striking a subtle balance. Thus, we select k=5 as the optimal configuration for subsequent experiments.
To further evaluate the effectiveness of our What-if Code Generator Unit, we conduct an experiment using the benchmark. The results show that the FunDes method retrieved 41.5% of global functions, while the FunDes+FunCode method retrieved 60.6%, an increase of almost 20 percentage points. This demonstrates that our What-if Code Generator Unit enhances the Global Function Retrieval Unit by providing intermediate code for code similarity metrics, thus retrieving more relevant global functions.
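The difference between retrieving by description alone and retrieving by description plus what-if code can be sketched as below. This is a toy illustration under stated assumptions: embed is a bag-of-tokens stand-in for a real embedding model, the names build_index and retrieve are hypothetical, and merging the two ranked lists by union is an assumption rather than the documented combination rule.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-tokens count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def build_index(functions):
    """Store one summary vector and one code vector per global function."""
    return [{"name": f["name"],
             "summary_vec": embed(f["summary"]),
             "code_vec": embed(f["code"])} for f in functions]


def top_k(query_vec, index, key, k):
    """Names of the k entries whose vector under `key` is closest to the query."""
    ranked = sorted(index, key=lambda f: cosine(query_vec, f[key]), reverse=True)
    return [f["name"] for f in ranked[:k]]


def retrieve(description, what_if_code, index, k=5):
    """Description-to-summary retrieval, extended with code-to-code retrieval.

    Assumption: the two top-k lists are merged by order-preserving union.
    """
    by_summary = top_k(embed(description), index, "summary_vec", k)
    if what_if_code is None:  # description-only retrieval
        return by_summary
    by_code = top_k(embed(what_if_code), index, "code_vec", k)
    return list(dict.fromkeys(by_summary + by_code))
```

Because the merged list is a union of two top-k lists, it can contain more than k functions, which is one plausible reason the average number of retrieved functions could exceed k.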
To validate this point, we compare code generation with and without third-party-library-aware information. In the configuration without it, only local-aware and global-aware information is fed into the model. Then, we run these two different configurations on the benchmark to obtain code generation results. Notably, we introduce the Library Coverage, the percentage of third-party libraries used in the generated code that are in the third-party-library base, to further quantify the extent of reuse.
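A Library Coverage computation along these lines can be sketched as follows. The sketch assumes that the third-party-library base is given as a set of importable top-level module names, that standard-library and repository-local modules are excluded before counting, and that coverage is aggregated over all generated snippets; the function names are hypothetical.

```python
import ast
import sys


def imported_top_level_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 keeps absolute imports only; relative imports are local
            names.add(node.module.split(".")[0])
    return names


def library_coverage(generated_sources, installed_libraries, local_packages=()):
    """Fraction of third-party imports in generated code found in the library base.

    Returns None when the generated code imports no third-party library.
    """
    # sys.stdlib_module_names exists from Python 3.10; on older versions this
    # falls back to an empty set, so stdlib imports would count as third-party.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    used = set()
    for src in generated_sources:
        used |= imported_top_level_modules(src)
    third_party = {m for m in used
                   if m not in stdlib and m not in local_packages}
    if not third_party:
        return None
    return len(third_party & set(installed_libraries)) / len(third_party)
```

A library that is imported but missing from the base lowers the score, which corresponds to generated code that would fail at import time in the repository's environment.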
Finding 3: Third-Party-Library Aware information significantly improves the model’s ability to utilize available third-party libraries, reducing the risk of potential compatibility issues.
V-D RQ4: What is the overall performance of our $A^{3}$-CodGen framework?
V-D1 Motivation
In previous sections, we have individually verified the effectiveness of local-aware, global-aware, and third-party-library-aware information. We also explored the optimal configurations. In this section, we aim to validate the effectiveness of our $A^{3}$-CodGen framework as a whole from various perspectives. Therefore, we conduct an ablation study to evaluate $A^{3}$-CodGen in terms of local reuse, global reuse, and third-party reuse. Additionally, as more powerful models are released, we aim to compare the performance of $A^{3}$-CodGen across different foundation models.
In terms of correctness, we observe that Local Reuse Correctness is optimal for $A^{1}$ Generation (Local-Aware), Global Reuse Correctness is best for $A^{2}$ Generation (Local-Aware and Global-Aware), and Third-Party Library Reuse Correctness is highest for $A^{3}$-CodGen. This suggests that increasing input knowledge may influence the model’s performance in specific reuse scenarios. Nevertheless, $A^{3}$-CodGen exhibits only a slight difference from the baseline in local reuse and global reuse correctness (0.003-0.015). Despite this minor difference in correctness metrics, and given its superior performance in other aspects, $A^{3}$-CodGen demonstrates robust overall performance in local reuse, global reuse, and third-party library reuse.
Looking at the Library Coverage, $A^{3}$-CodGen achieves near-perfect performance with a score of 0.940, signifying its ability to effectively reuse pre-installed libraries and avoid compatibility issues such as “ModuleNotFoundError: No module named xxx.” Furthermore, the framework consistently generates code with an optimal number of lines (LOC), indicating its capability to produce concise and efficient code.
However, for global reuse awareness and correctness, Copilot’s performance is suboptimal, with F1 scores of 0.447 and 0.136, respectively. Copilot’s workflow, although capable of utilizing neighboring tabs’ context, struggles with global function calls because the necessary global functions might exist in unopened tabs. Opening all repository tabs is impractical, and Copilot does not structure information for completion, leading to chaotic context. In contrast, $A^{3}$-CodGen retrieves relevant global functions and structures them, facilitating easier and more accurate reuse. For third-party library reuse, $A^{3}$-CodGen outperforms Copilot in awareness, correctness, and coverage. $A^{3}$-CodGen helps the model to be aware of all pre-installed libraries, whereas GitHub Copilot randomly uses third-party libraries. While Copilot excels in local information reuse, it falls short in reusing global functions and third-party libraries.
Our method has the potential to enhance Copilot by using knowledge retrieval to fetch relevant global functions and library information from the repository, enabling repository-aware code generation. For instance, Copilot’s limitations can be mitigated by incorporating interpreter information from the compiler to help Copilot become aware of third-party libraries. Additionally, using global function retrieval methods can assist Copilot in reusing functions from other code files, not just from neighboring tabs. Structuring the information before passing it to Copilot, rather than providing all information together, may also improve performance.
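One way interpreter information about pre-installed libraries could be gathered and structured is sketched below, using Python's standard importlib.metadata module. This is an assumption about how such information might be obtained, not a description of the paper's pipeline or of any Copilot interface; the function names and the prompt layout are hypothetical.

```python
from importlib import metadata


def installed_distributions():
    """Map each distribution installed in the current interpreter to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing or broken metadata
            found[name] = dist.version
    # Sort case-insensitively so the listing is stable across runs
    return dict(sorted(found.items(), key=lambda kv: kv[0].lower()))


def format_library_prompt(libs, limit=50):
    """Render a structured, size-bounded library listing (hypothetical layout)."""
    lines = [f"- {name}=={version}"
             for name, version in list(libs.items())[:limit]]
    return "Pre-installed third-party libraries:\n" + "\n".join(lines)
```

Note that distribution names can differ from importable module names, so a real pipeline would likely need an extra mapping step before the listing is useful for import statements.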
In the future, we will maintain and update this dataset, providing the community with a valuable resource for evaluating repository-aware code generation.
The threats to the validity of this paper lie in the following aspects.
Code generation is about creating source code from provided natural language descriptions or requirements, and it has been a longstanding focal point in SE research.