In this work, we introduce Active Representation Learning, a class of problems that intertwines exploration and representation learning within partially observable environments. We extend ideas from Active Simultaneous Localization and Mapping (active SLAM), and translate them to scientific discovery problems, exemplified by adaptive microscopy. We explore the need for a framework that derives exploration skills from representations, aiming to enhance the efficiency and effectiveness of data collection and model building in the natural sciences.
Representation learning is frequently used to uncover the latent structure in a dataset and lies at the core of modern data science and machine learning. Deep learning-driven representation learning approaches have been widely applied in cases where data is abundant but challenging to interpret, for example in scientific data analysis. Frequently, such datasets cannot be labeled by human experts, because the nature of the scientific questions does not allow for clear labeling. For example, recent methods in the analysis of functional neuroimaging data rely on self-supervised learning techniques to uncover the geometric properties of the dataset, often using methods like auto-encoding [Pandarinath2018-dp], or contrastive learning [Schneider2023-rn]. While these approaches can lead to remarkably insightful representations of complex data structures, they are mostly applied to scenarios where large, well-structured datasets are readily available for training general-purpose deep learning models.
However, in some data science problems, data has to be acquired by exploring a complex physical system whilst interacting with it through measurements. Consider, for example, active sensing-type problems, where data is collected through probing the input-output behavior of a black box process, including medical diagnosis, exploratory data analysis, active simultaneous localization and mapping (active SLAM), and adaptive light-sheet microscopy. Especially in light-sheet microscopy, we may find ourselves with the potential to collect, through a costly process, vast amounts of data of which only a small subset is truly informative about the questions and tasks at hand.
Knowing how and where to collect new data is challenging, but it can make representation learning more effective. In turn, having a representation of the underlying dynamical system and the measurement mechanism can potentially help guide data acquisition. However, such an active approach to representation learning requires integrating aspects of sequential decision making with representation learning concepts. This raises the question: How can we integrate representation learning and sequential decision making?
Active Representation Learning is motivated by the concept of the active microscope. In developmental biology, the goal is often to uncover the 3D structure and dynamical processes of a biological specimen’s developmental trajectory. To this end, one tries to acquire a stack of images in the fastest and least invasive way possible. It might be tempting to blindly acquire as many planes as possible; however, imaging a biological process (e.g. a developing organism) is inherently subject to a trade-off between spatial and temporal resolution, signal-to-noise ratio, and sample health. Additionally, absorption, scattering, and refraction lead to incomplete and noisy 2D image planes.
Modern light-sheet microscopes can quickly generate enormous amounts of data in the range of multiple terabytes; predicting which measurements are actually necessary would therefore considerably simplify the data processing pipeline, or even make it feasible in the first place. Because we are seeking a concise representation of the biological process (see e.g. [Kobayashi2022-jh]), the raw data is just a means to an end. An active microscope would be able to balance the imaging trade-offs, and learn a useful representation of the biological process while being as gentle with the specimen as possible [Scherf2015-ji]. This also involves learning a representation of the imaging process to inform the decision about how, where and when to look next for new data. This can be achieved by separating the dynamical factors that are directly controllable (where and how to look), and those that are not (the biological process itself), learning to control what can be controlled with minimal cost, and motivating the agent to curiously explore what cannot.
Due to the cumulative nature of the costs incurred, and the temporal dependencies between the continual formation of representations and the agent’s decisions about how to operate the microscope, this is a problem at the intersection of sequential decision making and continual representation learning.
During data acquisition, the potentially incomplete representation must be used as a basis for informing the next action to take, similar to how the incompleteness of the map in active SLAM is used to inform where to go next. This circular dependency of representation and decision making is the basis of ARL and its framing as a generalization of active SLAM. In general, ARL is difficult to frame as a bandit, since the agent-environment system has a complex state.
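The idea that an incomplete representation decides what to measure next can be illustrated with a deliberately simplified acquisition loop. The sketch below is not the ARL framework itself: it assumes a hypothetical set of imaging planes, each summarized by a single Gaussian variance, and a hypothetical measurement with known noise variance. The names `posterior_variance` and `acquire` are ours. It ignores the state of the specimen and the cumulative acquisition cost, which is precisely what makes the full problem harder than this bandit-like toy.

```python
def posterior_variance(prior_var, noise_var):
    """Variance of a scalar Gaussian estimate after one noisy measurement."""
    return prior_var * noise_var / (prior_var + noise_var)


def acquire(prior_vars, noise_var, budget):
    """Greedily measure the plane whose current estimate is most uncertain.

    prior_vars: initial variance per plane (the incomplete representation).
    noise_var:  assumed measurement noise variance (hypothetical value).
    budget:     number of measurements we are allowed to take.
    """
    variances = list(prior_vars)
    visited = []
    for _ in range(budget):
        # The representation (variances) informs the next action (plane k).
        k = max(range(len(variances)), key=variances.__getitem__)
        visited.append(k)
        # The new measurement in turn updates the representation.
        variances[k] = posterior_variance(variances[k], noise_var)
    return visited, variances


visited, variances = acquire([1.0, 4.0, 2.0], noise_var=1.0, budget=3)
# visited == [1, 2, 0]: the most uncertain plane is always imaged first
```

Even in this toy, the order of measurements depends on the representation built so far, and each measurement changes which action looks most valuable next.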
A setup like this is not only reminiscent of active SLAM, but also of reinforcement learning-based visual attention [minut2001]. We can easily think of the microscope setup as a robotic environment in the context of RL. The controllable parts of the microscope act like a body, whereas the imaged volume is the outside environment. Our agent has to learn how to influence its observations, and how to do so in order to perceive the most interesting aspects of the environment’s uncontrollable parts.
Active Representation Learning highlights the need for a framework capable of capturing model building through sequential, incomplete observations, and the need to learn the underlying structure of a latent object or phenomenon. Ideally, the intelligent agent can learn structural similarities across a larger class of similar problems and use this information the next time a similarly structured problem is encountered, e.g. when a similar specimen is examined under the microscope. We believe that the overarching themes of Active Representation Learning transcend applications in robotics and microscopy. A large interdisciplinary community would benefit from a formal treatment of intelligent model building systems. A common framework would allow scientists from different fields to come together and share their experiences with related problems. Ideally, the community would work towards and eventually achieve a virtual scientist framework that allows us to build agents that can curiously explore complex processes the way humans do, potentially discovering and exploiting their symmetries.
Active Representation Learning (ARL) presents a step forward at the intersection of decision making and representation learning within partially observable environments. By leveraging the concepts and techniques from active SLAM, we have illustrated how these ideas can be extended to a larger class of problems with a similar structure, exemplified by what we call active microscopy. The proposed ARL framework emphasizes the importance of disentangling controllable and uncontrollable factors in the environment, thus enabling intelligent agents to make informed decisions about where and how to explore. We argue that this approach not only improves the quality of the data collected but also facilitates the creation of more robust and interpretable models of complex systems. Future research should focus on refining the theoretical foundations of ARL, addressing practical implementation issues, and exploring new applications across various domains. Ultimately, ARL holds the promise of transforming how we interact with and understand complex, dynamic systems, paving the way for breakthroughs in both artificial intelligence and scientific discovery.
The authors would like to thank Simon Hirländer for insightful comments on the practical issues in ARL, and the anonymous reviewer for valuable feedback on the connections of ARL with existing methods. N.M. and N.S. are supported by BMBF (Federal Ministry of Education and Research) through ACONITE (01IS22065) and the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Leipzig. N.M. is also supported by the Max Planck IMPRS CoNI Doctoral Program. J.H. is supported by the Alexander von Humboldt Foundation (Alexander von Humboldt Professorship; J.H.) and the German Research Foundation (Germany’s Excellence Strategy EXC 2067/1-390729940; J.H.). G.M. is supported by the MWK (Niedersächsisches Ministerium für Wissenschaft und Kultur, 6707040) and is a member of the Hertha Sponer College of the Cluster of Excellence Multiscale Bioimaging (MBExC; EXC 2067/1-390729940), University of Goettingen, Germany. Figure 1 was created using BioRender.com.
A Partially Observable Markov Decision Process (POMDP; [Astrom1965-ym]) is defined by the tuple $(\mathcal{S},\mathcal{A},\mathcal{O},T,O,R)$, where $\mathcal{S}$, $\mathcal{A}$, and $\mathcal{O}$ are the state, action, and observation spaces, and $T:\mathcal{S}\times\mathcal{A}\to\Delta(\mathcal{S})$, $O:\mathcal{S}\times\mathcal{A}\to\Delta(\mathcal{O})$, and $R:\mathcal{S}\times\mathcal{A}\to\mathbb{R}$ are the transition, observation, and reward functions, respectively; see e.g. [littman_thesis].
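For finite spaces, the tuple can be written down directly as three tables. The following is a minimal sketch under our own assumptions: the class name `TabularPOMDP`, the index-based encoding of states, actions, and observations, and the toy numbers are all hypothetical, and we assume the observation is emitted from the state the agent arrives in (the definition above leaves this convention open).

```python
import random
from dataclasses import dataclass


@dataclass
class TabularPOMDP:
    """Finite POMDP; states, actions and observations are integer indices."""
    T: list  # T[s][a][s2] = probability of moving from s to s2 under action a
    O: list  # O[s2][a][o] = probability of observing o after reaching s2 via a
    R: list  # R[s][a]     = reward for taking action a in state s

    def step(self, s, a, rng):
        """Sample (next state, observation, reward) for one interaction."""
        s2 = rng.choices(range(len(self.T[s][a])), weights=self.T[s][a])[0]
        o = rng.choices(range(len(self.O[s2][a])), weights=self.O[s2][a])[0]
        return s2, o, self.R[s][a]


# Hypothetical toy problem with two states, two actions, two observations.
toy = TabularPOMDP(
    T=[[[0.9, 0.1], [0.5, 0.5]], [[0.1, 0.9], [0.5, 0.5]]],
    O=[[[0.8, 0.2], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]],
    R=[[0.0, 1.0], [0.0, -1.0]],
)
rng = random.Random(0)
s2, o, r = toy.step(0, 1, rng)  # the agent only ever sees o and r, never s2
```

The agent never has access to `s2`; it has to act on the history of observations and actions, which is why policies are defined over histories.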
The goal is to design a policy $\pi:\mathcal{H}\to\mathcal{A}$, where $\mathcal{H}$ is the set of histories of observations and actions, i.e. $\pi(a_{t}\mid o_{t},a_{t-1},o_{t-1},\ldots)$, to maximize the expected reward:
however, this is usually intractable to compute, and e.g. variational approximations are used in practice.
Since the Bayesian belief state serves as a sufficient statistic of the history, optimal policies for the original POMDP can be determined by solving an equivalent continuous-space MDP, known as a belief MDP $(\mathcal{B},\mathcal{A},T_{b},r_{b})$, where the new transition and reward functions, $T_{b}$ and $r_{b}$, are defined over $\mathcal{B}\times\mathcal{A}$. Using belief MDPs, several theoretical results about MDPs can be extended to POMDPs, such as the existence of optimal deterministic policies.
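In the finite case, the belief state can be computed exactly with the standard Bayes filter, $b'(s') \propto O(o \mid s', a)\sum_{s} T(s' \mid s, a)\, b(s)$. The sketch below illustrates this update under our own assumptions: tables are indexed as `T[s][a][s2]` and `O[s2][a][o]`, the observation depends on the arrival state, and the function name `belief_update` and the toy numbers are hypothetical.

```python
def belief_update(b, a, o, T, O):
    """Exact Bayes filter for a finite POMDP.

    b: current belief, a list of probabilities over states.
    a: index of the action taken.
    o: index of the observation received.
    T: T[s][a][s2] transition probabilities.
    O: O[s2][a][o] observation probabilities (arrival-state convention).
    """
    n = len(b)
    # Prediction: push the belief through the transition model.
    predicted = [sum(b[s] * T[s][a][s2] for s in range(n)) for s2 in range(n)]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = [O[s2][a][o] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / z for u in unnormalized]


# Hypothetical example: a static two-state system observed with 85% accuracy.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]
O = [[[0.85, 0.15]], [[0.15, 0.85]]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, a=0, o=0, T=T, O=O)  # approximately [0.85, 0.15]
```

The sums over states are what become expensive or intractable in large or continuous state spaces, where approximate filters are used instead.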
In this continuous MDP, the goal is to maximize the cumulative reward by identifying a policy that uses the current belief state as input. Formally, we seek a policy $\pi^{*}$ that satisfies
where $r_{b}(b_{t},a_{t})=\int_{s\in\mathcal{S}}b(s_{t})\,r(s_{t},a_{t})$.
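For a finite state space, the integral defining $r_{b}$ reduces to a weighted sum of state rewards. The following sketch shows this discrete analogue; the function name `belief_reward`, the table layout `R[s][a]`, and the numbers are hypothetical and chosen only for illustration.

```python
def belief_reward(b, a, R):
    """Expected immediate reward of action a under belief b.

    Discrete analogue of r_b(b, a): the sum over states of b(s) * r(s, a).
    """
    return sum(b[s] * R[s][a] for s in range(len(b)))


# Hypothetical rewards: action 0 is a cheap probe, action 1 is a risky commit.
R = [[-1.0, 10.0], [-1.0, -100.0]]
b = [0.85, 0.15]

probe = belief_reward(b, 0, R)   # about -1.0, independent of the belief
commit = belief_reward(b, 1, R)  # about -6.5: still too uncertain to commit
```

Because the reward is now a function of the belief, the same action can be good or bad depending on how much the agent already knows.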
While POMDPs are very general, they can be difficult to map onto active sensing-type problems. Active SLAM introduces further specifications to the POMDP formalism in the form of belief-dependent rewards [Araya-Lopez_undated-qy], and a structured state-space. The state at time $t$ is usually assumed to be a combination of the robot’s pose $x_{t}\in\mathcal{X}$ and a map $m_{t}\in\mathcal{M}$ that the robot builds up over time while interacting with the POMDP, resulting in a factored state space of the form $\mathcal{S}=\mathcal{X}\times\mathcal{M}$.
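Both ingredients, the factored state and a reward that depends on the belief rather than on the state, can be sketched in a few lines. The code below is an illustration under our own assumptions: the map is a hypothetical occupancy grid with independent cells, the belief-dependent reward is chosen to be the reduction in Shannon entropy of the map belief (one possible choice, not one prescribed by the formalism), and the names `SlamState`, `map_entropy`, and `information_reward` are ours.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SlamState:
    """Factored state s = (x, m) in X x M."""
    pose: tuple  # x_t, e.g. a (row, col) position on a grid
    grid: tuple  # m_t, one occupancy flag (0 or 1) per map cell


def map_entropy(occupancy_probs):
    """Shannon entropy in bits of a map belief with independent cells."""
    h = 0.0
    for p in occupancy_probs:
        for q in (p, 1.0 - p):
            if q > 0.0:  # treat 0 * log(0) as 0
                h -= q * math.log2(q)
    return h


def information_reward(probs_before, probs_after):
    """Belief-dependent reward: map uncertainty removed by the last action."""
    return map_entropy(probs_before) - map_entropy(probs_after)


state = SlamState(pose=(0, 0), grid=(0, 1, 0))  # true state, hidden from the agent
before = [0.5, 0.5, 0.5]                        # nothing is known about the map
after = [0.5, 1.0, 0.0]                         # two cells resolved by a measurement
gain = information_reward(before, after)        # 2.0 bits of uncertainty removed
```

A reward of this kind cannot be written as a function of the state and action alone, which is why belief-dependent rewards require an extension of the standard POMDP formalism.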