We plan to open-source TacSL's tactile simulation module and learning tools. Our aim is to encourage widespread research on tactile-based algorithm development for perception, control, and policy learning, as well as to further increase the adoption of touch sensing across the manipulation community.
To the best of our knowledge, TacSL is the first general-purpose simulation module that provides fast and efficient simulation of both tactile-image and tactile force-field sensing. It achieves the speed necessary for online learning of end-to-end tactile-image-based policies for prehensile contact-rich manipulation, which distinguishes it from previous work. TacSL also provides the algorithmic tools that enable others to perform effective policy learning and sim-to-real transfer.
TacSL simulates visuotactile sensors in two phases. First, it simulates the physical interactions between the tactile sensor and indenting objects in a fast and stable manner. Based on the simulation, TacSL then extracts and computes two tactile measurements: tactile RGB images and tactile force fields. Notably, both phases leverage GPU parallelization to achieve substantial performance improvements compared to existing state-of-the-art approaches. We now describe these simulation components of TacSL in detail.
This section describes how dynamic contact effects are handled in TacSL. We 1) outline our general dynamics solver procedure, 2) explain the governing analytical equations of our soft contact constraints, and 3) address how these equations are solved in a numerically stable manner. We implement the necessary contact solver code inside the physics engine and configure the collision geometry and contact parameters from the TacSL side.
Dynamics Solver: Our physics solver operates on a discrete-time formulation with a discretization interval $\Delta t$. Given the position $p$ and velocity $v$ of a body at the current time step, the solver computes the position $p^{+}$ and velocity $v^{+}$ at the next time step using a semi-implicit Euler integration scheme:
\begin{align}
v^{+} &\leftarrow v + \Delta v, \tag{1} \\
p^{+} &\leftarrow p + \Delta t \, v^{+}, \tag{2}
\end{align}
where the velocity change $\Delta v$ is the combined effect of external and constraint forces on that body.
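The update in Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It is not the solver inside the physics engine: the function names are hypothetical, and the velocity change here comes from gravity alone, whereas in TacSL it also includes constraint (contact) forces.

```python
import numpy as np

# Assumed constant for illustration only: gravity as the sole external force.
GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2


def semi_implicit_euler_step(p, v, delta_v, dt):
    """Advance one time step following Eqs. (1)-(2).

    The velocity is updated first, and the *new* velocity is then used
    to update the position (this is what makes the scheme semi-implicit).
    """
    v_next = v + delta_v      # Eq. (1)
    p_next = p + dt * v_next  # Eq. (2)
    return p_next, v_next


def gravity_delta_v(dt):
    """Hypothetical velocity change from gravity only (no constraint forces)."""
    return GRAVITY * dt


if __name__ == "__main__":
    p = np.array([0.0, 0.0, 1.0])  # start 1 m above the origin
    v = np.zeros(3)
    dt = 1.0 / 60.0
    for _ in range(3):
        p, v = semi_implicit_euler_step(p, v, gravity_delta_v(dt), dt)
    print("position:", p, "velocity:", v)
```

Using $v^{+}$ rather than $v$ in the position update is the only difference from explicit Euler.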
We measure the simulation speed of our simulator for both tactile modalities and compare it against the state-of-the-art tactile simulator for each modality.
We use the following tasks to evaluate our tactile simulator and policy-learning toolkit:
We present policy results on one of the tasks discussed above (Peg Insertion) and compare the learning performance across different sensing modalities. In each case, we show the learning performance for behavior cloning, DAgger, or asymmetric actor-critic RL (AAC), for a combination of different policy inputs.
The policy input options are as follows:
The action space is a 6D end-effector pose target relative to the current pose. This is used to compute pose targets for a lower-level task-space impedance controller.
In addition to developing, training, and testing our algorithms in simulation, we also tested the trained policies on the real robot via zero-shot sim-to-real transfer.
We also demonstrated that the insertion policies transfer to the real-world robot. Here, we evaluated the zero-shot task performance (success rate) using a Color-Aug insertion policy. We evaluated the policy at three different socket locations, randomizing the initial end-effector pose, peg-in-gripper position, and peg-in-gripper rotation ($\{-\pi/12, 0, \pi/12\}$), for a total of 81 trials. The policy succeeded 67 times, achieving an 82.7% success rate without any additional real-world fine-tuning.
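As a quick check of the arithmetic, the short Python sketch below recomputes the success rate from the reported counts (67 successes in 81 trials). It also computes a 95% Wilson score interval to indicate the statistical uncertainty at this sample size. The interval is our addition for illustration and is not reported in the original evaluation; the function names are hypothetical.

```python
import math


def success_rate(successes, trials):
    """Fraction of successful trials."""
    return successes / trials


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    ) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    rate = success_rate(67, 81)
    low, high = wilson_interval(67, 81)
    print(f"success rate: {100 * rate:.1f}%")
    print(f"95% Wilson interval: [{100 * low:.1f}%, {100 * high:.1f}%]")
```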
We have presented TacSL, an accelerated tactile simulator that provides both geometric and force-field information. TacSL includes a suite of contact-rich tasks and a toolkit of online learning algorithms for tactile policy learning, including a novel RL algorithm (AACD) that enables efficient policy learning in high-dimensional domains. Using TacSL, we analysed the performance of tactile and other sensing modalities in solving the peg-placement and peg-insertion tasks in simulation. Furthermore, TacSL prescribes tactile policy-training strategies that transfer zero-shot to the real world. Once released, we believe our simulation and policy-learning framework will be a highly useful testbed for leveraging tactile sensing in a wide range of contact-rich robotic tasks.
We would like to thank our colleagues at NVIDIA for their invaluable assistance and feedback throughout this work. Special acknowledgment to Michael Noseworthy, Bingjie Tang, Bowen Wen, Karl Van Wyk, Ankur Handa, and Fabio Ramos for engaging in deep conversation and providing insightful feedback. We deeply thank Philipp Reist for his insights on the physics solver and detailed feedback on the paper draft. We appreciate the assistance of Tobias Widmer and Milad Rakhsha in the SDF implementation. Our thanks also go to Viktor Makoviychuk and Kelly Guo for their responsiveness to our inquiries about Isaac Gym. We thank Ajay Mandlekar for assistance in setting up behavior cloning during early prototyping, and Alperen Degirmenci for help in creating visualizations and figures for the paper. Valuable feedback on the paper draft was provided by Yu-Wei Chao. We express our thanks to Kimo Johnson for important GelSight hardware support.
To calibrate the compliant contact parameter, we placed known standardized calibration weights on the tactile sensor in both real and simulated environments. The compliant stiffness parameter ($\kappa$) is chosen such that the surface area of the tactile impression in the simulation and real-world environments roughly match for different known weights.
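The calibration described above can be sketched as a one-dimensional search over candidate stiffness values. The Python example below is a toy illustration under stated assumptions, not the procedure used in TacSL. `toy_contact_area` is a hypothetical stand-in for querying the simulator: it assumes a linear spring and a spherical indenter, so that area is approximately $2\pi R d$ with depth $d = mg/\kappa$. The "measured" areas in the usage section are synthetic values generated from the same toy model, not real sensor data.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def toy_contact_area(kappa, mass, radius=0.01):
    """Hypothetical stand-in for a simulator query (NOT the TacSL contact model).

    Assumes a linear spring (depth = m*g / kappa) and a spherical indenter,
    for which the contact area is roughly 2*pi*R*depth at small depths.
    """
    depth = mass * G / kappa
    return 2.0 * np.pi * radius * depth


def calibrate_stiffness(kappas, masses, target_areas, area_fn=toy_contact_area):
    """Return the candidate kappa whose predicted contact areas best match
    the target areas, using summed squared relative error over all weights."""
    target_areas = np.asarray(target_areas, dtype=float)
    best_kappa, best_err = None, np.inf
    for kappa in kappas:
        predicted = np.array([area_fn(kappa, m) for m in masses])
        err = np.sum(((predicted - target_areas) / target_areas) ** 2)
        if err < best_err:
            best_kappa, best_err = kappa, err
    return best_kappa


if __name__ == "__main__":
    masses = [0.05, 0.10, 0.20]  # kg, illustrative calibration weights
    # Synthetic "measurements" generated from the toy model at kappa = 1000 N/m.
    measured = [toy_contact_area(1000.0, m) for m in masses]
    candidates = np.linspace(500.0, 1500.0, 11)
    print("selected kappa:", calibrate_stiffness(candidates, masses, measured))
```

In practice, `area_fn` would be replaced by a routine that runs the simulator with a given $\kappa$ and measures the area of the resulting tactile impression.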
where
Robot policy action and control: The policy outputs a 6D pose target for the end-effector, with a maximum position displacement of 0.01 m and a maximum orientation displacement of 0.05 rad in each dimension. The pose target is sent to a task-space impedance controller at 60 Hz.
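The bounded relative action described above can be sketched in Python as follows. This is a minimal illustration, not the TacSL controller interface, and all function names are hypothetical. Two details are assumptions, since the text does not specify them: the three orientation components are treated as a rotation vector (axis-angle), and the rotation delta is applied in the world frame (left-multiplied).

```python
import numpy as np

MAX_POS_DELTA = 0.01  # m, per-dimension position limit from the text
MAX_ROT_DELTA = 0.05  # rad, per-dimension orientation limit from the text
CONTROL_RATE_HZ = 60  # rate at which pose targets are sent to the controller


def clip_action(action):
    """Clip a 6D action [dx, dy, dz, drx, dry, drz] to the per-dimension limits."""
    action = np.asarray(action, dtype=float)
    pos = np.clip(action[:3], -MAX_POS_DELTA, MAX_POS_DELTA)
    rot = np.clip(action[3:], -MAX_ROT_DELTA, MAX_ROT_DELTA)
    return np.concatenate([pos, rot])


def rotvec_to_matrix(rotvec):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rotvec / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def relative_pose_target(current_pos, current_rot, action):
    """Turn a relative 6D action into an absolute pose target.

    Assumption: the rotation delta is expressed in the world frame, so it
    left-multiplies the current rotation matrix.
    """
    delta = clip_action(action)
    target_pos = np.asarray(current_pos, dtype=float) + delta[:3]
    target_rot = rotvec_to_matrix(delta[3:]) @ np.asarray(current_rot, dtype=float)
    return target_pos, target_rot


if __name__ == "__main__":
    pos, rot = relative_pose_target(
        current_pos=[0.5, 0.0, 0.3],
        current_rot=np.eye(3),
        action=[0.5, -0.002, 0.0, 0.0, 0.0, 1.0],  # out-of-range entries get clipped
    )
    print("target position:", pos)
    print("target rotation:\n", rot)
```

The resulting target would then be passed to the task-space impedance controller once per control step, that is, every 1/60 s.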
We present additional visualizations (Figure LABEL:fig) of both tactile images and tactile force fields for the GelSight R1.5 sensor interacting with a complex bolt mesh, which comprises 30k faces and 26k vertices, at various poses.