Please accept my apology for bidding even though you specifically asked people without Ceres experience not to bid, but I couldn't resist, as the problem sounds very interesting. If you are already finding candidates who meet your requirements, feel free to ignore the rest of this. I do have a lot of experience with C++ and computer vision, and I am familiar with the concepts of multi-view geometry. I have done camera pose estimation, but with OpenCV (its API is not as versatile or as well suited to more than two sensors as what I have seen of Ceres).

As for the approach: your mention of bundle adjustment gives me the impression that you want to treat windows of time as sets of points for each sensor. If you take that approach, the weight goes into the error term, so the reprojection error is weighted. Another possibility would be to use bundle adjustment only at first, in a calibration phase, until some criterion is met. From then on, we should have the intrinsic parameters of each sensor as well as 3 homographies, so we can use them to discard outliers. After that, the position can be estimated as a weighted average. I would also suggest that the weight take into account how close the point is to the boundary of each sensor's FOV.
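A minimal C++ sketch of that fusion step (discard outliers, then take a weighted average whose weight includes proximity to the FOV boundary) might look like this. It assumes each sensor's estimate has already been mapped into a common frame by the calibration phase. The names `SensorEstimate`, `boundaryWeight` and `fusePosition`, the linear border ramp with its `margin` parameter, and the median-distance outlier test with its `outlierDist` threshold are all hypothetical choices for illustration, not a settled design.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-sensor observation. Assumes calibration has already mapped
// each sensor's estimate into one common reference frame.
struct SensorEstimate {
    std::array<double, 3> position;  // position estimate in the common frame
    double u, v;                     // pixel where the target was detected
    double width, height;            // image size in pixels
    double confidence;               // base weight, e.g. a detection score
};

// Assumed linear ramp: 0 at the image border, 1 once the detection is at
// least `margin` pixels away from every border.
double boundaryWeight(const SensorEstimate& s, double margin) {
    assert(margin > 0.0);
    const double d = std::min({s.u, s.width - s.u, s.v, s.height - s.v});
    if (d <= 0.0) return 0.0;
    return std::min(d / margin, 1.0);
}

// Euclidean distance between two 3D points.
double distance3(const std::array<double, 3>& a, const std::array<double, 3>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < 3; ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Component-wise median: a robust reference point for outlier rejection.
// Requires a non-empty input.
std::array<double, 3> medianPosition(const std::vector<SensorEstimate>& est) {
    assert(!est.empty());
    std::array<double, 3> m{};
    for (std::size_t i = 0; i < 3; ++i) {
        std::vector<double> vals;
        for (const auto& e : est) vals.push_back(e.position[i]);
        std::sort(vals.begin(), vals.end());
        const std::size_t n = vals.size();
        m[i] = (n % 2 == 1) ? vals[n / 2] : 0.5 * (vals[n / 2 - 1] + vals[n / 2]);
    }
    return m;
}

// Drops estimates farther than `outlierDist` from the median, then averages
// the rest weighted by confidence * boundaryWeight. Returns false when no
// estimate carries positive weight.
bool fusePosition(const std::vector<SensorEstimate>& est, double margin,
                  double outlierDist, std::array<double, 3>& out) {
    if (est.empty()) return false;
    const std::array<double, 3> med = medianPosition(est);
    std::array<double, 3> acc{0.0, 0.0, 0.0};
    double wsum = 0.0;
    for (const auto& e : est) {
        if (distance3(e.position, med) > outlierDist) continue;
        const double w = e.confidence * boundaryWeight(e, margin);
        for (std::size_t i = 0; i < 3; ++i) acc[i] += w * e.position[i];
        wsum += w;
    }
    if (wsum <= 0.0) return false;
    for (std::size_t i = 0; i < 3; ++i) out[i] = acc[i] / wsum;
    return true;
}
```

For example, with `margin = 100` a detection 20 px from the left edge of a 640x480 image gets a boundary weight of 0.2, while a centred one gets 1. Note that the component-wise median only rejects outliers reliably while fewer than half of the sensors disagree; with few sensors, a reprojection-based test using the calibrated parameters may be a better fit.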
If you are not finding anyone and you're willing to share sample data, I'd be willing to build a proof of concept (without all the detail) to help you decide whether I would be a good fit for your problem.