Hello @t888eddy,
I'll do my best to answer your question, but it's tough to answer with 100% accuracy. Generally speaking, for high end rendering you only need what the camera sees (or what shows up in reflections), so it's hard to say without knowing whether you plan on importing the city into a game engine or rendering it with a high end renderer like Mental Ray or V-Ray in Max. It's also a bit hard to imagine a fully populated city with interiors and exteriors loaded at the same time, at least from a performance/RAM perspective.
If I were a TD on the project for high end rendering, I'd ask for the proposed camera shots so I'd know exactly how small the details need to be, and then I'd likely keep two versions of the file: one for rendering inside, and one for rendering outside. For games, I'd lean on texture detail as much as possible so the city doesn't bog down the engine too much, and even then loading areas might be a necessity with lots of interior detail.
Having said all that, I would shoot for meters and see how it goes. Just know that a pencil shaving on the desk inside the house may not cast proper shadows, or might show some rounding errors/distortions, because it is so tiny relative to the scene scale. Similarly, if you set it to, say, centimeters, some of the buildings might not appear perfect, especially as they get farther and farther from the origin.
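To put rough numbers on that trade-off, here's a minimal sketch in plain Python. It assumes positions are stored as 32-bit floats (I believe that's how Max handles it, but treat it as an assumption), and the distances are made-up examples. `float32_spacing` is just a helper name for this post, not anything from Max.

```python
import struct

def float32_spacing(value: float) -> float:
    """Gap between a positive `value` (rounded to float32) and the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - here

# Hypothetical distances from the origin, measured in system units
for distance in (1.0, 1_000.0, 100_000.0, 10_000_000.0):
    step = float32_spacing(distance)
    print(f"{distance:>12,.0f} units from origin -> smallest step ~ {step:.3g} units")
```

The smallest representable step grows with distance from the origin. So with centimeters, a building 1 km out (100,000 units) can only be positioned in steps of roughly 0.008 cm, and with meters, tiny objects far from the origin get coarse compared to their own size.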
The good news is that you can always import your city into a new scene with different system units if problems appear. You may have to rescale the world (referenced in my first post above), but in the end you're not stuck with your decision should things get hairy for some reason.
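If you do end up rescaling, the factor is just the ratio between the two units. Here's a tiny sketch of that arithmetic; the table and the `rescale_factor` name are hypothetical helpers for illustration, not part of Max.

```python
# Length of one system unit in meters (illustrative table, extend as needed)
METERS_PER_UNIT = {
    "millimeters": 0.001,
    "centimeters": 0.01,
    "meters": 1.0,
    "kilometers": 1000.0,
}

def rescale_factor(source_unit: str, target_unit: str) -> float:
    """Multiplier that keeps real-world size when moving between system units."""
    return METERS_PER_UNIT[source_unit] / METERS_PER_UNIT[target_unit]

# A city built in meters, brought into a centimeter scene
print(rescale_factor("meters", "centimeters"))
```

Going from meters to centimeters, everything needs to be about 100 times larger in unit terms to stay the same real-world size; going the other way, about 0.01 times.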
Best Regards,