A collection of tweets about MRP - Study notes and other memos

Love seeing more MRP in the wild, but count me skeptical after the fact. Would love to see more pollsters (besides @doug_rivers and @benlauderdale of course), do this in real time! https://t.co/y7QeRFPdzy
— Jonathan Robinson (@jon_m_rob) July 15, 2018

This is just a function of their low sample size resulting in their biased midwestern estimates being regularized down and their biased southern estimates being regularized up. If they applied this to a 200k dataset then it wouldn’t have worked (as someone who has tried).
— *1 (@davidshor) July 15, 2018

I RTed this but I have no earthly idea what you mean, I'll just say that up front.
— Xenocrypt (@xenocryptsite) July 15, 2018

What you mean is that they were right by accident because stuff cancelled out?
— Xenocrypt (@xenocryptsite) July 15, 2018

I *think* what he means is that the national data was more predictive than the state level data, so using an MR P-d by national data is more accurate post facto. But if they'd had more state data to put in their model, it would have been just as wrong as state poll-heavy models.
— 𝔸lfred 𝕁ones (@rAlfredJones) July 15, 2018

Because practically they're not really adding anything, just weighting their model more towards the (in 2016) more accurate national polls/census data and away from the (in 2016) less accurate state polls. Which, duh, because the national polls were more accurate.
— 𝔸lfred 𝕁ones (@rAlfredJones) July 15, 2018

But that regularization only happened because they didn't have much data. If they had 10X the amount of polling data, it wouldn't have done that and the result would have been wrong.
— *2 (@davidshor) July 15, 2018
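
As a rough illustration of the sample-size point in the tweets above, here is a minimal sketch, assuming a simple normal-normal partial-pooling model rather than whatever model the paper under discussion actually fit. The function name `shrunk_estimate`, the prior spread `tau`, and all of the numbers are hypothetical. It only shows the mechanism: with few respondents a state's raw estimate is pulled strongly toward the prior, and with 10X the data it is pulled much less.

```python
def shrunk_estimate(state_mean, n, prior_mean, tau=0.03, p=0.5):
    """Precision-weighted average of a raw state mean and a prior mean.

    Assumptions (illustrative only): the raw state mean has binomial
    sampling variance p * (1 - p) / n, and true state values vary around
    prior_mean with standard deviation tau.
    """
    sampling_var = p * (1 - p) / n
    # Weight on the state's own data; it approaches 1 as n grows.
    weight = tau ** 2 / (tau ** 2 + sampling_var)
    estimate = weight * state_mean + (1 - weight) * prior_mean
    return estimate, weight


# Hypothetical state whose raw poll number is 0.54 while the prior says 0.50.
small_est, small_w = shrunk_estimate(0.54, n=500, prior_mean=0.50)
large_est, large_w = shrunk_estimate(0.54, n=5000, prior_mean=0.50)

print(f"n=500:  weight on data {small_w:.2f}, estimate {small_est:.3f}")
print(f"n=5000: weight on data {large_w:.2f}, estimate {large_est:.3f}")
```

If the raw 0.54 is biased, the smaller sample is "corrected" more by the pull toward the prior, which is the effect described as disappearing once there is much more data.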

You're doing some high level speculation here !
— Jonathan Robinson (@jon_m_rob) July 15, 2018

but this has always been true. you can go back and find gelman having clinton winning MS in 96 or whatever for this reason
— Nate Cohn (@Nate_Cohn) July 16, 2018

I'm aware of that as a feature of these kinds of models, I'm just willing to at least speculate about the differences in the DGP's across different survey datasets.
— Jonathan Robinson (@jon_m_rob) July 16, 2018

Fun thought experiment/real thing that Gallup sort of did at AAPOR '18. Only build an MRP model off a sample/subset of your data so you are more conservative/leaning on the prior to hedge your data having all sorts of biases
— Jonathan Robinson (@jon_m_rob) July 16, 2018

I think in most cycles the optimal amount of data to throw away is going to be zero and that 2016 was a weird fluke.
— *3 (@davidshor) July 16, 2018

Well Gallup was analyzing their 2012 data so...
— Jonathan Robinson (@jon_m_rob) July 16, 2018

2012 seems like a perfect example of how focusing on correcting non-response bias and collecting as much data as you can is going to do better than messing around with MRP pic.twitter.com/FrLppsZFn1
— *4 (@davidshor) July 16, 2018

I mean you can do both
— Jonathan Robinson (@jon_m_rob) July 16, 2018

that MRP isn't a magic fix for poor sampling seems rather obvious to me?
— (((Will Cubbison))) וויל קביסון (@wccubbison) July 16, 2018

There's also just that MRP itself is like, a 2009-era methodology. The core insight of "data-pooling and regularization is good" was revolutionary at the time, but we have better tools and a very different data situation now that do that better.
— *5 (@davidshor) July 16, 2018

MRP strikes me as an already antiquated methodology applicable primarily to pollsters who still refuse to use voter files
— Patrick Ruffini (@PatrickRuffini) July 16, 2018

What is stopping you from using voter files on MRP?
— doug rivers (@doug_rivers) July 16, 2018

Or put differently, having a voter file and using MRP are solutions to different parts of the problem.
— Benjamin Lauderdale (@benlauderdale) July 16, 2018

Having a voter file enables me to generate model scores for every voter, and then take geographic aggregates from individual scores
— Patrick Ruffini (@PatrickRuffini) July 16, 2018

i think half of the people referring to 'mrp' are doing that, even if it's not formally post-stratification
— Nate Cohn (@Nate_Cohn) July 16, 2018

I think of MRP as a convoluted workaround where you don't have a census of all voters you can apply estimates to, if you're doing RDD for example
— Patrick Ruffini (@PatrickRuffini) July 16, 2018

sure, though you can do the same thing on ACS or CPS microdata without formally stratifying it
— Nate Cohn (@Nate_Cohn) July 16, 2018

bringing back in a paper author (@chadkdj) in case they want to respond to the comments in this thread; in particular, the method applied to other past elections
— Matt Grossmann (@MattGrossmann) July 16, 2018

It *is* post-stratification with cells of size one. The whole point is that the sample cells can be small or empty, but you can still use a model to make cell level estimates. https://t.co/zs7sflPYuT
— doug rivers (@doug_rivers) July 16, 2018
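
To make the "post-stratification with cells of size one" remark concrete, here is a minimal sketch under stated assumptions. The helper `poststratify`, the cell layout, and every number are hypothetical, and the model predictions are simply given as inputs instead of coming from a fitted multilevel regression. It shows that weighting cell-level predictions by census counts and averaging per-voter scores from a voter file are the same computation, with each voter treated as a cell whose count is 1.

```python
from collections import defaultdict


def poststratify(cells):
    """Population-weighted average of predictions within each group.

    cells: iterable of (group, population_count, predicted_share).
    The predictions are assumed to come from some model fit elsewhere.
    """
    weighted_sum = defaultdict(float)
    total_count = defaultdict(float)
    for group, count, predicted in cells:
        weighted_sum[group] += count * predicted
        total_count[group] += count
    return {g: weighted_sum[g] / total_count[g] for g in weighted_sum}


# Census-style cells: (state, people in the cell, modeled support in the cell).
census_cells = [
    ("OH", 300, 0.40),
    ("OH", 700, 0.60),
]

# Voter-file style: one row per voter, i.e. every cell has size one.
voter_file = [
    ("OH", 1, 0.40),
    ("OH", 1, 0.60),
    ("OH", 1, 0.62),
]

census_estimate = poststratify(census_cells)
voter_file_estimate = poststratify(voter_file)
print(census_estimate, voter_file_estimate)
```

Note that a cell needs no survey respondents at all to receive a prediction, as long as the model can produce one from the cell's characteristics.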

Agreed. In particular, one should be wary of sparse solutions, which don't make any sense for some representations. https://t.co/eJr5EcKHg0
— doug rivers (@doug_rivers) July 16, 2018

*1:David Shor

*2:David Shor

*3:David Shor

*4:David Shor

*5:David Shor