Building Your Own Dataset
Lately I've been building a dataset for a machine learning project and, since it's the first one I've made, there's been a learning curve.
So for anyone else who wants to build their own dataset, here are the tips I picked up along the way:
Begin with the End in Mind
Think about what your project's ultimate goal is and make a list of features that you think will help get you there.
If you know some of the data will be hard to find, add a wish list section where you track the information you'd love to have but that will take extra effort to get.
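If it helps to keep that list somewhere more structured than a notebook page, here is a minimal Python sketch of one way to track it. The `Feature` class, its fields, and the example feature names are all made up for illustration, so treat this as a starting point rather than the way I actually did it.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One candidate feature for the dataset (hypothetical structure)."""
    name: str
    source: str = ""         # where the data comes from, if known
    wish_list: bool = False  # True if it will take extra effort to get
    collected: bool = False  # True once the data is in hand


def split_features(features):
    """Return (core, wishes): the features still to collect, split by wish-list status."""
    todo = [f for f in features if not f.collected]
    core = [f for f in todo if not f.wish_list]
    wishes = [f for f in todo if f.wish_list]
    return core, wishes


# Illustrative entries only; the names are assumptions, not real columns.
features = [
    Feature("transaction_volume", source="internal", collected=True),
    Feature("cell_phone_usage", source="World Bank"),
    Feature("compliance_rating", wish_list=True),
]

core, wishes = split_features(features)
print("Still to collect:", [f.name for f in core])
print("Wish list:", [f.name for f in wishes])
```

Keeping the wish list in the same structure as the core list makes it easy to move a feature across once someone tells you where to find it.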
Cast a Wide Net
Keep an eye out for sources that could be useful, even if they aren't on your original list.
In my case, I ran across World Bank data about cell phone usage.
Even though it wasn't on my original list of features, I scooped it up because MoneyGram is focused on digital growth.
Reach Out to SMEs
There's a chance that you won't know where to find all the data that you need, and that's okay. If that happens, reach out to a team member who can give you coaching on where to go.
For instance, my wish list section was full of Compliance features for which I knew public data was available but didn't know where to find it.
So I reached out to a co-worker and she gave great directions on where to go. (In a couple of cases, she even knew the data sources down to their URLs.)
Not only did her coaching help me dive back into data gathering, but she also suggested additional features that proved valuable.
Keep Your Chin Up
Expect to run into challenges, like struggling to find usable data.
And when that happens, remember that: