Scenarios

Scenarios let you shape numeric distributions based on the values of other columns in your schema. For example, say you want to generate a file in which each row represents the sale of a car, including the model, region, sale price, and date of sale.

Here we use the normal distribution field type to generate reasonable prices. Let's look at some sample data:

date	model	region	price
2014-10-26	Explorer	SE	25341
2014-10-30	Mustang	NE	25051
2014-10-17	Focus	SE	26003
2014-10-18	Focus	MW	24396
2014-10-02	Mustang	MW	25670
2014-10-09	Explorer	NW	25137
2014-10-14	Explorer	SE	24027
2014-10-24	Focus	SW	26206
2014-10-10	Explorer	SW	22668
2014-10-18	Explorer	NE	23611
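Outside of the tool, a schema like this amounts to drawing every price from one normal distribution, no matter which model the row has. The following is a minimal Python sketch of that behaviour, not the tool's implementation. PRICE_MEAN and PRICE_SD are hypothetical values (25,000 and 1,000) picked only to resemble the sample. The model names, regions, and October 2014 date range are taken from the table.

```python
import random
from datetime import date, timedelta

MODELS = ["Focus", "Mustang", "Explorer"]
REGIONS = ["NE", "SE", "MW", "NW", "SW"]

# Hypothetical parameters: a single mean and standard deviation shared by every
# row, regardless of model (the unrealistic behaviour described on this page).
PRICE_MEAN = 25000
PRICE_SD = 1000


def generate_sales(n, seed=None):
    """Generate n car-sale rows whose price ignores the model column."""
    rng = random.Random(seed)
    start = date(2014, 10, 1)
    rows = []
    for _ in range(n):
        rows.append({
            # Any day in October 2014 (offsets 0..30).
            "date": (start + timedelta(days=rng.randrange(31))).isoformat(),
            "model": rng.choice(MODELS),
            "region": rng.choice(REGIONS),
            # Same normal distribution for every model.
            "price": round(rng.gauss(PRICE_MEAN, PRICE_SD)),
        })
    return rows


if __name__ == "__main__":
    print("date\tmodel\tregion\tprice")
    for row in generate_sales(10, seed=1):
        print(f"{row['date']}\t{row['model']}\t{row['region']}\t{row['price']}")
```

Because the price draw never looks at `model`, a Focus and an Explorer come out with the same expected price.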

See the problem? All models cost about the same on average, which isn't realistic. Let's create a scenario that better models the real-world price of each model.

Here we use the value of the model column to control the price range. We make the Focus model less expensive while boosting the price of the Explorer. We also adjust the standard deviation to simulate the wider price fluctuations seen on more expensive models.
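In effect, the scenario is a lookup from the value in the model column to a (mean, standard deviation) pair for the normal distribution. The following is a minimal Python sketch of that idea, not how the tool implements scenarios internally. The PRICE_BY_MODEL table and its numbers are hypothetical, picked to roughly match the second sample on this page. As the paragraph above describes, the more expensive models get a wider spread.

```python
import random

# Hypothetical scenario: (mean, standard deviation) of the sale price per model.
# Illustrative values only; pricier models get a larger standard deviation.
PRICE_BY_MODEL = {
    "Focus": (16500, 500),
    "Mustang": (26000, 800),
    "Explorer": (29000, 1500),
}


def add_prices(rows, seed=None):
    """Set each row's price from a normal distribution chosen by its model."""
    rng = random.Random(seed)
    for row in rows:
        mean, sd = PRICE_BY_MODEL[row["model"]]
        row["price"] = round(rng.gauss(mean, sd))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [{"model": rng.choice(list(PRICE_BY_MODEL))} for _ in range(10)]
    for row in add_prices(sample, seed=1):
        print(row["model"], row["price"])
```

Keeping the parameters in a table keyed by model means adding a new model, or retuning one, only touches data, not the generation logic.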

Now let's change our schema to use the new scenario and look at some sample data again:

date	model	region	price
2014-10-05	Focus	SW	16206
2014-10-20	Explorer	SW	27987
2014-10-13	Explorer	SE	31191
2014-10-17	Focus	SE	16809
2014-10-25	Focus	NE	16229
2014-10-21	Explorer	NW	29149
2014-10-28	Explorer	NW	30061
2014-10-15	Mustang	MW	26221
2014-10-03	Explorer	NE	28423
2014-10-29	Mustang	MW	26568

Much better! Now the sales figures reflect a realistic average price for each model.
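One quick way to check this is to average the price by model in each of the two samples on this page. The sketch below copies both tables verbatim as tab-separated text and parses them with Python's standard `csv` module. Ten rows per sample is far too few for a real statistical test, so treat the result as a sanity check only.

```python
import csv
import io
from collections import defaultdict

# The two sample tables from this page, copied as tab-separated text.
BEFORE = (
    "date\tmodel\tregion\tprice\n"
    "2014-10-26\tExplorer\tSE\t25341\n"
    "2014-10-30\tMustang\tNE\t25051\n"
    "2014-10-17\tFocus\tSE\t26003\n"
    "2014-10-18\tFocus\tMW\t24396\n"
    "2014-10-02\tMustang\tMW\t25670\n"
    "2014-10-09\tExplorer\tNW\t25137\n"
    "2014-10-14\tExplorer\tSE\t24027\n"
    "2014-10-24\tFocus\tSW\t26206\n"
    "2014-10-10\tExplorer\tSW\t22668\n"
    "2014-10-18\tExplorer\tNE\t23611\n"
)

AFTER = (
    "date\tmodel\tregion\tprice\n"
    "2014-10-05\tFocus\tSW\t16206\n"
    "2014-10-20\tExplorer\tSW\t27987\n"
    "2014-10-13\tExplorer\tSE\t31191\n"
    "2014-10-17\tFocus\tSE\t16809\n"
    "2014-10-25\tFocus\tNE\t16229\n"
    "2014-10-21\tExplorer\tNW\t29149\n"
    "2014-10-28\tExplorer\tNW\t30061\n"
    "2014-10-15\tMustang\tMW\t26221\n"
    "2014-10-03\tExplorer\tNE\t28423\n"
    "2014-10-29\tMustang\tMW\t26568\n"
)


def mean_price_by_model(tsv_text):
    """Return {model: average price} for tab-separated sample data."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        totals[row["model"]] += int(row["price"])
        counts[row["model"]] += 1
    return {model: totals[model] / counts[model] for model in totals}


if __name__ == "__main__":
    for label, data in (("before", BEFORE), ("after", AFTER)):
        averages = mean_price_by_model(data)
        summary = ", ".join(f"{m} {p:.1f}" for m, p in sorted(averages.items()))
        print(f"{label}: {summary}")
```

Before the scenario, the three averages fall within about 1,400 of each other (Explorer 24156.8, Mustang 25360.5, Focus 25535.0). After it, they separate cleanly (Focus about 16414.7, Mustang 26394.5, Explorer 29362.2).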