Very simply:
For low frequencies, you need to move a lot of air, but slowly. That means a big speaker area (a big cone) and massive, heavy magnets.
For high frequencies, you need to move air very quickly, but not much of it. That means a tiny speaker area (usually, but not always, a small dome) and small magnets.
A single driver can't do both well. Most speakers have two or more drivers, each covering its own section of the frequency range. Cross-overs (traditionally made of inductors (coils), capacitors and resistors) are used to shape the range of frequencies each driver sees. You can't cut a driver off dead at a particular frequency, so around the cross-over point one driver slopes out as the other slopes in.
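To put rough numbers on "one slopes in, one slopes out", here is a minimal sketch of the simplest possible cross-over: a first-order one, with a single inductor in series with the woofer and a single capacitor in series with the tweeter. The 2.5 kHz cross-over point and the 8 ohm load are assumptions picked for illustration, and each driver is treated as a plain resistor, which no real driver is.

```python
import math

def lowpass_db(f, fc):
    """Level in dB that a first-order low-pass feeds the woofer at frequency f."""
    return -10 * math.log10(1 + (f / fc) ** 2)

def highpass_db(f, fc):
    """Level in dB that a first-order high-pass feeds the tweeter at frequency f."""
    return -10 * math.log10(1 + (fc / f) ** 2)

def first_order_parts(fc, r_ohms):
    """Textbook first-order values: series inductor (henries) for the woofer,
    series capacitor (farads) for the tweeter, assuming a purely resistive load."""
    inductor_h = r_ohms / (2 * math.pi * fc)
    capacitor_f = 1 / (2 * math.pi * fc * r_ohms)
    return inductor_h, capacitor_f

FC = 2500.0  # assumed cross-over frequency in Hz (illustrative only)
R = 8.0      # assumed nominal driver impedance in ohms (illustrative only)

for f in (250, 1250, 2500, 5000, 25000):
    print(f"{f:>6} Hz  woofer {lowpass_db(f, FC):7.2f} dB  tweeter {highpass_db(f, FC):7.2f} dB")

inductor_h, capacitor_f = first_order_parts(FC, R)
print(f"L = {inductor_h * 1e3:.3f} mH, C = {capacitor_f * 1e6:.2f} uF")
```

Both drivers are about 3 dB down at the cross-over frequency, and a full decade away each one is still only about 20 dB down, so the overlap region where both drivers are audible is wide. Steeper (higher-order) cross-overs narrow that region at the cost of more components.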
Cross-over design is further complicated by the fact that different drivers have different characteristics (no driver in the world is perfectly flat in its frequency response), so the slopes need to be tweaked to get a smooth response in the cross-over area, where you actually have two drivers contributing. That's just the basics; there are many other tweaks which need to be done right to get a great-sounding speaker. It's a real science/art.
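As a rough illustration of why mismatched drivers force you to tweak things, the sketch below adds up the two branches of an idealised first-order cross-over. With perfectly matched drivers the sum is flat, but make the tweeter a few dB louder than the woofer and the combined response tilts. The 2.5 kHz cross-over point and the 3 dB mismatch are made-up figures, and treating the acoustic sum as a simple sum of the two filtered signals is a big simplification (real drivers also differ in phase, position and their own roll-offs).

```python
import math

def woofer_branch(f, fc):
    """Complex response of a first-order low-pass at frequency f."""
    return 1 / complex(1, f / fc)

def tweeter_branch(f, fc):
    """Complex response of a first-order high-pass at frequency f."""
    return complex(0, f / fc) / complex(1, f / fc)

def summed_db(f, fc, tweeter_gain_db=0.0):
    """Combined level in dB when the tweeter is tweeter_gain_db louder than the woofer."""
    gain = 10 ** (tweeter_gain_db / 20)
    total = woofer_branch(f, fc) + gain * tweeter_branch(f, fc)
    return 20 * math.log10(abs(total))

FC = 2500.0  # assumed cross-over frequency in Hz (illustrative only)

for f in (100, 1000, 2500, 5000, 20000):
    matched = summed_db(f, FC)
    mismatched = summed_db(f, FC, tweeter_gain_db=3.0)  # hypothetical 3 dB hotter tweeter
    print(f"{f:>6} Hz  matched {matched:6.2f} dB  mismatched {mismatched:6.2f} dB")
```

In a real design that kind of mismatch is usually handled with resistors padding down the louder driver, which is one of the jobs the resistors in a cross-over do.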
The alternative is to use a single driver and not try to cover the whole spectrum with a flat response. By its nature, that single driver has to be a compromise, so its response will tail off towards the high and low ends. Enthusiasts will say that this gives a 'cleaner' mid-range, as there are no cross-over issues between drivers in the mid-range. Detractors will say that there is no high fidelity if you don't start with at least a good attempt* at a flat-response reproduction of all the frequencies in a recording.
*You can't achieve an absolute here.
I hope that's a fair and balanced summary!