DSMZ’s BacDive Bacterial Diversity Database

I just discovered the DSMZ’s BacDive database, and I’d like to recommend it. BacDive is an eminently searchable metadatabase on 53,978 strains, with information on morphology and physiology, culture and growth conditions, environmental and isolation data, and more. An update was recently described in Nucleic Acids Research.


I discovered BacDive while reading an interesting paper by Uri Gophna and colleagues on the relationship between environmental temperature and horizontal gene transfer (HGT) in nature, as well as the relationship between CRISPR abundance and HGT. Using BacDive’s growth-temperature data, Gophna’s group was able to show that organisms living in warmer (or hotter) habitats have experienced less frequent HGT than those in cooler habitats. (A phylogenetically independent contrasts approach would have been nice, but I don’t want to quibble.)


I think BacDive is particularly interesting for including geographic location and environmental conditions at the time of sampling. This kind of information is valuable because it describes the environment where a strain actually lives in nature, and habitat descriptions have proved essential for characterizing the ecological and physiological differences among close relatives. But, of course, any one strain is isolated only once, so the single environment of its isolation gives us little sense of what the strain can do or of all the places it could live. So, I have a population-oriented suggestion.


Perhaps for any one strain taken as a focal point, we could take a set of very close relatives that have a good likelihood of being genetically homogeneous with the focal strain, at least for habitat preferences. Then we could estimate the range of potential habitats for the focal strain based on the isolation data for its close relatives. The challenge is to decide which strains are closely related enough to bear on the ecological properties of the focal strain. The species recognized by bacterial taxonomy may be too broadly defined for this purpose. Any of several algorithms for discovering closely related, ecologically distinct populations from sequence data, such as Ecotype Simulation (or AdaptML, which uses both sequence data and habitat data), could serve this purpose.
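
To make the pooling idea concrete, here is a minimal Python sketch. It is not Ecotype Simulation or AdaptML: a simple sequence-identity cutoff stands in for whatever method actually demarcates the focal strain's population, and the cutoff value (0.995), the record fields, the strain names, and the data are all made up for illustration. They are not real BacDive entries or BacDive's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IsolationRecord:
    strain: str
    identity_to_focal: float        # hypothetical: fraction of identical sites vs. the focal strain
    habitat: str                    # free-text habitat of isolation
    temperature_c: Optional[float]  # temperature at sampling, if it was recorded


def potential_habitat(records, min_identity=0.995):
    """Pool isolation data from all strains at or above the identity cutoff.

    The cutoff is a crude placeholder for a population demarcated by an
    ecotype-discovery algorithm.
    """
    close = [r for r in records if r.identity_to_focal >= min_identity]
    temps = [r.temperature_c for r in close if r.temperature_c is not None]
    return {
        "strains": sorted(r.strain for r in close),
        "habitats": sorted({r.habitat for r in close}),
        "temperature_range_c": (min(temps), max(temps)) if temps else None,
    }


# Made-up records for illustration only.
records = [
    IsolationRecord("focal", 1.000, "hot spring mat", 60.0),
    IsolationRecord("rel_A", 0.998, "hot spring mat", 55.0),
    IsolationRecord("rel_B", 0.996, "geothermal soil", None),
    IsolationRecord("rel_C", 0.970, "compost", 45.0),
]

summary = potential_habitat(records)
print(summary)
```

In this toy data set, the distant relative rel_C falls below the cutoff and contributes nothing, while the two close relatives extend the focal strain's inferred habitat beyond its single isolation site. In practice the membership test would come from the population-demarcation algorithm rather than a fixed identity threshold.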


Further readings on characterizing ecological differences from habitat descriptions:




Please email me for any articles you don’t have access to.