I have noticed that gene expression data (e.g. sequencing data) are usually log2-transformed rather than, say, log10-transformed. Why is the log2 transformation commonly used instead of other transformations? I would like to understand the basic theory behind the log2 transformation as it applies to gene expression data.

As the previous comments point out, expression data from different platforms, such as microarray and RNA-seq, have different statistical properties. The same is true of the mathematical tools you apply to them: functions, distributions and equations each have their own properties, and log base 2 and log base 10 put the data on different scales. For RNA-seq count data, the negative binomial distribution is commonly used to model the counts when testing for differential expression. For biological network analysis, you can also log10-transform the counts and then scale and mean-center them. For microarray data, you can normalize with the RMA method and then use a t-test or a similar test for your hypothesis. To circle back to your question: think about what the derivative and the integral of the log2 function look like, as you may have done in calculus, and what those properties mean when you apply the function to data that has properties of its own. Sorry if this is redundant or hard to follow; I was trying to sum up years of study in this small box.

There isn't any theoretical reason for using base 2 rather than any other base. One could reasonably use log10 for fold changes. In my experience, though, the changes in expression that microarrays can detect tend to be smaller than 10-fold. You also can't use the natural log when presenting data to bench scientists, unless you want to waste an afternoon explaining it. So base 2 makes sense: its scale is close to the size of the biological changes that microarrays can detect, and it's an easy choice of base to explain when you're presenting to biologists.
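To make the point about the base concrete, here is a minimal Python sketch. The fold-change values are made up for illustration. It shows that log2 and log10 of the same fold change differ only by a constant factor, so the choice of base rescales the numbers without changing their order.

```python
import math

# Hypothetical fold changes, chosen only for illustration.
fold_changes = [0.25, 0.5, 2.0, 4.0, 8.0]

log2_fc = [math.log2(fc) for fc in fold_changes]
log10_fc = [math.log10(fc) for fc in fold_changes]

# log2(x) / log10(x) equals log2(10) (about 3.32) for every x other than 1,
# so switching base only multiplies every value by the same constant.
ratios = [a / b for a, b in zip(log2_fc, log10_fc)]

for fc, a, b in zip(fold_changes, log2_fc, log10_fc):
    print(f"fold change {fc:>5}: log2 = {a:+.3f}, log10 = {b:+.3f}")
```

On the log2 scale the 2-fold, 4-fold and 8-fold changes land on whole numbers, while on the log10 scale they are all fractions below 1, which is the practical convenience rather than a theoretical one.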

When it is used, a main rationale for log-transformation is heteroskedasticity. The variance of expression measurements on many platforms (arrays, etc.) depends on the expression level. By log-transforming, you reduce this dependence and your data become better behaved for statistical testing. As pointed out by russhh, the choice of base 2 is just a practical one. Many other transformations can be applied to expression data. The "best" one likely depends on your measurement platform and your analysis application. For example, see variance stabilizing transformations like VST in the DESeq package. Log2 has a long history because it's simple, and in many cases it's an improvement on using raw values for statistical analysis.
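Here is a minimal simulation of the heteroskedasticity argument. It assumes purely multiplicative (log-normal) noise, which is a simplification and not a model of any particular platform; the expression levels, noise size and seed are arbitrary illustrative choices. It compares the spread of replicate measurements at a low and a high expression level, before and after a log2 transform.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def simulate(true_level, n=2000, sigma=0.3):
    """Draw n replicate measurements with multiplicative log-normal noise."""
    return [true_level * rng.lognormvariate(0.0, sigma) for _ in range(n)]

low = simulate(100.0)       # hypothetical lowly expressed gene
high = simulate(10000.0)    # hypothetical highly expressed gene

# Spread on the raw scale: grows with the expression level.
raw_sd_low = statistics.stdev(low)
raw_sd_high = statistics.stdev(high)

# Spread on the log2 scale: roughly independent of the expression level.
log_sd_low = statistics.stdev([math.log2(x) for x in low])
log_sd_high = statistics.stdev([math.log2(x) for x in high])

print(f"raw SD:  low = {raw_sd_low:.1f}, high = {raw_sd_high:.1f}")
print(f"log2 SD: low = {log_sd_low:.3f}, high = {log_sd_high:.3f}")
```

Under this assumed noise model the raw standard deviation grows roughly in proportion to the expression level, while the log2-scale standard deviation stays about the same at both levels. Real count data also carry sampling noise at low counts, which is one reason dedicated variance stabilizing transformations exist.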

Log2 makes it easy to calculate fold changes and to tell up-regulated from down-regulated genes between replicates/samples: a doubling is +1, a halving is -1, and no change is 0.
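As a small illustration of the fold-change point, here is a Python sketch. The helper `log2_fold_change`, the gene names and the expression values are all hypothetical and made up for this example.

```python
import math

def log2_fold_change(treated, control):
    """Log2 ratio of treated to control expression (both must be > 0)."""
    return math.log2(treated / control)

# Made-up (treated, control) expression values for three hypothetical genes.
genes = {
    "geneA": (200.0, 100.0),  # doubled
    "geneB": (50.0, 100.0),   # halved
    "geneC": (100.0, 100.0),  # unchanged
}

lfc = {name: log2_fold_change(t, c) for name, (t, c) in genes.items()}

for name, value in lfc.items():
    direction = "up" if value > 0 else "down" if value < 0 else "unchanged"
    print(f"{name}: log2FC = {value:+.1f} ({direction})")
```

On the raw ratio scale a doubling is 2.0 and a halving is 0.5, which are not symmetric around 1. On the log2 scale they become +1 and -1, symmetric around 0, so the sign alone tells you the direction of regulation.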
