This paper explores how task descriptors, such as an input mean $\mu$, improve in-context learning (ICL) for linear regression in Transformer models. Analyzing a one-layer linear self-attention (LSA) network, the authors show that the model can use the descriptor to standardize its inputs and thereby reduce prediction error. They prove that gradient-flow training converges to a global minimum, at which the Transformer simulates an optimized variant of gradient descent. Experiments confirm that supplying the task descriptor yields consistently lower error than training without such context. The study further shows that while large sample sizes admit a simple strategy, finite-sample settings force the Transformer to learn richer internal representations that trade off bias and variance. These findings give a theoretical account of the empirical success of prompts and instructions in large language models.
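The mechanism described above can be illustrated with a minimal numerical sketch (not the paper's construction): a one-step gradient-descent predictor, the estimator a trained one-layer LSA is typically shown to implement in this line of work, applied to in-context examples with and without the task descriptor $\mu$. All distributional choices here (inputs $x \sim \mathcal{N}(\mu, I)$, labels $y = w^\top(x - \mu)$, the step size $\eta$) are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_tasks = 5, 40, 2000
eta = 1.0 / (1.0 + (d + 1) / n)  # rough step size for isotropic inputs (assumed)

def one_step_gd_predict(X, y, x_q):
    # One step of gradient descent from w = 0 on the least-squares loss:
    #   w_1 = (eta / n) * X^T y,  prediction = x_q . w_1
    w1 = (eta / len(y)) * X.T @ y
    return x_q @ w1

errs_plain, errs_centered = [], []
for _ in range(n_tasks):
    mu = rng.normal(size=d)            # task descriptor: the input mean
    w = rng.normal(size=d)             # task-specific regression weights
    X = mu + rng.normal(size=(n, d))   # in-context inputs ~ N(mu, I)
    y = (X - mu) @ w                   # labels depend on the centered inputs
    x_q = mu + rng.normal(size=d)      # query input
    y_q = (x_q - mu) @ w               # query target
    # Without the descriptor: one GD step on the raw inputs.
    errs_plain.append((one_step_gd_predict(X, y, x_q) - y_q) ** 2)
    # With the descriptor: standardize by mu before the same GD step.
    errs_centered.append((one_step_gd_predict(X - mu, y, x_q - mu) - y_q) ** 2)

print("MSE without descriptor:", np.mean(errs_plain))
print("MSE with descriptor:   ", np.mean(errs_centered))
```

Centering with the known $\mu$ removes a bias term of order $\mu^\top w$ from the one-step prediction, so the descriptor-aware estimator attains a much lower mean-squared error, mirroring the paper's qualitative claim.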