Linear-sized independent sets in random cographs and increasing subsequences in separable permutations

This paper is interested in independent sets (or equivalently, cliques) in uniform random cographs. We also study their permutation analogs, namely, increasing subsequences in uniform random separable permutations. First, we prove that, with high probability as $n$ gets large, the largest independent set in a uniform random cograph with $n$ vertices has size $o(n)$. This answers a question of Kang, McDiarmid, Reed and Scott. Using the connection between graphs and permutations via inversion graphs, we also give a similar result for the longest increasing subsequence in separable permutations. These results are proved using the self-similarity of the Brownian limits of random cographs and random separable permutations, and actually apply more generally to all families of graphs and permutations with the same limit. Second, and unexpectedly given the above results, we show that for $\beta>0$ sufficiently small, the expected number of independent sets of size $\beta n$ in a uniform random cograph with $n$ vertices grows exponentially fast with $n$. We also prove a permutation analog of this result. This time the proofs rely on singularity analysis of the associated bivariate generating functions.


Introduction
This paper contains both results for graph and permutation models (connected through the mapping associating with a permutation its inversion graph); for simplicity we present these results and the related backgrounds separately.

Independent sets in random cographs
Cographs were introduced in the seventies by several authors independently (under various names); see e.g. [Sei74]. They enjoy several equivalent characterizations. Among others, cographs are
• the graphs avoiding $P_4$ (the path with four vertices) as an induced subgraph;
• the graphs whose modular decomposition does not involve any prime graph;
• the inversion graphs of separable permutations;
• the graphs which can be constructed from graphs with one vertex by taking disjoint unions and joins.
The latter characterization is the most useful for our purpose; let us introduce the corresponding terminology. All graphs considered in this paper are simple (i.e. without multiple edges or loops) and undirected. Two labeled graphs $(V, E)$ and $(V', E')$ are isomorphic if there exists a bijection from $V$ to $V'$ which maps $E$ to $E'$. Equivalence classes of labeled graphs for this relation are unlabeled graphs. Throughout this paper, the size of a graph is its number of vertices, and we denote by $V_G$ the set of vertices of any graph $G$.
Let $G = (V, E)$ and $G' = (V', E')$ be labeled graphs with disjoint vertex sets. We define their disjoint union as the graph $(V \sqcup V', E \sqcup E')$ (the symbol $\sqcup$ denoting as usual the disjoint union of two sets). We also define their join as the graph $(V \sqcup V', E \sqcup E' \sqcup (V \times V'))$: namely, we take copies of $G$ and $G'$, and add all edges from a vertex of $G$ to a vertex of $G'$. Both definitions readily extend to more than two graphs, adding edges between any two vertices originating from different graphs in the case of the join operation.

Definition 1.1. A labeled cograph is a labeled graph that can be generated from single-vertex graphs by applying join and disjoint union operations. An unlabeled cograph is the underlying unlabeled graph of a labeled cograph.
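As an illustration (not part of the paper), Definition 1.1 translates directly into a few lines of Python; all function names here are ours, and graphs are encoded naively as a vertex list plus an edge set. The sketch builds a small cograph by joins and disjoint unions and verifies by brute force that it has no induced $P_4$ (a 4-vertex induced subgraph is a path exactly when its internal degree sequence is 1, 1, 2, 2).

```python
from itertools import combinations

def single():
    # the one-vertex graph, encoded as (vertex list, edge set)
    return [0], set()

def disjoint_union(g1, g2):
    # relabel the second part so the vertex sets are disjoint, keep all edges
    v1, e1 = g1
    v2, e2 = g2
    off = len(v1)
    return (v1 + [v + off for v in v2],
            e1 | {(u + off, v + off) for (u, v) in e2})

def join(g1, g2):
    # disjoint union plus every edge between the two parts
    verts, edges = disjoint_union(g1, g2)
    n1 = len(g1[0])
    edges |= {(u, v) for u in range(n1) for v in range(n1, len(verts))}
    return verts, edges

def has_induced_p4(graph):
    verts, edges = graph
    adj = {v: set() for v in verts}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for quad in combinations(verts, 4):
        degs = sorted(len(adj[v] & set(quad)) for v in quad)
        if degs == [1, 1, 2, 2]:   # only P4 has this degree sequence
            return True            # on four vertices
    return False

# join of two 2-vertex edgeless graphs = the 4-cycle, a cograph
g = join(disjoint_union(single(), single()), disjoint_union(single(), single()))
assert not has_induced_p4(g)

# the path P4 itself is, of course, not P4-free
p4 = ([0, 1, 2, 3], {(0, 1), (1, 2), (2, 3)})
assert has_induced_p4(p4)
```

The brute-force check is exponential in the number of vertices and is only meant to make the first characterization above concrete on small examples.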
Recall that, for a given graph G, an independent set is a subset of vertices in G no two of which are adjacent, while a clique is a subset of vertices in G such that every two vertices are adjacent.
The main motivation for studying independent sets in random cographs comes from the series of papers [LRSTT10,KMRS14] on a probabilistic version of the Erdős-Hajnal conjecture.
For a graph G, a subset of its vertices is called homogeneous if it is either a clique or an independent set. It is well-known that every graph of size n has a homogeneous set of size at least logarithmic in n, and that this is optimal up to a constant (much work is devoted to getting the precise asymptotics; this is equivalent to the computation of diagonal Ramsey numbers, see [Spe75,Sah20] for the best bounds to date). The conjecture of Erdős and Hajnal states that, assuming that the graphs avoid any given graph H (as an induced subgraph), homogeneous sets of size polynomial in n necessarily exist. More precisely, for any H, there exists a constant $\varepsilon = \varepsilon(H) > 0$ such that every H-free graph has a homogeneous set of size $n^{\varepsilon}$.
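The logarithmic lower bound mentioned above comes from the classical halving argument (standard material, not spelled out in the paper): repeatedly pick a vertex and keep the larger of its neighbour and non-neighbour sides; the picked vertices that went the same way form a clique or an independent set of size about $\tfrac{1}{2}\log_2 n$. A minimal sketch, with our own function names:

```python
import math
import random
from itertools import combinations

def greedy_homogeneous_set(n, adj):
    # halving argument: pick a vertex, keep the larger of its neighbour /
    # non-neighbour sides among the remaining vertices, repeat.  Picked
    # vertices that chose the neighbour side form a clique; the others an
    # independent set.
    alive, clique, indep = set(range(n)), [], []
    while alive:
        v = alive.pop()
        nbrs, non = alive & adj[v], alive - adj[v]
        if len(nbrs) >= len(non):
            clique.append(v)
            alive = nbrs
        else:
            indep.append(v)
            alive = non
    return clique, indep

random.seed(42)
n = 256
adj = {v: set() for v in range(n)}
for u, v in combinations(range(n), 2):
    if random.random() < 0.5:
        adj[u].add(v)
        adj[v].add(u)

clique, indep = greedy_homogeneous_set(n, adj)
# the halving argument guarantees a homogeneous set of size ~ (1/2) log2(n),
# whatever the input graph
assert max(len(clique), len(indep)) >= math.floor(math.log2(n) / 2)
assert all(b in adj[a] for a, b in combinations(clique, 2))       # clique
assert all(b not in adj[a] for a, b in combinations(indep, 2))    # independent
```

The guarantee is deterministic, so the assertions hold for any graph, not only the random one sampled here.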
Despite much effort, the Erdős-Hajnal conjecture is still widely open; see for example the survey [Chu14]. A natural relaxation of the conjecture consists in replacing "every H-free graph" in the statement above by "almost all H-free graphs". This weaker version has been established in [LRSTT10]. For a large family of constraints H, this result was further improved by Kang, McDiarmid, Reed and Scott in [KMRS14]: for those H, a uniform random H-free graph has with high probability a homogeneous set of size linear in n. When this holds, the graph H is said to have the asymptotic linear Erdős-Hajnal property (see [KMRS14] for a formal definition). Kang, McDiarmid, Reed and Scott ask whether $H = P_4$ has the asymptotic linear Erdős-Hajnal property, i.e. whether a uniform random cograph with n vertices has a homogeneous set of size linear in n [KMRS14, Section 5], and this question has remained open until now.
Our first result answers this question in the negative. In the following, for a graph G, we denote by α(G) the maximum size of an independent set in G, also called the independence number of G.
Theorem 1.2. Let $G_n$ be a uniform random cograph (either labeled or unlabeled) of size $n$. The maximum size of an independent set in $G_n$ is sublinear in $n$, namely $\alpha(G_n)/n$ converges to 0 in probability.
For a discussion of the difference between the labeled and unlabeled settings, we refer to Remark 1.4 below. We recall the standard notation for comparison of random variables: $X_n = o_P(Y_n)$ if $X_n/Y_n$ tends to 0 in probability. The above theorem says that $\alpha(G_n)$ is $o_P(n)$. By taking complements (see the identity (4.1)), it also holds that the size of the largest clique in $G_n$ is $o_P(n)$. Consequently, the size of the largest homogeneous set is also $o_P(n)$, answering negatively the question of Kang, McDiarmid, Reed and Scott [KMRS14, Section 5]: $P_4$ does not have the asymptotic linear Erdős-Hajnal property.
A different approach to the study of independent sets of linear size in random cographs is the following. Let $X_k(G)$ be the number of independent sets of size $k$ in a graph $G$. From Theorem 1.2, if $G_n$ is a uniform random (labeled) cograph, then the random variable $X_{n,k} := X_k(G_n)$ tends to 0 in probability if $k \sim \beta n$ for some $\beta > 0$ as $n$ tends to infinity. We show that, nonetheless, its expectation grows exponentially fast for $\beta$ small enough. In particular, this indicates that Theorem 1.2 cannot be proved by a naive use of the first moment method. More precisely, we have the following result.

Theorem 1.3. For each $n \ge 1$, let $G_n$ be a uniform random labeled cograph of size $n$, and let $X_{n,k}$ be the number of independent sets of size $k$ in $G_n$. Then there exist computable functions $B_\beta > 0$, $C_\beta > 0$ ($0 < \beta < 1$) with the following property. For every fixed closed interval $[a, b] \subseteq (0, 1)$, we have
$$\mathbb{E}[X_{n,k}] \sim B_{k/n}\, n^{-1/2}\, (C_{k/n})^n \qquad (1.1)$$
uniformly for $an \le k \le bn$. Furthermore,
1. when $\beta \to 0$, we have $C_\beta = 1 + \beta|\log(\beta)| + o(\beta \log(\beta))$.
We have found no explicit formula for the growth constant $C_\beta$, but $C_\beta$ can be computed numerically with arbitrary precision: see Equation (4.29). To get an idea of how fast $X_{n,k}$ can grow when $k \sim \beta n$, we mention that the function $\beta \mapsto C_\beta$ seems to have a unique maximum on $(0, 1)$ (see a plot in Figure 1.1); denoting by $\beta^*$ its location, we have the following numerical estimates: $\beta^* \approx 0.229285\ldots$; $C_{\beta^*} \approx 1.366306\ldots$
As additional motivation for Theorem 1.3, let us mention the work of Drmota, Ramos, Requilé and Rué [DRRR20]: they prove (among other things; see their Corollary 2) the exponential growth of the expected number of maximal independent sets in some subcritical graph classes such as trees, cacti, series-parallel graphs, . . . (here "maximal independent sets" refers to independent sets that are maximal for inclusion among all independent sets; such sets are not necessarily of maximum size among all independent sets). It could be interesting to adapt our arguments to consider maximal independent sets in cographs instead of independent sets of fixed size.
Remark 1.4. There are two different ways to pick a uniform random cograph with n vertices: taking it uniformly at random among labeled or among unlabeled cographs. Although the sizes of independent sets do not depend on the labeling, these are two different probability distributions, since some unlabeled cographs have more symmetries, and hence fewer distinct labelings, than others.
The reader may have noticed that in Theorem 1.2 we consider either labeled or unlabeled uniform random cographs, while Theorem 1.3 only considers the labeled setting. The reason for this choice is given in Section 1.3 when discussing proof methods.
Remark 1.5. A natural question is to determine the order of magnitude of $\alpha(G_n)$. A basic lower bound of order $\sqrt{n}$ is derived as follows. Since cographs are perfect graphs, for any cograph $G$ of size $n$, we have (see [Chu14, Theorem 1.4])
$$\alpha(G)\,\omega(G) \ge n, \qquad (1.2)$$
where $\omega(G)$ is the size of the largest clique of $G$. By symmetry we have that $\alpha(G_n) \stackrel{d}{=} \omega(G_n)$ if $G_n$ is a uniform (labeled or unlabeled) cograph. Hence
$$\mathbb{P}\big(\alpha(G_n) \ge \sqrt{n}\big) \ge 1/2,$$
which means that $\alpha(G_n)$ is not $o_P(\sqrt{n})$. We have not been able to improve this bound, but we believe it to be far from optimal. In fact, (limited) numerical simulations, as well as the material in [Wür12, Chapter 9], make us believe that $\alpha(G_n)$ is of order $n/\log(n)$.

Increasing subsequences in random separable permutations
The asymptotic behavior of the length $\mathrm{LIS}(s_n)$ of the longest increasing subsequence in a uniform random permutation $s_n$ of size n is an old and famous problem that led to surprising and deep connections with various areas of pure mathematics (representation theory, combinatorics, linear algebra and operator theory, random matrices, ...). In particular, it is well-known that $\mathrm{LIS}(s_n)$ is typically close to $2\sqrt{n}$ and has Tracy-Widom fluctuations of order $n^{1/6}$. We refer to [Rom15] for a nice and modern introduction to this topic.
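As a concrete aside (ours, not the paper's), the statistic LIS can be computed in $O(n \log n)$ time by the classical patience-sorting algorithm, and the $2\sqrt{n}$ behavior is easy to observe numerically:

```python
import bisect
import random

def lis_length(perm):
    # patience sorting: tails[i] is the smallest possible last value of an
    # increasing subsequence of length i + 1 seen so far
    tails = []
    for x in perm:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

assert lis_length([2, 4, 1, 3]) == 2          # e.g. the subsequence 1, 3
assert lis_length(list(range(1, 11))) == 10   # an increasing permutation

# Monte Carlo sanity check of LIS(s_n) ~ 2*sqrt(n) for a uniform permutation
rng = random.Random(1)
n = 10_000
perm = list(range(1, n + 1))
rng.shuffle(perm)
ratio = lis_length(perm) / (2 * n ** 0.5)
assert 0.8 < ratio < 1.2    # loose bounds; fluctuations are of order n**(1/6)
```

The loose bounds in the last assertion are deliberate: they are a sanity check on one sample, not a statement about the Tracy-Widom fluctuations.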
Longest increasing subsequences in random permutations in permutation classes are a much newer topic: see [MRRY20] and references therein. The methods of the present paper allow us to prove that the length of the longest increasing subsequence in a uniform random separable permutation is sublinear. Let us introduce the relevant terminology.
Given a permutation σ of size n (i.e. a sequence $\sigma(1) \ldots \sigma(n)$ containing exactly once each integer from 1 to n), and given a subset $I = \{i_1 < \cdots < i_k\}$ of $\{1, \ldots, n\}$, the pattern of σ induced by I is the permutation π of size k such that $\pi(\ell) < \pi(m)$ if and only if $\sigma(i_\ell) < \sigma(i_m)$. The study of patterns in permutations is an active research topic, particularly in enumerative combinatorics, see e.g. [Vat16,Kit11] and references therein. The relation "is a pattern of" is a partial order on the set of all permutations (of all finite sizes), and permutation classes are downsets for this order. Equivalently, permutation classes can be defined as sets of permutations characterized by the avoidance of a (finite or infinite) set of patterns.

Definition 1.6. A separable permutation is a permutation which avoids the patterns 2413 and 3142.
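The definitions of pattern and avoidance translate into a short brute-force sketch (ours, with 0-based positions), which also lets one test Definition 1.6 on small permutations:

```python
from itertools import combinations

def pattern(sigma, positions):
    # pattern of sigma induced by a set of positions (0-based)
    values = [sigma[i] for i in sorted(positions)]
    order = sorted(values)
    return tuple(order.index(v) + 1 for v in values)

def avoids(sigma, pi):
    # brute force over all position subsets of size len(pi)
    k = len(pi)
    return all(pattern(sigma, I) != tuple(pi)
               for I in combinations(range(len(sigma)), k))

# sigma = 3 5 1 4 2; positions 0, 1, 3 pick values 3, 5, 4: the pattern is 1 3 2
assert pattern([3, 5, 1, 4, 2], [0, 1, 3]) == (1, 3, 2)

# 2 4 1 5 3 contains 2413 (e.g. at positions 0, 1, 2, 4): not separable
assert not avoids([2, 4, 1, 5, 3], [2, 4, 1, 3])
# 2 1 4 3 avoids both 2413 and 3142: separable
assert avoids([2, 1, 4, 3], [2, 4, 1, 3]) and avoids([2, 1, 4, 3], [3, 1, 4, 2])
```

This check is exponential in the pattern size and is only meant for small examples.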
Separable permutations enjoy many other characterizations, including the following (the related terminology is defined later in this paper when needed, or e.g. in [Vat16]):
• they are the permutations whose inversion graph is a cograph;
• they can be obtained from permutations of size 1 by performing direct sums and skew sums;
• no simple permutation appears in their substitution decomposition.
The class of separable permutations is natural, well-studied, and displays many nice properties; we refer the reader to [BBF+18, end of Section 1.1] for a presentation of these properties and a review of literature. We shall also review some of them in Section 5.
We can now state our analog of Theorem 1.2 for separable permutations.
Theorem 1.7. For each $n \ge 1$, let $\sigma_n$ be a uniform random separable permutation of size $n$. Then the maximal length of an increasing subsequence in $\sigma_n$ is sublinear in $n$, namely $\mathrm{LIS}(\sigma_n)/n$ converges to 0 in probability.
Two remarks about this statement. First, the above sublinearity result applies not only to separable permutations, but also to any permutation class having a Brownian separable permuton as permuton limit; see Section 1.3. Second, as for cographs, we unfortunately did not find a better lower bound for $\mathrm{LIS}(\sigma_n)$ than the trivial $\sqrt{n}$ one. The same argument as above applies, where (1.2) is replaced by the Erdős-Szekeres lemma (see e.g. [Rom15, Th. 1.2]).
We make a further remark about the relation between Theorems 1.2 and 1.7. Recall that, for any permutation σ of size n, its inversion graph (denoted inv(σ)) is the unlabeled version of the graph with vertex set $\{1, \ldots, n\}$ where there is an edge between i and j if and only if i and j form an inversion in σ, that is, $(i - j)(\sigma(i) - \sigma(j)) < 0$. Clearly, through this correspondence, an increasing subsequence in σ is mapped to an independent set in inv(σ). Nevertheless, Theorem 1.7 is not simply the translation of Theorem 1.2 from the graph setting to the permutation setting. Indeed, since the inversion graph correspondence is not one-to-one, for $\sigma_n$ a uniform random separable permutation, inv($\sigma_n$) is not a uniform random unlabeled cograph. (We further note that, defining inv(σ) as a labeled cograph in the obvious manner, inv($\sigma_n$) would also not be a uniform random labeled cograph.)

We also establish a counterpart of Theorem 1.3 for increasing subsequences in separable permutations.
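The correspondence between increasing subsequences and independent sets described above is easy to check in code (a sketch of ours, with 0-based positions):

```python
from itertools import combinations

def inversion_graph(sigma):
    # edge {i, j} for i < j iff the positions are inverted: sigma(i) > sigma(j)
    n = len(sigma)
    return {(i, j) for i, j in combinations(range(n), 2)
            if sigma[i] > sigma[j]}

sigma = [3, 1, 4, 2, 5]            # one-line notation
edges = inversion_graph(sigma)
assert edges == {(0, 1), (0, 3), (2, 3)}

# the increasing subsequence at positions 1, 3, 4 (values 1, 2, 5) ...
inc = [1, 3, 4]
# ... is an independent set of inv(sigma): none of its position pairs is inverted
assert all((i, j) not in edges for i, j in combinations(inc, 2))
```

Conversely, any independent set of inv(σ) corresponds to a set of pairwise non-inverted positions, i.e. an increasing subsequence.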

Proof methods and universality
Our sublinearity results are based on limit theorems for uniform random cographs and uniform random separable permutations. We first discuss the graph setting.
It is proved in [BBF+22b] that a uniform random (labeled or unlabeled) cograph of size n converges in the sense of graphons to a limit $W^{1/2}$, called the Brownian cographon of parameter 1/2 (see also the independent work of Stufler [Stu21]). We refer to Section 2.2 for details. Moreover, the notion of independence number of a graph has been extended to graphons by Hladký and Rocha [HR20], who proved a semi-continuity property for it (see Section 2.3). Combining these two elements, Theorem 1.2 will follow from the fact that the independence number $\alpha(W^{1/2})$ of the Brownian cographon is 0 a.s. (see Section 3.3). To prove the latter, we use the explicit construction of the Brownian cographon from a Brownian excursion and a self-similarity property of the Brownian excursion (namely Aldous' decomposition of a Brownian excursion with two independent points into three independent Brownian excursions of random sizes [Ald94]); we deduce from that an inequation in distribution for $\alpha(W^{1/2})$ (Section 3.1; the use of "inequation" instead of "inequality" is justified there) and we conclude by a fixed point argument (Section 3.2).
An interesting aspect of the proof sketched above is that it relies solely on the fact that uniform random cographs tend to the Brownian cographon; moreover the value p = 1/2 of the parameter of the limit is irrelevant in the proof. Convergence to the Brownian cographon was proved in [BBF+22b,Stu21] both in the labeled and unlabeled settings, so that Theorem 1.2 is proved simultaneously in both settings. In fact, Theorem 1.2 is proved as a special case of the following theorem.
Theorem 1.9. Let $(G_n)$ be a sequence of random graphs tending to the Brownian cographon $W^p$ for some $p \in [0, 1)$. Then the maximum size of an independent set in $G_n$ is sublinear in $n$, namely $\alpha(G_n)/n$ converges to 0 in probability.
By analogy with the realm of permutations (see below), we expect that uniform random graphs in families of graphs well-behaved for the modular decomposition (e.g., a graph class whose modular decomposition trees are all those obtained from a finite set of prime graphs) tend to W p , and hence have a sublinear independence number.
Let us now discuss Theorem 1.7, i.e. the sublinearity of the length of the longest increasing subsequence in a random separable permutation $\sigma_n$. It is known that $\sigma_n$ tends in the permuton topology to a limit $\mu^{1/2}$, called the Brownian separable permuton of parameter 1/2; see [BBF+18] for the original reference.
As discussed earlier, an increasing subsequence of a permutation corresponds to an independent set of its inversion graph. We remark in Section 2.4 that if a sequence of permutations converges in distribution to the Brownian separable permuton of parameter p, then the corresponding inversion graphs converge in distribution to the Brownian cographon W p .
Hence Theorem 1.9 implies the following general result, of which Theorem 1.7 is a particular case (see Section 3.3 for details).
Theorem 1.10. Let $(\sigma_n)$ be a sequence of random permutations tending to the Brownian separable permuton $\mu^p$ for some $p \in [0, 1)$. Then the maximal length of an increasing subsequence in $\sigma_n$ is sublinear in $n$, namely $\mathrm{LIS}(\sigma_n)/n$ converges to 0 in probability.
We note that the Brownian separable permuton µ p has been proved to be a universal limit for uniform random permutations in many permutation classes (well-behaved with respect to the substitution decomposition) [BBF+20, BBF+22a, BBFS19], so Theorem 1.10 applies to all these classes.
The technique to prove Theorems 1.3 and 1.8 is completely different. Indeed, the expectation of X n,k (resp. Z n,k ) for k ∼ βn for some β > 0 is driven by a set of cographs (resp. separable permutations) of small probability and can therefore not be inferred from their limit in distribution. In this case, we use the representation of cographs as cotrees, and its analogue for separable permutations through substitution decomposition trees. These tree representations are useful tools in algorithms both for graphs and permutations (see e.g. [HP05,BCH+08] for graphs and [BBL98] for permutations); in the case of permutations, substitution decomposition trees have also been widely used in recent years for enumeration problems (see [Vat16, Section 3.2] and references therein). The tree encoding allows us to write a system of equations for the bivariate generating function of cographs with a marked independent set (resp. separable permutations with a marked increasing subsequence). We then obtain our results through singularity analysis.
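To make the tree encoding concrete, here is a sketch of ours (not taken from the paper) of the recursion that, we believe, underlies the system of equations mentioned above: on a cotree, the polynomial counting independent sets by size multiplies across a disjoint-union node (independent sets on different parts combine freely), while across a join node the polynomials add, with the empty set counted only once (a nonempty independent set cannot meet two joined parts). We verify it against brute force.

```python
from itertools import combinations

# a cotree node is ('leaf',), ('union', children) or ('join', children)
def indep_poly(t):
    """Coefficient list P with P[k] = number of independent sets of size k."""
    if t[0] == 'leaf':
        return [1, 1]                     # the empty set and the single vertex
    polys = [indep_poly(c) for c in t[1]]
    if t[0] == 'union':                   # polynomials multiply
        out = [1]
        for p in polys:
            new = [0] * (len(out) + len(p) - 1)
            for i, a in enumerate(out):
                for j, b in enumerate(p):
                    new[i + j] += a * b
            out = new
        return out
    # join: a nonempty independent set lives entirely in one child
    out = [0] * max(len(p) for p in polys)
    for p in polys:
        for k, c in enumerate(p):
            out[k] += c
    out[0] = 1                            # the empty set was counted per child
    return out

def graph_of(t, start=0):
    """The cograph encoded by a cotree, as (vertex count, edge set)."""
    if t[0] == 'leaf':
        return 1, set()
    n, edges, blocks = 0, set(), []
    for c in t[1]:
        m, e = graph_of(c, start + n)
        blocks.append(range(start + n, start + n + m))
        edges |= e
        n += m
    if t[0] == 'join':
        for a, b in combinations(blocks, 2):
            edges |= {(u, v) for u in a for v in b}
    return n, edges

def brute_force(n, edges, k):
    return sum(1 for S in combinations(range(n), k)
               if all((a, b) not in edges for a, b in combinations(S, 2)))

leaf = ('leaf',)
# join of two edgeless graphs on 3 and 2 vertices: the complete bipartite K_{3,2}
t = ('join', [('union', [leaf, leaf, leaf]), ('union', [leaf, leaf])])
poly = indep_poly(t)
n, edges = graph_of(t)
assert poly == [1, 5, 4, 1]
assert all(poly[k] == brute_force(n, edges, k) for k in range(len(poly)))
```

In the notation of Theorem 1.3, $X_k(G)$ is the coefficient of this polynomial; the paper's bivariate generating functions arise by further summing over all cotrees of each size.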
Unlike for Theorems 1.2 and 1.7, the results we prove are specific to either labeled cographs or separable permutations and do not rely on their Brownian limits. However, our approach should extend to other families of graphs and permutations well-encoded by their (modular or substitution) decomposition trees, but we did not pursue this direction. One such model would be unlabeled cographs: in this model, the analytic equations involve the so-called Pólya operators, making the analysis more technical but we do not expect qualitative differences in the result.

Organization of the paper
The proofs of our two sets of results can be read independently.
• Section 2 provides the necessary background regarding graphons and permutons. Then we prove Theorems 1.9 and 1.10 in Section 3.
• The proofs of Theorems 1.3 and 1.8 are given in Section 4 and Section 5, respectively.

Preliminaries: graphons, permutons, independence number and increasing subsequences
We first recall some general material from the theory of graphons (Section 2.1). We present here the strict minimum needed for this paper; an extensive presentation can be found in [Lov12, Chapters 7-16]. Then, in Sections 2.2 to 2.4, we review recent material from the literature, used in our proof of Theorems 1.9 and 1.10:
• the convergence of uniform random cographs to the Brownian cographon;
• the notion of independence number of graphons;
• a connection between graphons and the analogous theory for permutations, that of permutons.

Basics on graphons
A graphon (contraction of "graph function") is a measurable symmetric function from $[0,1]^2$ to $[0,1]$. Intuitively, we can think of it as the adjacency matrix of an infinite (weighted) graph with vertex set $[0,1]$. A finite graph G with vertex set $\{1, \ldots, n\}$ can be seen as a graphon $W_G$ as follows: $W_G(x, y) = 1$ if the vertices with labels $\lceil xn \rceil$ and $\lceil yn \rceil$ are connected in G ($\lceil z \rceil$ being the smallest integer at least z, with the unusual convention $\lceil 0 \rceil = 1$), and $W_G(x, y) = 0$ otherwise.

Sampling. Let W be a graphon and k a positive extended integer (i.e. $k \in \mathbb{Z}_{>0} \cup \{+\infty\}$). We consider two independent families $(U_i)_{1 \le i \le k}$ and $(X_{i,j})_{1 \le i < j \le k}$ of i.i.d. uniform random variables in $[0,1]$. Given this, we define a random graph $\mathrm{Sample}_k(W)$ as follows: its vertex set is $[k] := \{1, \ldots, k\}$ and, for every $i < j$, vertices i and j are connected if and only if $X_{i,j} \le W(U_i, U_j)$. In other words, vertices i and j are connected with probability $W(U_i, U_j)$, independently of each other conditionally on the sequence $(U_i)_{1 \le i \le k}$.
We note that, for $k' > k$, the restriction $\mathrm{Sample}_{k'}(W)[k]$ of $\mathrm{Sample}_{k'}(W)$ to the vertex set $[k]$ has the same distribution as $\mathrm{Sample}_k(W)$. In particular, the random graph $\mathrm{Sample}_\infty(W)$ induces a realization of all the $\mathrm{Sample}_k(W)$ on the same probability space.
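The sampling procedure just described is short enough to implement directly; the sketch below (ours) checks two easy special cases: the constant graphon $1/2$ reproduces the Erdős-Rényi graph $G(k, 1/2)$, and for a 0/1 step graphon supported on $[0, 1/2)^2$ the vertices whose latent uniform falls in $[1/2, 1]$ are isolated.

```python
import random

def sample_k(W, k, rng):
    # latent uniforms U_i; edge {i, j} present iff X_ij <= W(U_i, U_j)
    U = [rng.random() for _ in range(k)]
    edges = {(i, j) for i in range(k) for j in range(i + 1, k)
             if rng.random() <= W(U[i], U[j])}
    return U, edges

rng = random.Random(0)

# constant graphon 1/2: the sample is an Erdos-Renyi graph G(200, 1/2)
U, edges = sample_k(lambda x, y: 0.5, 200, rng)
density = len(edges) / (200 * 199 / 2)
assert 0.45 < density < 0.55          # loose sanity check on the edge density

# 0/1 step graphon: vertices with U_i >= 1/2 are isolated in the sample
W_step = lambda x, y: 1.0 if (x < 0.5 and y < 0.5) else 0.0
U, edges = sample_k(W_step, 50, rng)
iso = {i for i in range(50) if U[i] >= 0.5}
assert all(i not in iso and j not in iso for (i, j) in edges)
```

Note that the sampler uses one fresh uniform per pair, matching the role of the $X_{i,j}$ above.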
Convergence. By definition, a sequence of graphons $(W_n)$ converges to a graphon W if, for all k, $\mathrm{Sample}_k(W_n)$ converges in distribution to $\mathrm{Sample}_k(W)$. It can be shown that this is equivalent to convergence for the so-called cut-distance; see [Lov12, Theorem 11.5]. We note that the graphon limit is unique only up to some equivalence relation, called weak equivalence [Lov12, Sections 7.3, 10.7, 13.2]. Moreover, the quotient of the set of graphons by the weak equivalence relation, equipped with the cut-distance metric, is a compact metric space, which we shall from now on call the space of graphons. Finally, we say that a sequence of graphs $(G_n)_{n \ge 1}$ converges to a graphon W if the associated graphons $(W_{G_n})$ converge to W in the space of graphons, and that a sequence of random graphs $(G_n)_{n \ge 1}$ converges in distribution to a random graphon W if $W_{G_n}$ converges to W in distribution, as random elements of the space of graphons.

Convergence to the Brownian cographon
Let $\mathbf{e} : [0, 1] \to \mathbb{R}$ denote a Brownian excursion of length one. We recall that, a.s., $\mathbf{e}$ has a countable set of local minima, which are all strict and have distinct values. Let us denote by $(b_i)_{i \ge 1}$ an enumeration of the positions of these local minima. It is possible to choose this enumeration in such a way that the $b_i$'s and the subsequent functions defined in this section are measurable; we refer to [Maa20, Lemma 2.3] and [BBF+22b, Section 4] for details.
We now choose i.i.d. Bernoulli random variables $(s_i)_{i \ge 1}$ with $\mathbb{P}(s_i = 0) = p$, independent from the foregoing, and write $S^p = (s_i)_{i \ge 1}$. We call $(\mathbf{e}, S^p)$ a decorated Brownian excursion, thinking of the variable $s_i$ as a decoration attached to the local minimum $b_i$ of $\mathbf{e}$.
For $x, y \in [0, 1]$, we define $\mathrm{Dec}(x, y; \mathbf{e}, S^p)$ to be the decoration of the minimum of $\mathbf{e}$ on the interval $[x, y]$ (or $[y, x]$ if $y \le x$; we shall not repeat this precision below). If this minimum is not unique, or is attained at x or y and is therefore not a local minimum, $\mathrm{Dec}(x, y; \mathbf{e}, S^p)$ is ill-defined, and we take the convention $\mathrm{Dec}(x, y; \mathbf{e}, S^p) = 0$. Note however that, for uniform random x and y, this happens with probability 0, so that the object constructed in Definition 2.1 below does not depend on this convention.
Definition 2.1. The Brownian cographon $W^p$ of parameter $p$ is the random function $(x, y) \mapsto \mathrm{Dec}(x, y; \mathbf{e}, S^p)$. For example, in Fig. 3.1, if the decoration at $b$ is 0 (resp. 1), then the graphon is constant equal to 0 (resp. 1) on the rectangle $(a, b) \times (b, c)$.
Theorem 2.2. Uniform random cographs (either labeled or unlabeled) converge in distribution to the Brownian cographon of parameter 1/2, in the space of graphons.

Independence number of a graphon and semi-continuity
Let W be a (deterministic) graphon. Following Hladký and Rocha [HR20], we define an independent set I of a graphon W as a measurable subset of $[0,1]$ such that $W(x, y) = 0$ for almost every $(x, y) \in I \times I$. The independence number $\alpha(W)$ of a graphon W is then
$$\alpha(W) = \sup\{\mathrm{Leb}(I) : I \text{ independent set of } W\}, \qquad (2.1)$$
where $\mathrm{Leb}(I)$ denotes the Lebesgue measure of I. Note that α(W) is attained by some independent set I (that is, the supremum in Eq. (2.1) is in fact a maximum): this follows from e.g.
[HHP19, Lemma 2.4]. Clearly, for a graph G of size n we have
$$\alpha(W_G) = \frac{\alpha(G)}{n}, \qquad (2.2)$$
where α(G) is the maximum size of an independent set in G.
Of crucial interest for this paper is the upper semi-continuity of the function α on the space of graphons [HR20, Corollary 7]. Concretely, this says the following.

Proposition 2.3. Suppose that $(W_n)_{n \ge 1}$ is a sequence of graphons that converges to some W in the space of graphons. Then $\limsup_n \alpha(W_n) \le \alpha(W)$.
Remark 2.4. In the following, we will consider random variables of the kind α(W), where W is a random graphon. For this to make sense, the map α should be measurable. Since it is defined as a supremum over an uncountable set, its measurability is not a priori clear. However, it is known that any semi-continuous function is measurable, so that Proposition 2.3 implies that α is indeed measurable. We shall not discuss this point further in the paper.
In the rest of this subsection, we give an alternative definition of α(W). This definition is not needed in the rest of the paper (and therefore can be safely skipped); however, it answers a question raised by Hladký and Rocha [HR20, Section 3.2], who asked for a connection between the statistic α(W) and subgraph densities (or equivalently, samples) of W.
For a graphon W, we set
$$\alpha_2(W) := \liminf_{k \to \infty} \frac{\alpha(\mathrm{Sample}_k(W))}{k}.$$
Since $\mathrm{Sample}_\infty(W)$ is a random graph, the right-hand side is a priori a random variable. We recall that $\mathrm{Sample}_\infty(W)$ is constructed from i.i.d. random variables $\{U_i, X_{i,j}, 1 \le i < j\}$. We denote by $\mathcal{G}_n$ the σ-algebra generated by $\{U_i, X_{i,j}, n < i < j\}$. It is a simple exercise to see that $\alpha_2(W)$ is measurable with respect to the tail σ-algebra $\bigcap_{n \ge 1} \mathcal{G}_n$. By Kolmogorov's 0-1 law (easily adapted to our situation with bi-indexed i.i.d. random variables), $\alpha_2(W)$ is almost surely equal to a constant.
Lemma 2.5. For any graphon W , we have α 2 (W ) = α(W ) almost surely, and the lim inf defining α 2 (W ) is almost surely an actual limit.
Proof. We first prove $\alpha_2(W) \ge \alpha(W)$ almost surely. Let I be an independent set of W. For any $k \ge 1$, we observe that the set $J_k := \{i \in [k] : U_i \in I\}$ is almost surely an independent set of $\mathrm{Sample}_k(W)$ (indeed, for distinct $i, j \in J_k$, we have $W(U_i, U_j) = 0$ almost surely). Hence, a.s., $\alpha(\mathrm{Sample}_k(W)) \ge |J_k|$.
As k tends to infinity, the law of large numbers asserts that $|J_k|/k$ tends a.s. to $\mathrm{Leb}(I)$. Therefore we have a.s. $\alpha_2(W) \ge \mathrm{Leb}(I)$. Since this holds for any independent set I of W, we can consider the independent set I that realizes the maximum in Eq. (2.1), proving $\alpha_2(W) \ge \alpha(W)$ a.s.
Let us prove the converse inequality. It is known that $(\mathrm{Sample}_\infty(W)[k])_{k \ge 1}$ converges a.s. to W in the space of graphons (e.g. as a consequence of [Lov12, Lemma 10.16]). Using (2.2) and Proposition 2.3, this implies that, a.s.,
$$\limsup_{k \to \infty} \frac{\alpha(\mathrm{Sample}_k(W))}{k} \le \alpha(W).$$
This concludes the proof that almost surely $\alpha_2(W) = \alpha(W)$. Moreover, combining the two inequalities shows that the lim inf defining $\alpha_2(W)$ is almost surely an actual limit.
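Lemma 2.5 can be observed numerically on a simple example (ours, not from the paper). Take the step graphon $W(x, y) = \mathbf{1}_{x < 1/2}\mathbf{1}_{y < 1/2}$: its independence number is $\alpha(W) = 1/2$ (take $I = [1/2, 1]$), and $\mathrm{Sample}_k(W)$ is a clique on the vertices with $U_i < 1/2$ plus isolated vertices, so its independence number is the number of isolated vertices plus one.

```python
import random

rng = random.Random(7)

def alpha_sample_over_k(k):
    # W(x, y) = 1 iff x < 1/2 and y < 1/2: the sampled graph is a clique on
    # the vertices with U_i < 1/2 plus isolated vertices.  A largest
    # independent set = all isolated vertices plus one clique vertex.
    U = [rng.random() for _ in range(k)]
    clique_size = sum(1 for u in U if u < 0.5)
    alpha = (k - clique_size) + (1 if clique_size > 0 else 0)
    return alpha / k

# alpha(W) = 1/2; the sampled ratio alpha(Sample_k(W)) / k should be close
est = alpha_sample_over_k(20_000)
assert abs(est - 0.5) < 0.02
```

Here the limit in Lemma 2.5 is driven by the law of large numbers for the count of latent uniforms below $1/2$, exactly as in the first half of the proof above.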

The Brownian separable permuton and its relation to the Brownian cographon
The theory of permutons (see [GGKK15,HKMRS13]) plays the same role for limits of permutations as the theory of graphons does for dense graphs. A permuton is a probability measure on the unit square with uniform marginals, and the space of permutons equipped with the weak convergence of measures is a compact metric space. We attach to each permutation σ of size $n \ge 1$ the measure $\mu_\sigma$ on the unit square with density $(x, y) \mapsto n\,\mathbf{1}_{\sigma(\lceil nx \rceil) = \lceil ny \rceil}$, which is a permuton. This defines a dense embedding of the set of permutations into the space of permutons.
Recall that inv(σ) denotes the (unlabeled) inversion graph of a permutation σ.
Proposition 2.6. Let $p \in [0, 1]$ and $(\sigma_n)_n$ be a sequence of random permutations such that $\mu_{\sigma_n}$ converges in distribution to the Brownian separable permuton $\mu^p$. Then $W_{\mathrm{inv}(\sigma_n)}$ converges in distribution to the Brownian cographon $W^p$, in the space of graphons.

Remark 2.7. It was observed in [GGKK15, end of Section 2] that inv possesses an extension which is a continuous map from the space of permutons to the space of graphons. The above proposition implies that the image of $\mu^p$ by this map is $W^p$.
Proof. For every $k \ge 1$, denote by $b_{k,p}$ a uniform random plane binary tree with k (unlabeled) leaves, whose internal vertices are decorated with independent signs in $\{\oplus, \ominus\}$ such that $\mathbb{P}(\oplus) = p$. Before entering the actual proof, we present a useful link between a separable permutation and an unlabeled cograph constructed from $b_{k,p}$.
Following [BBF+22a, Definition 2.3], we may associate with $b_{k,p}$ a separable permutation, denoted $\mathrm{perm}(b_{k,p})$. We do not recall this construction here (for details, see the above reference or the beginning of Section 5), but we indicate an important property it enjoys: for $1 \le i < j \le k$, the pair $(i, j)$ is an inversion of $\mathrm{perm}(b_{k,p})$ if and only if the youngest common ancestor of the i-th and j-th leaves (in the left-to-right order) of $b_{k,p}$ carries a $\ominus$ sign.
Similarly, we may also associate with $b_{k,p}$ an unlabeled cograph. We first replace $\ominus$ by 1 and $\oplus$ by 0 in all internal nodes, and then we forget the plane embedding. We denote by $\hat{b}_{k,p}$ the resulting non-plane and unlabeled decorated tree. With this tree, we associate an unlabeled cograph $\mathrm{Cograph}(\hat{b}_{k,p})$ as follows: its vertices correspond to the leaves of $\hat{b}_{k,p}$, and there is an edge between the vertices corresponding to two leaves $\ell$ and $\ell'$ if and only if the youngest common ancestor of $\ell$ and $\ell'$ carries the decoration 1. An alternative recursive presentation of this construction, making it clear that the constructed graph is indeed a cograph, is given at the beginning of Section 4.
By construction, the equality $\mathrm{inv}(\mathrm{perm}(b_{k,p})) = \mathrm{Cograph}(\hat{b}_{k,p})$ of unlabeled graphs holds. Denote by $\sigma_{n,k}$ a uniform random pattern of size k in $\sigma_n$. Theorem 3.1 and Definition 3.5 in [BBF+22a] imply that $\sigma_{n,k}$ converges in distribution to the random separable permutation $\mathrm{perm}(b_{k,p})$. As this is a convergence in distribution in the discrete space consisting of all permutations of size k, and the map inv is continuous on this space, we obtain the following convergence of unlabeled graphs:
$$\mathrm{inv}(\sigma_{n,k}) \xrightarrow[n \to \infty]{d} \mathrm{inv}(\mathrm{perm}(b_{k,p})). \qquad (2.4)$$
It is easy to check that the actions of taking patterns (resp. induced subgraphs) and of computing inversion graphs commute. Namely, for a permutation σ and a subset I of its indices, the inversion graph of the pattern of σ induced by I is the subgraph of the inversion graph of σ induced by the vertices corresponding to I. Therefore, $\mathrm{inv}(\sigma_{n,k})$, which appears on the left-hand side of Equation (2.4), has the same distribution as the subgraph induced by a uniform random subset of k distinct vertices of $\mathrm{inv}(\sigma_n)$.
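The identity inv(perm(b)) = Cograph(b̂) stated at the start of this paragraph can be tested mechanically. The paper does not recall the construction of perm(b); the sketch below (ours) uses the usual convention, consistent with the characterization of separable permutations by direct and skew sums: a $\oplus$ node is a direct sum of its children, a $\ominus$ node a skew sum, and the cograph has an edge between two leaves exactly when their youngest common ancestor carries $\ominus$.

```python
import random
from itertools import combinations

# a decorated plane binary tree: 'leaf' or (sign, left, right), sign in '+', '-'
def random_tree(k, rng, p=0.5):
    if k == 1:
        return 'leaf'
    split = rng.randint(1, k - 1)
    sign = '+' if rng.random() < p else '-'
    return (sign, random_tree(split, rng, p), random_tree(k - split, rng, p))

def perm(t):
    # '+' = direct sum, '-' = skew sum of the children's permutations
    if t == 'leaf':
        return [1]
    sign, l, r = t
    pl, pr = perm(l), perm(r)
    if sign == '+':
        return pl + [v + len(pl) for v in pr]
    return [v + len(pr) for v in pl] + pr

def cograph_edges(t, start=0):
    # edge between two leaves iff their youngest common ancestor carries '-'
    if t == 'leaf':
        return 1, set()
    sign, l, r = t
    nl, el = cograph_edges(l, start)
    nr, er = cograph_edges(r, start + nl)
    edges = el | er
    if sign == '-':
        edges |= {(i, j) for i in range(start, start + nl)
                  for j in range(start + nl, start + nl + nr)}
    return nl + nr, edges

def inversion_edges(sigma):
    return {(i, j) for i, j in combinations(range(len(sigma)), 2)
            if sigma[i] > sigma[j]}

rng = random.Random(3)
for _ in range(50):
    t = random_tree(8, rng)
    _, edges = cograph_edges(t)
    assert inversion_edges(perm(t)) == edges
```

The random trees here are only test inputs (their distribution is irrelevant to the identity, which holds tree by tree).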
On the right-hand side of Equation (2.4), we have already identified $\mathrm{inv}(\mathrm{perm}(b_{k,p}))$ as $\mathrm{Cograph}(\hat{b}_{k,p})$. We recall that $\hat{b}_{k,p}$ is the non-plane version of a uniform random (unlabeled) plane binary tree with independent decorations on its internal nodes. We claim that it has the same distribution as the unlabeled version of a uniform random labeled non-plane binary tree, with the same rule for decorations of the internal nodes (which we denote $b^{NP,L}_{k,p}$). Admitting this claim for the moment, and comparing with [BBF+22b, Proposition 4.3], we get that the right-hand side of Equation (2.4) is distributed as $\mathrm{Sample}_k(W^p)$.
With these considerations in hand, we can use [BBF+22b, Theorem 3.8] (more precisely the implication (d) ⇒ (a) and Eq. (4) following this theorem) and conclude from Equation (2.4) that $W_{\mathrm{inv}(\sigma_n)}$ converges in distribution to $W^p$. This ends the proof of the proposition, up to the above claim.
It remains to prove that $\hat{b}_{k,p} \stackrel{d}{=} b^{NP,L}_{k,p}$, as non-plane unlabeled trees. Since the rules for the random decorations are the same on both sides, we disregard decorations, and denote the underlying undecorated random trees $\hat{b}_k$ and $b^{NP,L}_k$ respectively. To prove that $\hat{b}_k \stackrel{d}{=} b^{NP,L}_k$, we compare both distributions with that of a uniform random labeled plane binary tree with k leaves, denoted $b^{P,L}_k$. Since every non-plane labeled binary tree with k leaves can be embedded in the plane in $2^{k-1}$ ways, we have $b^{NP,L}_k \stackrel{d}{=} b^{P,L}_k$ as non-plane unlabeled trees (there are no symmetry problems, since trees are labeled). On the other hand, since every plane unlabeled binary tree with k leaves can be labeled in $k!$ ways, we have $\hat{b}_k \stackrel{d}{=} b^{P,L}_k$ as non-plane unlabeled trees (again, there are no symmetry problems, since trees are plane). We conclude that $\hat{b}_{k,p} \stackrel{d}{=} b^{NP,L}_{k,p}$, as wanted.
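The double counting in this proof can be checked against known closed formulas (standard facts not stated in the paper): plane unlabeled binary trees with $k$ leaves are counted by the Catalan number $\mathrm{Cat}(k-1)$, and leaf-labeled non-plane binary trees by the double factorial $(2k-3)!!$. The two arguments above then force the identity $\mathrm{Cat}(k-1) \cdot k! = 2^{k-1} \cdot (2k-3)!!$, which the sketch below verifies numerically.

```python
from math import comb, factorial

def catalan(n):
    return comb(2 * n, n) // (n + 1)

def double_fact_odd(m):
    # m!! for odd m, with the convention (-1)!! = 1
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

for k in range(2, 15):
    plane_labeled = catalan(k - 1) * factorial(k)        # label a plane tree
    via_nonplane = 2 ** (k - 1) * double_fact_odd(2 * k - 3)  # embed a labeled tree
    assert plane_labeled == via_nonplane
```

For instance, $k = 4$ gives $5 \cdot 24 = 120$ on one side and $8 \cdot 15 = 120$ on the other.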

Proof of the sublinearity results through self-similarity
The main part of the proof of our sublinearity results (Theorems 1.9 and 1.10) is done in the continuous world, proving that the independence number $\alpha(W^p)$ of the Brownian cographon is almost surely equal to 0. To this end, we first show that the distribution of $\alpha(W^p)$ is a solution of a specific inequation; this is Proposition 3.1. Next, we prove that the only solution of this inequation is the Dirac distribution $\delta_0$; this is Proposition 3.2. These results are combined in Section 3.3, completing the proofs of Theorems 1.9 and 1.10.

An inequation in distribution
We use the standard stochastic domination order between real distributions $\mu$ and $\nu$. Namely, we write $\mu \leq_d \nu$ if $\mu([x, +\infty)) \leq \nu([x, +\infty))$ for every real $x$. By Strassen's theorem, this is equivalent to the fact that we can find $Z_1$ and $Z_2$ defined on the same probability space with distributions $\mu$ and $\nu$ respectively, such that $Z_1 \leq Z_2$ almost surely.
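For discrete distributions, the coupling promised by Strassen's theorem can be realized explicitly by feeding a common uniform variable to both generalized inverse CDFs. A minimal Python illustration, with ad hoc example distributions:

```python
import random

def icdf(weights, u):
    """Generalized inverse CDF of a distribution on {0, ..., len(weights)-1}."""
    acc = 0.0
    for x, w in enumerate(weights):
        acc += w
        if u <= acc:
            return x
    return len(weights) - 1

# mu is dominated by nu: mu([x, oo)) <= nu([x, oo)) for every x
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
assert all(sum(mu[x:]) <= sum(nu[x:]) + 1e-12 for x in range(3))

# Strassen coupling: feed the SAME uniform variable to both inverse CDFs
rng = random.Random(0)
for _ in range(2000):
    u = rng.random()
    z1, z2 = icdf(mu, u), icdf(nu, u)
    assert z1 <= z2   # Z1 <= Z2 holds pointwise under this coupling
```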
Our goal is now to show that the distribution of the random variable $\alpha(W^p)$ is stochastically dominated by another distribution defined using independent copies of $\alpha(W^p)$ (we refer to such a relation as an inequation in distribution). To this end, we use Aldous' decomposition of a Brownian excursion into three independent excursions (see Figure 3.1). This decomposition has an immediate counterpart, where we decompose a Brownian cographon into three independent Brownian cographons. We then look closely at the behavior of the functional $\alpha$ along this decomposition.
Then [Ald94, Corollary 3] states that the random functions $e_0, e_1, e_2$ are three independent Brownian excursions, independent from the vector $(\Delta_0, \Delta_1, \Delta_2)$; moreover, the latter has distribution Dirichlet(1/2, 1/2, 1/2). Since $I$ is an independent set of $W^p$, Equation (3.7) implies that $I_k$ is an independent set of $W^p_k$ for every $k \in \{0, 1, 2\}$. In particular, $\mathrm{Leb}(I_k) \leq \alpha(W^p_k)$. Moreover, we notice that if $B = 1$, then either $\mathrm{Leb}(I_1) = 0$ or $\mathrm{Leb}(I_2) = 0$ (by definition of an independent set in a graphon). Together with Equation (3.8), we deduce
$$\mathrm{Leb}(I) \leq \Delta_0\,\alpha(W^p_0) + \mathbb{1}_{\{B=0\}}\big(\Delta_1\,\alpha(W^p_1) + \Delta_2\,\alpha(W^p_2)\big) + \mathbb{1}_{\{B=1\}}\max\big(\Delta_1\,\alpha(W^p_1),\, \Delta_2\,\alpha(W^p_2)\big).$$
From Equation (2.1), taking the supremum over independent sets $I$ of $W^p$, one obtains the following a.s. inequality:
$$\alpha(W^p) \leq \Delta_0\,\alpha(W^p_0) + \mathbb{1}_{\{B=0\}}\big(\Delta_1\,\alpha(W^p_1) + \Delta_2\,\alpha(W^p_2)\big) + \mathbb{1}_{\{B=1\}}\max\big(\Delta_1\,\alpha(W^p_1),\, \Delta_2\,\alpha(W^p_2)\big).$$
Since $\alpha(W^p_k)$ has the same distribution as $\alpha(W^p)$ for $k \in \{0, 1, 2\}$, and the three are independent, the right-hand side is a random variable distributed as $Y^{(p)}_{\mathrm{Law}(\alpha(W^p))}$, proving that $\mathrm{Law}(\alpha(W^p))$ satisfies Equation (3.4).

Solving the inequation

Proposition 3.2. For $p$ in $[0, 1)$, the Dirac distribution $\mu = \delta_0$ is the only probability distribution on $[0, 1]$ which is a solution of the inequation (3.4).
We start by stating and proving a key lemma. Recall the definition of $\Phi_p(\mu)$ from (3.4). The map $\Phi_p$ is a functional from the space $\mathcal M_1([0, 1])$ of probability distributions on $[0, 1]$ to itself. The space $\mathcal M_1([0, 1])$ can be endowed with the so-called Wasserstein distance (also called optimal cost distance, or Kantorovich-Rubinstein distance):
$$d_W(\nu, \nu') = \inf_{(X, X')} \mathbb{E}\big[|X - X'|\big],$$
where the infimum is taken over all pairs $(X, X')$ of random variables defined on the same probability space with distributions $\nu$ and $\nu'$, respectively. We will use below the fact that this infimum is attained (for an explicit expression of the minimizing coupling see e.g. Remark 2.30 in [PC19]). Furthermore, since we are working on a compact space, convergence for $d_W$ is equivalent to weak convergence of measures (see [Vil08, Sec. 6]).

Lemma 3.3. Let $p \in [0, 1)$. For every pair of distinct probability distributions $\mu, \nu$ on $[0, 1]$, we have $d_W(\Phi_p(\mu), \Phi_p(\nu)) < d_W(\mu, \nu)$.

Proof. As in Eq. (3.4), we construct $Y^{(p)}_\mu$ and $Y^{(p)}_\nu$ on the same probability space, coupled in a non-trivial way: we use the same vector $(\Delta_0, \Delta_1, \Delta_2)$ and Bernoulli variable $B$ for both $Y^{(p)}_\mu$ and $Y^{(p)}_\nu$, while for each $i$, the pair $(X^\mu_i, X^\nu_i)$ realizes a coupling of $\mu$ and $\nu$ minimizing their $L^1$ distance. Then we have
$$\mathbb{E}\Big[\Big|\sum_{i=0}^{2} \Delta_i X^\mu_i - \sum_{i=0}^{2} \Delta_i X^\nu_i\Big|\Big] \leq \sum_{i=0}^{2} \mathbb{E}[\Delta_i]\, \mathbb{E}\big[|X^\mu_i - X^\nu_i|\big] = d_W(\mu, \nu),$$
where we used successively the fact that $\Delta_i$ is independent from $(X^\mu_i, X^\nu_i)$, the fact that the coupling $(X^\mu_i, X^\nu_i)$ minimizes their $L^1$ distance, and the fact that $\sum_{i=0}^{2} \Delta_i = 1$ almost surely. We recall the trivial inequality $|\max(a, b) - \max(c, d)| \leq \max(|a - c|, |b - d|) \leq |a - c| + |b - d|$; besides, the second inequality is strict as soon as $a \neq c$ and $b \neq d$. Taking $a = \Delta_1 X^\mu_1$, $b = \Delta_2 X^\mu_2$, $c = \Delta_1 X^\nu_1$ and $d = \Delta_2 X^\nu_2$, we obtain that, almost surely,
$$\big|\max(\Delta_1 X^\mu_1, \Delta_2 X^\mu_2) - \max(\Delta_1 X^\nu_1, \Delta_2 X^\nu_2)\big| \leq \Delta_1 |X^\mu_1 - X^\nu_1| + \Delta_2 |X^\mu_2 - X^\nu_2|.$$
Moreover, since $\mu \neq \nu$, we have $X^\mu_1 \neq X^\nu_1$ with positive probability. The same holds for $X^\mu_2 \neq X^\nu_2$, and, by independence, both inequalities occur simultaneously with positive probability. Since $\Delta_1$ and $\Delta_2$ are positive almost surely, we have $a \neq c$ and $b \neq d$ simultaneously with positive probability. We conclude that the above inequality is strict with positive probability. Taking expectations and using Equation (3.10), we get
$$\mathbb{E}\Big[\big|\max(\Delta_1 X^\mu_1, \Delta_2 X^\mu_2) - \max(\Delta_1 X^\nu_1, \Delta_2 X^\nu_2)\big|\Big] < \big(\mathbb{E}[\Delta_1] + \mathbb{E}[\Delta_2]\big)\, d_W(\mu, \nu),$$
where the last equality used on the right-hand side is taken from (3.9). The lemma thus follows from Equations (3.9) and (3.11) and the fact that $p \neq 1$.

Proof of Proposition 3.2. We first note that $\Phi_p$ is nondecreasing with respect to stochastic domination, namely if $\mu \leq_d \nu$ then $\Phi_p(\mu) \leq_d \Phi_p(\nu)$. Consequently, if $\mu$ satisfies (3.4), i.e. $\mu \leq_d \Phi_p(\mu)$, then iterating yields $\mu \leq_d \Phi_p^k(\mu)$ for all $k \geq 1$. Moreover, the Dirac distribution $\delta_0$ is a fixed point of $\Phi_p$. Since $\Phi_p$ is a weak contraction by Lemma 3.3 and since $\mathcal M_1([0, 1])$ is compact, we know from the Banach fixed-point theorem that $\Phi_p^k(\mu)$ tends to $\delta_0$ in distribution. Combined with $\mu \leq_d \Phi_p^k(\mu)$, this forces $\mu = \delta_0$ for any probability distribution $\mu$ on $[0, 1]$ satisfying (3.4), which is what we wanted to prove.
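Although the proof is purely analytic, the collapse of the iterates $\Phi_p^k(\mu)$ to $\delta_0$ can be observed numerically. The sketch below iterates an empirical version of $\Phi_p$, assuming the recursive form suggested by the decomposition above (a Dirichlet(1/2, 1/2, 1/2) split, with the two parts merged by a max when a join occurs); the choice $P(B = 1) = 1/2$ is ours, made purely for illustration.

```python
import random

rng = random.Random(2024)

def dirichlet_half():
    """Dirichlet(1/2,1/2,1/2) via normalized Gamma(1/2) = N(0,1)^2/2 samples."""
    g = [rng.gauss(0.0, 1.0) ** 2 / 2.0 for _ in range(3)]
    s = sum(g)
    return [x / s for x in g]

def phi_step(sample):
    """One empirical iteration of the (assumed) map Phi_p:
       Y = D0*X0 + D1*X1 + D2*X2       if B = 0 (disjoint union),
       Y = D0*X0 + max(D1*X1, D2*X2)   if B = 1 (join kills one part)."""
    out = []
    for _ in range(len(sample)):
        d = dirichlet_half()
        x = [rng.choice(sample) for _ in range(3)]
        if rng.random() < 0.5:  # B = 1, illustrative parametrization
            out.append(d[0] * x[0] + max(d[1] * x[1], d[2] * x[2]))
        else:                   # B = 0
            out.append(d[0] * x[0] + d[1] * x[1] + d[2] * x[2])
    return out

sample = [1.0] * 1000           # start from delta_1, the worst case
for _ in range(200):
    sample = phi_step(sample)
mean = sum(sample) / len(sample)
assert 0.0 <= mean < 0.2        # the iterates collapse towards delta_0
```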

Completing the proof of the sublinearity results
Propositions 3.1 and 3.2 imply the following result, which is the core of the proofs of our sublinearity results (Theorems 1.9 and 1.10): for every $p \in [0, 1)$, almost surely $\alpha(W^p) = 0$. We now proceed with the proofs of our sublinearity results.
Proof of Theorem 1.9. Let $p \in [0, 1)$ and consider a sequence $(G_n)$ of random graphs which converges to the Brownian cographon $W^p$. By Skorokhod's representation theorem, we can represent all $G_n$ and $W^p$ on the same probability space so that $G_n$ converges to $W^p$ in the cut distance almost surely. Applying Proposition 2.3, we get that, a.s., $\limsup_{n} \alpha(G_n)/n \leq \alpha(W^p) = 0$, which concludes the proof.

Proof of Theorem 1.10. Recall that, for any permutation $\sigma$, there is a one-to-one correspondence between increasing subsequences of $\sigma$ and independent sets of $\mathrm{inv}(\sigma)$. In particular, one has $\mathrm{LIS}(\sigma) = \alpha(\mathrm{inv}(\sigma))$, and the result follows from Theorem 1.9 applied to the inversion graphs of the permutations $\sigma_n$.
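The correspondence between increasing subsequences and independent sets of the inversion graph is easy to test exhaustively on small permutations; a quick Python sketch (names are ours):

```python
from itertools import combinations

def lis(perm):
    """Longest increasing subsequence via O(n^2) dynamic programming."""
    n = len(perm)
    best = [1] * n
    for j in range(n):
        for i in range(j):
            if perm[i] < perm[j]:
                best[j] = max(best[j], best[i] + 1)
    return max(best)

def alpha_inversion_graph(perm):
    """Maximum independent set of the inversion graph, by brute force."""
    n = len(perm)
    def independent(S):
        # no inversion inside S  <=>  S indexes an increasing subsequence
        return all(perm[i] < perm[j] for i in S for j in S if i < j)
    return max(len(S) for r in range(n + 1)
               for S in combinations(range(n), r) if independent(S))

for perm in [[1], [2, 1], [3, 1, 4, 5, 2], [5, 4, 3, 2, 1], [2, 4, 1, 7, 3, 6, 5]]:
    assert lis(perm) == alpha_inversion_graph(perm)
```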

Expected number of independent sets of linear size
For $k \leq n$, let $X_{n,k}$ be the random variable counting the independent sets of size $k$ in a uniform labeled cograph of size $n$. The goal of this section is to prove Theorem 1.3, i.e. to estimate $\mathbb E[X_{n,k}]$ in the regime where $k$ grows linearly in $n$.
The first step of the proof is to obtain equations for the exponential generating series of cographs with a marked independent set, through symbolic combinatorics. To this aim, it is convenient to encode cographs by their cotrees. The asymptotic analysis is then performed via saddle-point analysis.

Combinatorial preliminaries: cographs and cotrees
Definition 4.1. A labeled cotree of size $n$ is a rooted tree $t$ with $n$ leaves labeled from 1 to $n$ such that: • $t$ is not plane (i.e., the children of every internal node are not ordered); • every internal node has at least two children; • every internal node of $t$ is decorated with a 0 or a 1; • decorations 0 and 1 alternate along each branch from the root to a leaf.
An unlabeled cotree of size n is a labeled cotree of size n where we forget the labels on the leaves.
For an unlabeled cotree t, we denote by Cograph(t) the unlabeled graph defined recursively as follows (see an illustration in Figure 4.1): • If t consists of a single leaf, then Cograph(t) is the graph with a single vertex.
• Otherwise, the root of $t$ has decoration 0 or 1 and has subtrees $t_1, \ldots, t_d$ attached to it ($d \geq 2$). Then, if the root has decoration 0, we let Cograph($t$) be the disjoint union of Cograph($t_1$), ..., Cograph($t_d$). Otherwise, the root has decoration 1, and we let Cograph($t$) be the join of Cograph($t_1$), ..., Cograph($t_d$). Note that the above construction naturally entails a one-to-one correspondence between the leaves of the cotree $t$ and the vertices of the associated graph Cograph($t$). In particular, it maps the size of a cotree to the size of the associated graph. Another consequence is that we can extend the above construction to a labeled cotree $t$, obtaining a labeled graph (also denoted Cograph($t$)) with vertex set $\{1, \ldots, n\}$: each vertex of Cograph($t$) receives the label of the corresponding leaf of $t$.
By construction, for all cotrees t, the graph Cograph(t) is a cograph. Conversely, each cograph can be obtained in this way, and this correspondence is one-to-one. This property is ensured by the alternation of decorations 0 and 1 in cotrees. This was first shown in [CLS81]. The presentation of [CLS81], although equivalent, is however a little bit different, since cographs are generated using exclusively "complemented unions" instead of disjoint unions and joins. The presentation we adopt has since been used in many algorithmic papers, see e.g. [HP05,BCH+08].
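The recursive construction of Cograph($t$) is straightforward to implement; the following Python sketch (representations and helper names are ours) builds the graph from a cotree and verifies on an example that the result is P4-free.

```python
from itertools import combinations

def cograph(t):
    """Cograph(t): t is an int (leaf label) or (dec, [children]) with dec 0/1."""
    if isinstance(t, int):
        return {t}, set()
    dec, children = t
    parts = [cograph(c) for c in children]
    verts = set().union(*(v for v, _ in parts))
    edges = set().union(*(e for _, e in parts))
    if dec == 1:  # join: add all edges between distinct subtrees
        for (v1, _), (v2, _) in combinations(parts, 2):
            edges |= {frozenset((a, b)) for a in v1 for b in v2}
    return verts, edges

def has_induced_p4(verts, edges):
    """A graph is a cograph iff no 4 vertices induce a path P4."""
    for quad in combinations(sorted(verts), 4):
        sub = [e for e in edges if e <= set(quad)]
        # 3 edges with degree multiset {1,1,2,2} characterize P4 on 4 vertices
        if len(sub) == 3 and sorted(
                sum(v in e for e in sub) for v in quad) == [1, 1, 2, 2]:
            return True
    return False

# cotree with alternating decorations: root 1 (join) over two 0-nodes (unions)
t = (1, [(0, [1, 2]), (0, [3, (1, [4, 5])])])
verts, edges = cograph(t)
assert verts == {1, 2, 3, 4, 5}
assert frozenset((1, 3)) in edges       # joined across the root
assert frozenset((1, 2)) not in edges   # disjoint union below
assert not has_induced_p4(verts, edges)
```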
From a cograph G, the unique cotree t such that Cograph(t) = G is recursively built as follows. If G consists of a single vertex, t is the unique cotree with a single leaf. If G has at least two vertices, we distinguish cases depending on whether G is connected or not.
• If $G$ is not connected, the root of $t$ is decorated with 0 and the subtrees attached to it are the cotrees associated with the connected components of $G$.
• If $G$ is connected, the root of $t$ is decorated with 1 and the subtrees attached to it are the cotrees associated with the induced subgraphs of $G$ whose vertex sets are those of the connected components of $\bar G$, where $\bar G$ is the complement of $G$ (the graph on the same vertices with complement edge set).
Important properties of cographs which justify the correctness of the above construction are the following: cographs are stable under taking induced subgraphs and complements, and a cograph $G$ of size at least two is not connected exactly when its complement $\bar G$ is connected.
Remark 4.2. The transformation which switches every decoration $1 \leftrightarrow 0$ in a cotree is of course an involution. Moreover, it turns independent sets into cliques in the corresponding cograph (indeed, $\{v, v'\}$ is an edge in Cograph($t$) if and only if the first common ancestor of the corresponding leaves of $t$ has decoration 1). This proves that for every $n$, if $G_n$ denotes a uniform random cograph (either labeled or unlabeled) of size $n$, then
$$\alpha(G_n) \stackrel{d}{=} \omega(G_n),$$
where $\omega(G)$ is the maximum size of a clique in the graph $G$.
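The involution of Remark 4.2 can be illustrated concretely. The sketch below (ad hoc names) computes edges via the first-common-ancestor rule and checks that switching decorations yields the complement graph, hence exchanges cliques and independent sets.

```python
from itertools import combinations

def switch(t):
    """Exchange decorations 0 <-> 1 in a cotree."""
    return t if isinstance(t, int) else (1 - t[0], [switch(c) for c in t[1]])

def leaves(t):
    return {t} if isinstance(t, int) else {x for c in t[1] for x in leaves(c)}

def edges(t):
    """Edge set of Cograph(t): {a, b} is an edge iff the first common
    ancestor of the two leaves is decorated 1."""
    if isinstance(t, int):
        return set()
    dec, children = t
    E = set().union(*(edges(c) for c in children))
    if dec == 1:
        for c1, c2 in combinations(children, 2):
            E |= {frozenset((a, b)) for a in leaves(c1) for b in leaves(c2)}
    return E

t = (0, [(1, [1, 2, (0, [3, 4])]), 5])   # decorations alternate along branches
all_pairs = {frozenset(p) for p in combinations(sorted(leaves(t)), 2)}
# switching 0 <-> 1 complements the cograph, so cliques of Cograph(t) are
# exactly the independent sets of Cograph(switch(t)), and vice versa
assert edges(switch(t)) == all_pairs - edges(t)
```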

Proof of Theorem 1.3: Enumeration
Let $\mathcal L$ be the combinatorial family of labeled cotrees with decorations forgotten, counted by the number of leaves, and let $L(z)$ denote the corresponding exponential generating function. The series $L(z) = \sum_{t \in \mathcal L} z^{|t|}/|t|!$ is the unique formal power series solution of
$$L(z) = z + e^{L(z)} - 1 - L(z). \tag{4.2}$$
Let $C(z, u) = \sum_{(G, I)} z^{|G|} u^{|I|}/|G|!$ be the bivariate exponential generating function of pairs $(G, I)$, where $G$ is a labeled cograph and $I$ an independent set of $G$, so that $\mathbb E[X_{n,k}] = [z^n u^k] C(z, u) / [z^n] C(z, 0)$. Our goal is then to find the asymptotics of these coefficients. Note that if $G$ is reduced to a single vertex $\bullet$, we have $(G, I) = (\bullet, \emptyset)$ or $(\bullet, \{\bullet\})$; therefore
$$C(z, u) = z + zu + C_0(z, u) + C_1(z, u), \tag{4.3}$$
where $C_0(z, u)$ (resp. $C_1(z, u)$) is the bivariate series of the set $\mathcal C_0$ (resp. $\mathcal C_1$) of marked cographs (necessarily of size $\geq 2$) for which the root of the associated cotree is decorated with a 0 (resp. a 1). We have $L(z) = z + C_0(z, 0) = z + C_1(z, 0)$.
Indeed, when the decoration of the root is fixed, the other decorations are determined by the alternation condition.
1. A relation between the series $C_0(z, u)$ and $C_1(z, u)$ is given by
$$C_0(z, u) = \exp_2\big(z + zu + C_1(z, u)\big). \tag{4.4}$$
2. The series $C_1(z, u)$ is a solution of
$$C_1(z, u) = \exp_2(L(z)) + \big(zu + C_0(z, u) - C_0(z, 0)\big)\,\exp_1(L(z)), \tag{4.5}$$
where $C_0(z, u)$ is expressed via (4.4) and $C_0(z, 0) = L(z) - z$.

In the proof below, we make use of the notation $\exp_k(x) := \sum_{i \geq k} x^i/i!$.

Proof. When a cotree $T$ has its root $r$ decorated by a 0, if we denote by $(T_i)$ the subtrees rooted at the children of $r$, then the cograph $G$ associated with $T$ is the disjoint union of the cographs $G_i$ corresponding to the $T_i$. An independent set of $G$ is then the union of independent sets chosen in each of the $G_i$. Recall also that, by definition of cotrees, $r$ has at least two children. Therefore the marked cographs for which the root of the associated cotree is decorated with a 0 can be described as a multiset of at least two elements chosen between $(\bullet, \emptyset)$, $(\bullet, \{\bullet\})$ and the elements of $\mathcal C_1$.
Using the symbolic method for labeled structures [FS09], we get Equation (4.4).
When on the contrary a cotree T has its root r decorated by a 1, if we denote again by (T i ) the subtrees rooted at the children of r, then the cograph G associated with T is the join of the cographs G i corresponding to the T i . An independent set of G must then be an independent set chosen in one of the G i only (and the other children of r do not contribute to this independent set).
Let $\mathcal C_\emptyset$ denote the set of cographs without mark whose cotree does not have a root decorated by a 1, i.e. $\mathcal C_\emptyset$ is the set consisting of $(\bullet, \emptyset)$ and the elements of $\mathcal C_0$ marked with an empty independent set. Then, we distinguish two cases to describe the elements of $\mathcal C_1$ (marked cographs for which the root of the associated cotree is decorated with a 1). Either they are marked with an empty independent set, in which case they can be described as multisets of at least two elements of $\mathcal C_\emptyset$. Or they are marked with a nonempty independent set, and they can be described as pairs consisting of • a cograph which is either $(\bullet, \{\bullet\})$ or an element of $\mathcal C_0$ marked with a nonempty independent set (for the graph $G_i$ containing the independent set); and • a multiset of at least one element of $\mathcal C_\emptyset$ (for the other graphs $G_i$).
From Equations (4.3) and (4.4) we have
$$C(z, u) = e^{z(1+u) + C_1(z, u)} - 1. \tag{4.6}$$
In the following, to get the asymptotics of the coefficients of $C(z, u)$, we study $C_1(z, u)$ using Equation (4.5).
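As a sanity check, the first coefficients of $C(z, u)$ determined by Equations (4.3)-(4.5) can be compared against a brute-force enumeration of labeled cographs with a marked independent set. The Python sketch below does this up to size 4; the fixed-point form of (4.5) used in the code is our reconstruction from the combinatorial description in the proof above, and all helper names are ad hoc.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

N = 4  # truncation order: we check sizes n <= 4

def zero():
    return [[Fraction(0)] * (N + 1) for _ in range(N + 1)]

def add(*polys):
    out = zero()
    for p in polys:
        for n in range(N + 1):
            for k in range(N + 1):
                out[n][k] += p[n][k]
    return out

def neg(p):
    return [[-c for c in row] for row in p]

def mul(a, b):
    out = zero()
    for n1 in range(N + 1):
        for k1 in range(N + 1):
            if a[n1][k1]:
                for n2 in range(N + 1 - n1):
                    for k2 in range(N + 1 - k1):
                        out[n1 + n2][k1 + k2] += a[n1][k1] * b[n2][k2]
    return out

def exp_ge(f, m):
    """exp_m(f) = sum_{i>=m} f^i/i!  (f must have no constant term)."""
    out, power, fact = zero(), None, 1
    for i in range(1, N + 1):
        power = f if i == 1 else mul(power, f)
        fact *= i
        if i >= m:
            out = add(out, [[c / fact for c in row] for row in power])
    return out

Z, ZU = zero(), zero()
Z[1][0] = Fraction(1)    # the series z   : (one vertex, empty mark)
ZU[1][1] = Fraction(1)   # the series z*u : (one vertex, marked vertex)

L = zero()               # EGF of undecorated labeled cotrees: L = z + exp_2(L)
for _ in range(N + 2):
    L = add(Z, exp_ge(L, 2))

C1 = zero()              # fixed point of Eq. (4.5), with C_0 given by Eq. (4.4)
for _ in range(N + 3):
    C0 = exp_ge(add(Z, ZU, C1), 2)                   # Eq. (4.4)
    C00 = add(L, neg(Z))                             # C_0(z, 0) = L(z) - z
    C1 = add(exp_ge(L, 2), mul(add(ZU, C0, neg(C00)), exp_ge(L, 1)))

C = add(Z, ZU, exp_ge(add(Z, ZU, C1), 2), C1)        # Eq. (4.3)

def brute_counts(n):
    """Count pairs (labeled cograph on n vertices, independent set of size k)."""
    pairs = list(combinations(range(n), 2))
    counts = [0] * (n + 1)
    for mask in range(1 << len(pairs)):
        E = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        p4free = True
        for quad in combinations(range(n), 4):       # cograph <=> P4-free
            sub = [e for e in E if set(e) <= set(quad)]
            if len(sub) == 3 and sorted(
                    sum(v in e for e in sub) for v in quad) == [1, 1, 2, 2]:
                p4free = False
                break
        if not p4free:
            continue
        for r in range(n + 1):
            for S in combinations(range(n), r):
                if all(not set(e) <= set(S) for e in E):
                    counts[r] += 1
    return counts

for n in range(1, N + 1):
    assert [C[n][k] * factorial(n) for k in range(n + 1)] == brute_counts(n)
```

In particular the column $k = 0$ recovers the labeled cograph counts 1, 2, 8, 52.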

Proof of Theorem 1.3: main asymptotics (proof of Eq. (1.1))
Following Flajolet and Sedgewick [FS09, p. 389], we say that a domain $\Delta$ is a $\Delta$-domain at $\rho$ if there exist two real numbers $R > \rho$ and $0 < \phi < \frac{\pi}{2}$ such that
$$\Delta = \{z \in \mathbb C : |z| < R,\ z \neq \rho,\ |\arg(z - \rho)| > \phi\},$$
and that a power series is $\Delta$-analytic if it is analytic in some $\Delta$-domain at $\rho$, where $\rho$ is its radius of convergence. From [FS09, Example VII.12, p. 472] the series $L(z)$ has radius of convergence $\rho = 2\log(2) - 1$ and is $\Delta$-analytic. Moreover, the following expansion holds in a $\Delta$-domain at $z = \rho$:
$$L(z) = \log(2) - \sqrt{2\log(2) - 1}\,\sqrt{1 - z/\rho} + O(1 - z/\rho).$$
This allows us to obtain the asymptotics of $[z^n] C(z, 0)$. To get that of $[z^n u^k] C(z, u)$, we turn to the study of $C_1(z, u)$.
Fix $u \in \mathbb C$. The overall strategy is to perform saddle-point analysis with $C_1(z, u)$. To do so we rewrite Equation (4.5) as
$$C_1(z, u) = G\big(z, C_1(z, u), u\big), \quad \text{where } G(z, c, u) = \exp_2(L(z)) + \big(zu + \exp_2(z + zu + c) + z - L(z)\big)\,\exp_1(L(z)).$$
We will show that this almost fits the setting of the so-called smooth implicit-function schema (see [FS09, Sec. VII.4.1]); only the nonnegativity of the coefficients of $G$ is not satisfied here. Nevertheless, we shall prove that sufficient conditions for the validity of [FS09, Thm. VII.3, p. 468] are satisfied. First observe that for every $u \in \mathbb C$ the bivariate series $(z, c) \mapsto G(z, c, u)$ is analytic for $|z| < \rho$ and $c \in \mathbb C$.

Solution of the characteristic system
We use the notational convention that, for any function $H$ and variable $t$, $H_t$ denotes the partial derivative of $H$ with respect to $t$. We consider the characteristic system
$$G(r, s, u) = s, \qquad G_c(r, s, u) = 1. \tag{4.9}$$
We aim to prove that, for any $u > 0$, (4.9) admits a unique solution $(r, s) = (r(u), s(u))$ with $0 < r < \rho$ and $0 < s$. Below, we often use that the radius of convergence $\rho$ of $L(z)$ satisfies $\rho = 2\log(2) - 1$ and that $L(\rho) = \log(2)$.
We conclude that for $u > 0$, the characteristic system (4.9) has a unique solution $(r(u), s(u))$ in $[0, \rho] \times \mathbb C$, and we have $0 < r(u) < \rho$ and $s(u) > 0$. In particular, $(r(u), s(u))$ belongs to the analyticity domain of $G$.

Locating the singularity of C 1 (z, u)
Fix u > 0. To obtain the singular behavior of C 1 (z, u) as in [FS09, Thm. VII.3 p.468] despite the negativity of some coefficients of G, we see from [FS09, Note VII.16 p.471] that it is enough to show the following: C 1 (z, u) has radius of convergence r(u) and its value at this singularity is given by C 1 (r(u), u) = s(u), i.e. the dominant singularity of C 1 (z, u) corresponds to the solution of the characteristic system.
The argument to prove this is an adaptation of the proof of [FS09, Thm. VII.3, p. 468] to our setting, where $G$ has some negative coefficients but a larger analyticity region than what is usually assumed. Namely, our $G$ is analytic on the whole domain $\{|z| < \rho,\ c \in \mathbb C\}$, while the smooth implicit-function schema only assumes analyticity on $\{|z| < R,\ |c| < S\}$ for some $R, S > 0$ (with the notation of [FS09, Sec. VII.4.1]). Let us temporarily denote by $\rho(u)$ the radius of convergence of $C_1(z, u)$; the point $\rho(u)$ is a singularity of $C_1(z, u)$ by Pringsheim's theorem.
We have reached a contradiction in both cases, proving that $\rho(u) \geq r(u)$. This allows us to consider $C_1(r(u), u)$ (which is possibly infinite), and we assume for the sake of contradiction that $C_1(r(u), u) \neq s(u)$. Then for $a < r(u)$ sufficiently close to $r(u)$, the equation $y = G(a, y, u)$ admits several solutions $y \in \mathbb C$: • one is given by $y = C_1(a, u)$; • and two are obtained by evaluating at $a$ the two functions $y_1(z)$ and $y_2(z)$ given by the singular implicit function lemma [FS09, Lemma VII.3, p. 469] applied to the point $(r(u), s(u))$.
(Note that the applicability of this lemma is guaranteed by the fact that $(r(u), s(u))$ is a solution of the characteristic system, together with Equations (4.14) and (4.24) below.) From [FS09, Lemma VII.3, p. 469], it is clear that the last two solutions above are distinct for $a$ close enough to $r(u)$. The first one is also different from them for $a$ close enough to $r(u)$: indeed, as $z$ tends to $r(u)$, $C_1(z, u)$ tends to $C_1(r(u), u)$ while the two other solutions tend to $s(u)$. However, the function $y \mapsto G(a, y, u)$ is strictly convex (one checks easily that its second derivative is positive) and therefore cannot cross the main diagonal three times. We have reached a contradiction. We conclude that $C_1(r(u), u) = s(u)$.
It remains to prove ρ(u) = r(u). Since (r(u), s(u)) is a solution of the characteristic system, there is no analytic solution of the equation y = G(z, y, u) around the point (z, y) = (r(u), s(u)) (see the proof of [FS09, Lemma VII.3, p.469], where it is shown that any solution y has a series expansion involving a square-root term and hence cannot be analytic). Therefore C 1 (z, u) cannot be extended analytically to a neighborhood of r(u). So, ρ(u) = r(u), as wanted.

Derivatives of G: parametrized expressions and their signs
Several derivatives of $G(z, c, u)$ appear in the computations below, which establish the asymptotic behavior of $C(z, u)$ as well as estimates (i) and (ii) of Theorem 1.3. We collect useful properties of these derivatives here for convenience. In this paragraph, we again assume $u > 0$. First, from the explicit expression of $G_{cc}$, it follows that
$$G_{cc}(r(u), s(u), u) > 0. \tag{4.14}$$
Moving on to $G_u(r(u), s(u), u)$, it will be convenient to parametrize the involved quantities by $y := L(r(u))$. We obtain
$$(e^y - 1)\,\frac{e^y}{e^y - 1}\,(2y + 1 - e^y) = e^y\,(2y + 1 - e^y). \tag{4.19}$$
From the above and Equation (4.16), we have in particular that this quantity is positive, since $2y + 1 - e^y = r(u) > 0$ by (4.2). Finally, we focus on $G_z(r(u), s(u), u)$. Using Equations (4.2) and (4.12), we start by observing that
$$L'(z) = \frac{1}{2 - e^{L(z)}}.$$

Obtaining the asymptotics
Recall that we established that $C_1(z, u)$ has radius of convergence $r(u)$ and that its value at this singularity is $C_1(r(u), u) = s(u)$. From [FS09, Sec. VII.4.1], we therefore obtain an estimate of $C_1(z, u)$ as $z$ approaches $r(u)$. Namely, for every $u > 0$ the series $C_1(z, u)$ has a square-root singularity at $r(u)$ and, in some $\Delta$-domain, we have
$$C_1(z, u) = s(u) - \gamma_1(u)\,\sqrt{1 - z/r(u)} + O(1 - z/r(u)), \tag{4.25}$$
with $\gamma_1(u) = \sqrt{\dfrac{2\, r(u)\, G_z(r(u), s(u), u)}{G_{cc}(r(u), s(u), u)}}$. Note that $G_{cc}(r(u), s(u), u) > 0$ and $G_z(r(u), s(u), u) > 0$ from Equations (4.14) and (4.24). The determination of the sign in front of $\sqrt{1 - z/r(u)}$ uses that $C_1$ is increasing in $z$ as $z$ approaches $r(u)$ from the left.
To obtain asymptotics for the coefficients of $C_1(z, u)$, we have to extend (4.25) to complex $u$ around the positive real axis. We argue that the solutions $(r, s) = (r(u), s(u))$ of the characteristic system (4.9) have analytic continuations in a neighborhood of every $u > 0$. Observe that $G$ is analytic and that the Jacobian determinant of the system is (all derivatives being evaluated at $(r(u), s(u), u)$)
$$\begin{vmatrix} G_z & G_c - 1 \\ G_{cz} & G_{cc} \end{vmatrix} = G_z\, G_{cc},$$
using $G_c = 1$ at the solution. It is nonzero for $u > 0$ from Equations (4.14) and (4.24). Consequently, there exist analytic functions $r(u), s(u)$ defined on a neighborhood of the positive real axis such that, for each such $u$, the pair $(r(u), s(u))$ is a solution of the characteristic system. By continuity we can also ensure that, for $u$ sufficiently close to the real axis, • $r(u)$ is the unique singularity of $C_1(z, u)$ of smallest modulus and $C_1(r(u), u) = s(u)$; • $G_z$ and $G_{cc}$ are non-zero at $(r(u), s(u), u)$.
We denote by $U$ the open set of complex numbers $u$ where these properties hold. Therefore, as stated in [Drm09, Remark 2.20], the singular representation (4.25) also holds for complex $u \in U$ (and for $z$ in a proper $\Delta$-domain depending on $u$). Combining relation (4.6) with the development (4.25) of $C_1(z, u)$ near $z = r(u)$, we obtain, for $u \in U$,
$$C(z, u) = C(r(u), u) - \gamma(u)\,\sqrt{1 - z/r(u)} + O(1 - z/r(u)),$$
where $\gamma(u)$ is defined by $\gamma(u) = \gamma_1(u)\, \exp\big(s(u) + r(u)(1 + u)\big)$.
Moreover, since $C(z, u)$ is aperiodic, $r(u)$ is the unique dominant singularity of $C$ and
$$[z^n] C(z, u) \underset{n \to +\infty}{=} \frac{\gamma(u)}{2\sqrt{\pi}}\, n^{-3/2}\, r(u)^{-n}\, \big(1 + O(1/n)\big) \tag{4.26}$$
uniformly for $u$ in a compact subset contained in $U$ (by the Transfer Theorem [FS09, Thm. VI.3] and compactness). Now we can proceed as in [Drm94, Thm. 3], with the nonnegativity of the coefficients of $G$ replaced by the above variant of the smooth implicit-function schema, and obtain by an application of a saddle-point integration
$$[z^n u^k] C(z, u) \sim \frac{R_{k/n}}{n^2}\, \Big(r\big(u(k/n)\big)\, u(k/n)^{k/n}\Big)^{-n}, \tag{4.27}$$
uniformly for $an \leq k \leq bn$ with $0 < a < b < 1$, where $R_\beta$ ($0 < \beta < 1$) is some positive (computable) quantity and $u = u(\beta)$ is determined by the following equation (which is the rewriting of [Drm94, (2.14)] with our notation):
$$-\,\frac{u\, r'(u)}{r(u)} = \beta. \tag{4.28}$$
We explain in Remark 4.4 below why Equation (4.28) is indeed invertible. Finally, with Equation (4.8) we obtain, uniformly for $an \leq k \leq bn$,
$$\mathbb E[X_{n,k}] \sim B_{k/n}\, n^{-1/2}\, \big(C_{k/n}\big)^{n}$$
for some $B_\beta > 0$ and with $C_\beta := \dfrac{2\log(2) - 1}{r(u(\beta))\, u(\beta)^{\beta}}$. Remark 4.4. Let us justify that Equation (4.28) can be inverted to express $u$ as a function of $\beta$. First, observe that Equation (4.18) defines $u$ as a function of $y$. This function is decreasing for $y \in (0, \log(2))$ and maps $(0, \log(2))$ bijectively onto $(0, \infty)$. Therefore Equation (4.18) can be inverted to express $y$ as a function of $u$, which is decreasing and maps $(0, \infty)$ bijectively onto $(0, \log(2))$.

Proof of Theorem 1.3: Estimates (i) and (ii).
We now analyze the expression of $C_\beta$. Combining Equation (4.29) with Equations (4.16), (4.18) and (4.30), we can express $C_\beta$ as an explicit function of $y$; numerically inverting Equation (4.30) then gives $C_\beta$ as a function of $\beta$. The graph of the function $\beta \mapsto C_\beta$ in Figure 1.1 was obtained in this way.
In particular, this proves that $C_\beta > 1$ for $\beta \in (0, \beta_0)$ for some $\beta_0 > 0$. Numerical computations give the estimate $\beta_0 \approx 0.522677\ldots$; we furthermore observe numerically that $C_\beta$ reaches its maximum at $\beta \approx 0.229285\ldots$, where $C_\beta \approx 1.3663055\ldots$. Details on the computations above are provided in a Jupyter notebook embedded in this PDF (alternatively, you can download the source of the arXiv version to get the files). We provide both a read-only HTML version and an editable ipynb version for the reader's convenience.

Expected number of increasing subsequences of linear size
We now discuss the proof of Theorem 1.8, the analog of Theorem 1.3 for separable permutations. We start with some definitions.
Direct sums and skew sums readily extend to more than two permutations, writing $\oplus[\pi, \ldots, \tau, \rho] = \oplus[\pi, \cdots \oplus[\tau, \rho]]$ (and similarly for $\ominus$). As mentioned in Section 1.2, separable permutations are those which can be obtained from permutations of size 1 by performing direct sums and skew sums. This is similar to the characterization of cographs as the graphs obtained from graphs with one vertex using the join and disjoint union constructions. And similarly to the description of cographs through their cotrees, this allows one to associate a tree with each separable permutation. (This is actually a special case of the construction which associates with each permutation, not necessarily separable, its substitution decomposition tree; see e.g. [BBF+22a, Section 1.1].)
There are actually several presentations of this correspondence between separable permutations and trees. The one which is suitable here is presented in [BBF+18, Section 2.2], and we borrow our terminology from there.
Definition 5.1. A signed Schröder tree where the signs alternate of size $n$ is a rooted tree $t$ with $n$ leaves such that: • $t$ is plane (i.e., the children of every internal node are ordered); • every internal node has at least two children; • every internal node of $t$ is decorated with $\oplus$ or $\ominus$; • decorations $\oplus$ and $\ominus$ alternate along each branch from the root to a leaf.
An important difference with cotrees is that the above trees are plane, while cotrees are not plane.
With a signed Schröder tree $t$ where the signs alternate, we associate a permutation $\mathrm{perm}(t)$ of the same size, as follows.
• If t consists of a single leaf, then perm(t) is the permutation of size 1.
• Otherwise, the root of $t$ has decoration $\oplus$ or $\ominus$ and has subtrees $t_1, \ldots, t_d$ attached to it ($d \geq 2$), in this order from left to right. Then, if the root has decoration $\oplus$, we let $\mathrm{perm}(t)$ be $\oplus[\mathrm{perm}(t_1), \ldots, \mathrm{perm}(t_d)]$. Otherwise, the root has decoration $\ominus$, and we let $\mathrm{perm}(t)$ be $\ominus[\mathrm{perm}(t_1), \ldots, \mathrm{perm}(t_d)]$.
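The construction of $\mathrm{perm}(t)$ can be sketched in a few lines of Python (representations and names are ours):

```python
def direct_sum(*perms):
    """⊕[π1, ..., πd]: stack blocks diagonally, values increasing rightwards."""
    out, shift = [], 0
    for p in perms:
        out += [v + shift for v in p]
        shift += len(p)
    return out

def skew_sum(*perms):
    """⊖[π1, ..., πd]: later blocks sit below earlier ones."""
    out, shift = [], sum(len(p) for p in perms)
    for p in perms:
        shift -= len(p)
        out += [v + shift for v in p]
    return out

def perm_of(t):
    """perm(t) for a signed Schröder tree t = '•' (leaf) or (sign, [children])."""
    if t == '•':
        return [1]
    sign, children = t
    parts = [perm_of(c) for c in children]
    return direct_sum(*parts) if sign == '+' else skew_sum(*parts)

# ⊖[⊕[1, 1], 1] yields the separable permutation 2 3 1
assert perm_of(('-', [('+', ['•', '•']), '•'])) == [2, 3, 1]
```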
Proposition 5.2. The correspondence presented above between separable permutations and signed Schröder trees where the signs alternate is one-to-one.
For a proof of this statement, we refer to [BBF+18, Proposition 2.13] -see also the references given in [BBF+18].
We can now move to the proof of Theorem 1.8. The strategy is the same as in the proof of Theorem 1.3, using the encoding of separable permutations by their signed Schröder trees where the signs alternate, instead of the encoding of cographs by their cotrees. We therefore only sketch the computations here. Details are provided in the attached jupyter notebook.
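As a quick consistency check (independent of the notebook), one can count separable permutations by pattern avoidance and compare with the series $S(z)$. The equation $S = z + S^2/(1 - S)$ used below is the one suggested by the relations of this section (with the total count of separable permutations of size $n$ equal to $2 S_n$ for $n \geq 2$) and is an assumption of this sketch.

```python
from itertools import permutations, combinations

def is_separable(p):
    """Separable permutations are exactly those avoiding 2413 and 3142."""
    for quad in combinations(range(len(p)), 4):
        vals = [p[i] for i in quad]
        ranks = [sorted(vals).index(v) + 1 for v in vals]
        if ranks in ([2, 4, 1, 3], [3, 1, 4, 2]):
            return False
    return True

counts = [sum(is_separable(p) for p in permutations(range(1, n + 1)))
          for n in range(1, 7)]
assert counts == [1, 2, 6, 22, 90, 394]   # large Schroeder numbers

# consistency with S = z + S^2/(1-S), i.e. 2S^2 - (1+z)S + z = 0:
# iterate the equivalent form S = z + 2S^2 - zS on truncated integer series
M = 6
S = [0] * (M + 1)
for _ in range(M + 1):
    S2 = [sum(S[i] * S[n - i] for i in range(n + 1)) for n in range(M + 1)]
    S = [(1 if n == 1 else 0) + 2 * S2[n] - (S[n - 1] if n >= 1 else 0)
         for n in range(M + 1)]
# total number of separable permutations of size n is then 2*S_n - [n = 1]
assert [2 * S[n] - (1 if n == 1 else 0) for n in range(1, 7)] == counts
```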
Indeed, an increasing subsequence in a direct sum of permutations $\pi_1 \oplus \cdots \oplus \pi_r$ is a union of increasing subsequences in $\pi_1, \ldots,$ and $\pi_r$. Hence elements counted by $S_\oplus$ can be described as sequences of at least two elements chosen between $(\bullet, \emptyset)$, $(\bullet, \{\bullet\})$ and the elements counted by $S_\ominus$. This leads to Eq. (5.3). On the other hand, a nonempty increasing subsequence in a skew sum of permutations $\pi_1 \ominus \cdots \ominus \pi_r$ is an increasing subsequence in either $\pi_1, \ldots,$ or $\pi_r$. Therefore, elements of $S_\ominus$ marked with a nonempty increasing subsequence correspond to sequences of at least two elements, with exactly one element counted by $zu + S_\oplus(z, u) - S_\oplus(z, 0)$ (either $(\bullet, \{\bullet\})$ or a $\oplus$-decomposable permutation with a nonempty marked increasing subsequence) and the other elements counted by $S$. We need to add a term $S_\ominus(z, 0) = \frac{S^2}{1 - S}$ for the case of an empty marked increasing subsequence. Substituting Eq. (5.3) and using $S_\oplus(z, 0) = S(z) - z$ gives Eq. (5.2).
Fix $u \in \mathbb C$. In order to perform saddle-point analysis with $S_\ominus$, we rewrite the first equation of the previous system as $S_\ominus = G(z, S_\ominus, u)$ where
$$G(z, c, u) = \frac{S(z)^2}{1 - S(z)} + \left(\frac{(c + z + zu)^2}{1 - (c + z + zu)} + zu + z - S(z)\right)\left(\frac{1}{(1 - S(z))^2} - 1\right). \tag{5.4}$$
Again this almost fits the setting of the smooth implicit-function schema; only the nonnegativity of the coefficients of $G$ is not verified here. And, as we shall see, sufficient conditions for the validity of [FS09, Thm. VII.3, p. 468], similar to the cograph case, are satisfied.