inner join 下 order by 排序不走索引

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

现在注册

已注册用户请登录

这是一个创建于 2133 天前的主题，其中的信息可能已经有所发展或是发生改变。

sql 如下:

select i.* from (select * from illusts order by artist_id,type,create_date)  i  join (select artist_id from user_artist_followed where user_id=53)u using(artist_id) where type='illust' order by create_date desc ;

illust 表下建了个索引(artist_id,type,create_date),user_artist_followed 表建了索引(user_id,artist_id)

查询计划显示似乎先进行了 filesort 之后才走了索引

想请问一下,是否有优化方法,消除 filesort 和临时表

create table

-- auto-generated definition
create table illusts
(
    illust_id       bigint auto_increment comment '主键 id'
        primary key,
    title           varchar(1023)                         not null comment '标题',
    type            varchar(20)                           not null comment '1:ugoira、2:manga、3：illust',
    caption         text      default ''                  not null comment '附言',
    `restrict`      tinyint                               not null comment '限制',
    artist          text                                  not null comment '画师 Json',
    tools           varchar(255)                          not null comment '作画作画工具',
    tags            text                                  null comment '标签 json',
    create_date     datetime                              not null comment '创建时间',
    page_count      int       default 0                   not null comment '页数',
    width           int                                   not null comment '宽度',
    height          int                                   not null comment '高度',
    sanity_level    tinyint                               not null comment '情色级别',
    x_restrict      tinyint                               not null comment '十八禁限制',
    total_bookmarks int                                   not null comment '收藏数',
    total_view      int                                   not null comment '查看数',
    image_urls      longtext  default ''                  not null comment '图片链接',
    artist_id       int                                   not null comment '画师 id',
    update_time     timestamp default current_timestamp() not null on update current_timestamp() comment '更新时间'
)
    comment '插画表' charset = utf8mb4;

create index artist_id_type_create_date_index
    on illusts (artist_id, type, create_date);

create index update_time_total_bookmarks_index
    on illusts (update_time, total_bookmarks);

-- auto-generated definition
create table user_artist_followed
(
    id          int auto_increment
        primary key,
    user_id     int      null comment '用户 id',
    artist_id   bigint   null comment '画师 id',
    create_date datetime null
);

create index user_id_artist_id_create_date
    on user_artist_followed (user_id, artist_id, create_date);

第 1 条附言 · 2020-02-10 11:44:38 +08:00

需求是查看关注画师最新画作,即获取关注画师,用画师 id 获取这些画师所有画作,用 type 筛选后用 create_date 排序(用 id 排序也行)

第 2 条附言 · 2020-02-10 11:47:41 +08:00

v友们一直说子查询,,,

select * from illusts  where artist_id in (select artist_id from user_artist_followed where user_id=53) and  type='illust' order by create_date desc  limit 0, 30

第 3 条附言 · 2020-02-10 14:19:05 +08:00

sql编辑错了,真正为


select i.* from (select * from illusts)  i  join (select artist_id from user_artist_followed where user_id=53)u using(artist_id) where type='illust' order by create_date desc ;

artist_id

索引

select

type

30 条回复 • 2020-02-10 14:38:24 +08:00

GGGG430

2020-02-07 12:09:08 +08:00

show create tables;

OysterQAQ

2020-02-07 12:11:45 +08:00

@GGGG430 已经添加

GGGG430

2020-02-07 12:30:15 +08:00

create index type_create_date_index
on illusts ( type, create_date);
illusts 表上的 where 条件中 artist_id 用了表达式没走联合索引, 重新建一个

GGGG430

2020-02-07 12:34:43 +08:00

explain select artist_id from user_artist_followed where user_id = 53;
explain select * from illusts where artist_id in (?, ?) and type = 'illust' order by artist_id, type, create_date;
感觉拆分一下子查询好点

OysterQAQ

2020-02-07 12:41:29 +08:00

@GGGG430 这个子查询不等价只需要根据 create_date 排序,而且 artist_id 可以上千的,illusts 表有两千万;如果说是 artist_id 因为是表达式不走索引的话,但是我试了下把 order by 去掉之后走了索引,如果是因为表达式 artist_id 不走索引,那么这时候是只走了 type 这个索引吗

wysnylc

2020-02-07 13:04:51 +08:00 via Android

子查询优化掉改成多次查询 hash 拼接

GGGG430

2020-02-07 13:06:24 +08:00

因为 artist_id 过多导致子查询不方便的话, 你就增加一下(type, create_date)这个索引吧, explain 如下
1 SIMPLE illusts ref artist_id_type_create_date_index,type_create_date_index type_create_date_index 82 const 1 100 Using index condition
1 SIMPLE user_artist_followed ref user_id_artist_id_create_date,idx_artist_id user_id_artist_id_create_date 14 const,func 1 100 Using where; Using index

另外我觉得你第一个子查询中的 order by artist_id,type,create_date 很奇怪, 能否和最后的 order by 合并呢

OysterQAQ

2020-02-07 13:25:04 +08:00

@GGGG430 增加这个索引是可行的,在测试表加了下,走了索引,晚上在正式操作一下,感谢
@wysnylc 具体怎么操作?

OysterQAQ

2020-02-10 11:06:27 +08:00

@GGGG430 发现如果换成那个索引,那么两表连接速度就下来了

GGGG430

2020-02-10 11:18:08 +08:00

@OysterQAQ 是耗时增加了的意思吗

des

2020-02-10 11:26:32 +08:00 via Android

首先发一下 mysql 的版本吧，另外子查询里面的 order by artist_id,type,create_date 比较奇怪，某个版本以上这样的 order by 会被直接优化掉，子查询有 order by 为什么外面还有一个？

illusts 和 user_artist_followed 的数据量都发一下？

还有 where 为啥不直接写在子查询里，另外第一个子查询感觉可以直接优化掉，另外不是 inner join ？

第二个子查询感觉可以分两步查

des

2020-02-10 11:27:44 +08:00 via Android

@des 好吧，数据量都有了，才不到两百

sagaxu

2020-02-10 11:38:05 +08:00 via Android

排序用索引的前提是，查询条件用到了索引 a,b,c，且条件是 a=xx and b=yy，排序是 order by c。索引是针对原始表的，子查询或者中间表是没有索引的，新版本有 hash index，部分情况下能加速连表。

OysterQAQ

2020-02-10 11:39:57 +08:00

@GGGG430 是的感觉是用了(type,create_date)之后两表链接出了问题

OysterQAQ

2020-02-10 11:42:19 +08:00

@des mariadb10.4;order by 只有一个;illusts 两千万,user_artist_followed 不好说一直在增加,最多一个用户估计有 1000 个关注画师;join=inner join,子查询效率更差

OysterQAQ

2020-02-10 11:43:17 +08:00

@sagaxu 目前现状看,应该是连接过程没用索引,还是没能解决问题

kifile

2020-02-10 11:44:34 +08:00

SELECT * FROM illusts
WHERE artist_id in (
SELECT artist_id FROM user_artitist_followed
WHERE user_id=53
) AND type='illust'
ORDER BY create_date DESC

看了一下，你的语句可以等价成这个样子，没必要吧 illusts select *

kifile

2020-02-10 11:45:11 +08:00

其实索引都建了，用子查询会比 join 更优一些

OysterQAQ

2020-02-10 11:52:50 +08:00

@kifile 一切用执行计划来说明,这里两个没有绝对的说法

OysterQAQ

2020-02-10 11:54:05 +08:00

感觉数据量区别还是会大程度影响执行计划

kifile

2020-02-10 12:49:08 +08:00

恩，刚才的语句，实际 explain 了一下，由于子查询被认为是关联查询，所以其实并没能成功命中 artist_type 的索引，分阶段执行，先获取 artist_id ,再和 type 联合查询就都命中索引了

kifile

2020-02-10 12:52:49 +08:00

set @t=(select group_concat(artist_id) FROM user_artist_followed where user_id=53);
explain select * from illusts where type='illust' and artist_id in (@t);

这样写，中间态没有数据传输，同时也可以命中索引

OysterQAQ

2020-02-10 13:28:45 +08:00

@kifile 感谢,确实可行,但是没有测试上千数据,不知道这样做最终会在多大数据量发生性能下跌

kifile

2020-02-10 13:36:31 +08:00

之后的优化其实就是基于业务场景了，之前看过微博他们好像是大 V 用户的内容信息和非 V 用户的内容信息分开存储，代码逻辑处理两块的内容合并，以优化业务场景

xzc19970719

2020-02-10 14:02:43 +08:00

order by 无过滤不索引
select * from illusts order by artist_id,type,create_date 你这句还是走的 Using filesort

OysterQAQ

2020-02-10 14:08:57 +08:00

@xzc19970719 有认真看问题吗,,,

OysterQAQ

2020-02-10 14:18:06 +08:00

@xzc19970719 抱歉是我看错了

xzc19970719

2020-02-10 14:25:08 +08:00

@OysterQAQ。。我也是个初学菜鸡。。只想到了 17 楼的办法但是 mysql5.5 以下 in 不走索引 create_date DESC 的话应该建倒序索引

OysterQAQ

2020-02-10 14:33:23 +08:00

@kifile 老歌这样查貌似是用 artist_id 先排序了

OysterQAQ

2020-02-10 14:38:24 +08:00

好像我的需求没办法用索引来排序