You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

dataset_retrieval.py 52KB

Introduce Plugins (#13836) Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: -LAN- <laipz8200@outlook.com> Signed-off-by: xhe <xw897002528@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: takatost <takatost@gmail.com> Co-authored-by: kurokobo <kuro664@gmail.com> Co-authored-by: Novice Lee <novicelee@NoviPro.local> Co-authored-by: zxhlyh <jasonapring2015@outlook.com> Co-authored-by: AkaraChen <akarachen@outlook.com> Co-authored-by: Yi <yxiaoisme@gmail.com> Co-authored-by: Joel <iamjoel007@gmail.com> Co-authored-by: JzoNg <jzongcode@gmail.com> Co-authored-by: twwu <twwu@dify.ai> Co-authored-by: Hiroshi Fujita <fujita-h@users.noreply.github.com> Co-authored-by: AkaraChen <85140972+AkaraChen@users.noreply.github.com> Co-authored-by: NFish <douxc512@gmail.com> Co-authored-by: Wu Tianwei <30284043+WTW0313@users.noreply.github.com> Co-authored-by: 非法操作 <hjlarry@163.com> Co-authored-by: Novice <857526207@qq.com> Co-authored-by: Hiroki Nagai <82458324+nagaihiroki-git@users.noreply.github.com> Co-authored-by: Gen Sato <52241300+halogen22@users.noreply.github.com> Co-authored-by: eux <euxuuu@gmail.com> Co-authored-by: huangzhuo1949 <167434202+huangzhuo1949@users.noreply.github.com> Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com> Co-authored-by: lotsik <lotsik@mail.ru> Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com> Co-authored-by: nite-knite <nkCoding@gmail.com> Co-authored-by: Jyong <76649700+JohnJyong@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: gakkiyomi <gakkiyomi@aliyun.com> Co-authored-by: CN-P5 <heibai2006@gmail.com> Co-authored-by: CN-P5 <heibai2006@qq.com> Co-authored-by: Chuehnone <1897025+chuehnone@users.noreply.github.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Kevin9703 <51311316+Kevin9703@users.noreply.github.com> Co-authored-by: -LAN- <laipz8200@outlook.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: mbo <himabo@gmail.com> Co-authored-by: mabo <mabo@aeyes.ai> Co-authored-by: Warren Chen <warren.chen830@gmail.com> Co-authored-by: JzoNgKVO <27049666+JzoNgKVO@users.noreply.github.com> Co-authored-by: jiandanfeng <chenjh3@wangsu.com> Co-authored-by: zhu-an <70234959+xhdd123321@users.noreply.github.com> Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com> Co-authored-by: 海狸大師 <86974027+yenslife@users.noreply.github.com> Co-authored-by: Xu Song <xusong.vip@gmail.com> Co-authored-by: rayshaw001 <396301947@163.com> Co-authored-by: Ding Jiatong <dingjiatong@gmail.com> Co-authored-by: Bowen Liang <liangbowen@gf.com.cn> Co-authored-by: JasonVV <jasonwangiii@outlook.com> Co-authored-by: le0zh <newlight@qq.com> Co-authored-by: zhuxinliang <zhuxinliang@didiglobal.com> Co-authored-by: k-zaku <zaku99@outlook.jp> Co-authored-by: luckylhb90 <luckylhb90@gmail.com> Co-authored-by: hobo.l <hobo.l@binance.com> Co-authored-by: jiangbo721 <365065261@qq.com> Co-authored-by: 刘江波 <jiangbo721@163.com> Co-authored-by: Shun Miyazawa <34241526+miya@users.noreply.github.com> Co-authored-by: EricPan <30651140+Egfly@users.noreply.github.com> Co-authored-by: crazywoola <427733928@qq.com> Co-authored-by: sino <sino2322@gmail.com> Co-authored-by: Jhvcc <37662342+Jhvcc@users.noreply.github.com> Co-authored-by: lowell <lowell.hu@zkteco.in> Co-authored-by: Boris Polonsky <BorisPolonsky@users.noreply.github.com> Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com> Co-authored-by: Ademílson Tonato <ademilson.tonato@refurbed.com> Co-authored-by: IWAI, Masaharu <iwaim.sub@gmail.com> Co-authored-by: Yueh-Po Peng (Yabi) <94939112+y10ab1@users.noreply.github.com> Co-authored-by: Jason <ggbbddjm@gmail.com> Co-authored-by: Xin Zhang <sjhpzx@gmail.com> Co-authored-by: yjc980121 <3898524+yjc980121@users.noreply.github.com> Co-authored-by: heyszt <36215648+hieheihei@users.noreply.github.com> Co-authored-by: Abdullah AlOsaimi <osaimiacc@gmail.com> Co-authored-by: Abdullah AlOsaimi <189027247+osaimi@users.noreply.github.com> Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: Hash Brown <hi@xzd.me> Co-authored-by: zuodongxu <192560071+zuodongxu@users.noreply.github.com> Co-authored-by: Masashi Tomooka <tmokmss@users.noreply.github.com> Co-authored-by: aplio <ryo.091219@gmail.com> Co-authored-by: Obada Khalili <54270856+obadakhalili@users.noreply.github.com> Co-authored-by: Nam Vu <zuzoovn@gmail.com> Co-authored-by: Kei YAMAZAKI <1715090+kei-yamazaki@users.noreply.github.com> Co-authored-by: TechnoHouse <13776377+deephbz@users.noreply.github.com> Co-authored-by: Riddhimaan-Senapati <114703025+Riddhimaan-Senapati@users.noreply.github.com> Co-authored-by: MaFee921 <31881301+2284730142@users.noreply.github.com> Co-authored-by: te-chan <t-nakanome@sakura-is.co.jp> Co-authored-by: HQidea <HQidea@users.noreply.github.com> Co-authored-by: Joshbly <36315710+Joshbly@users.noreply.github.com> Co-authored-by: xhe <xw897002528@gmail.com> Co-authored-by: weiwenyan-dev <154779315+weiwenyan-dev@users.noreply.github.com> Co-authored-by: ex_wenyan.wei <ex_wenyan.wei@tcl.com> Co-authored-by: engchina <12236799+engchina@users.noreply.github.com> Co-authored-by: engchina <atjapan2015@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: 呆萌闷油瓶 <253605712@qq.com> Co-authored-by: Kemal <kemalmeler@outlook.com> Co-authored-by: Lazy_Frog <4590648+lazyFrogLOL@users.noreply.github.com> Co-authored-by: Yi Xiao <54782454+YIXIAO0@users.noreply.github.com> Co-authored-by: Steven sun <98230804+Tuyohai@users.noreply.github.com> Co-authored-by: steven <sunzwj@digitalchina.com> Co-authored-by: Kalo Chin <91766386+fdb02983rhy@users.noreply.github.com> Co-authored-by: Katy Tao <34019945+KatyTao@users.noreply.github.com> Co-authored-by: depy <42985524+h4ckdepy@users.noreply.github.com> Co-authored-by: 胡春东 <gycm520@gmail.com> Co-authored-by: Junjie.M <118170653@qq.com> Co-authored-by: MuYu <mr.muzea@gmail.com> Co-authored-by: Naoki Takashima <39912547+takatea@users.noreply.github.com> Co-authored-by: Summer-Gu <37869445+gubinjie@users.noreply.github.com> Co-authored-by: Fei He <droxer.he@gmail.com> Co-authored-by: ybalbert001 <120714773+ybalbert001@users.noreply.github.com> Co-authored-by: Yuanbo Li <ybalbert@amazon.com> Co-authored-by: douxc <7553076+douxc@users.noreply.github.com> Co-authored-by: liuzhenghua <1090179900@qq.com> Co-authored-by: Wu Jiayang <62842862+Wu-Jiayang@users.noreply.github.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: kimjion <45935338+kimjion@users.noreply.github.com> Co-authored-by: AugNSo <song.tiankai@icloud.com> Co-authored-by: llinvokerl <38915183+llinvokerl@users.noreply.github.com> Co-authored-by: liusurong.lsr <liusurong.lsr@alibaba-inc.com> Co-authored-by: Vasu Negi <vasu-negi@users.noreply.github.com> Co-authored-by: Hundredwz <1808096180@qq.com> Co-authored-by: Xiyuan Chen <52963600+GareArc@users.noreply.github.com>
8 months ago
Introduce Plugins (#13836) Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: -LAN- <laipz8200@outlook.com> Signed-off-by: xhe <xw897002528@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: takatost <takatost@gmail.com> Co-authored-by: kurokobo <kuro664@gmail.com> Co-authored-by: Novice Lee <novicelee@NoviPro.local> Co-authored-by: zxhlyh <jasonapring2015@outlook.com> Co-authored-by: AkaraChen <akarachen@outlook.com> Co-authored-by: Yi <yxiaoisme@gmail.com> Co-authored-by: Joel <iamjoel007@gmail.com> Co-authored-by: JzoNg <jzongcode@gmail.com> Co-authored-by: twwu <twwu@dify.ai> Co-authored-by: Hiroshi Fujita <fujita-h@users.noreply.github.com> Co-authored-by: AkaraChen <85140972+AkaraChen@users.noreply.github.com> Co-authored-by: NFish <douxc512@gmail.com> Co-authored-by: Wu Tianwei <30284043+WTW0313@users.noreply.github.com> Co-authored-by: 非法操作 <hjlarry@163.com> Co-authored-by: Novice <857526207@qq.com> Co-authored-by: Hiroki Nagai <82458324+nagaihiroki-git@users.noreply.github.com> Co-authored-by: Gen Sato <52241300+halogen22@users.noreply.github.com> Co-authored-by: eux <euxuuu@gmail.com> Co-authored-by: huangzhuo1949 <167434202+huangzhuo1949@users.noreply.github.com> Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com> Co-authored-by: lotsik <lotsik@mail.ru> Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com> Co-authored-by: nite-knite <nkCoding@gmail.com> Co-authored-by: Jyong <76649700+JohnJyong@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: gakkiyomi <gakkiyomi@aliyun.com> Co-authored-by: CN-P5 <heibai2006@gmail.com> Co-authored-by: CN-P5 <heibai2006@qq.com> Co-authored-by: Chuehnone <1897025+chuehnone@users.noreply.github.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Kevin9703 <51311316+Kevin9703@users.noreply.github.com> Co-authored-by: -LAN- <laipz8200@outlook.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: mbo <himabo@gmail.com> Co-authored-by: mabo <mabo@aeyes.ai> Co-authored-by: Warren Chen <warren.chen830@gmail.com> Co-authored-by: JzoNgKVO <27049666+JzoNgKVO@users.noreply.github.com> Co-authored-by: jiandanfeng <chenjh3@wangsu.com> Co-authored-by: zhu-an <70234959+xhdd123321@users.noreply.github.com> Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com> Co-authored-by: 海狸大師 <86974027+yenslife@users.noreply.github.com> Co-authored-by: Xu Song <xusong.vip@gmail.com> Co-authored-by: rayshaw001 <396301947@163.com> Co-authored-by: Ding Jiatong <dingjiatong@gmail.com> Co-authored-by: Bowen Liang <liangbowen@gf.com.cn> Co-authored-by: JasonVV <jasonwangiii@outlook.com> Co-authored-by: le0zh <newlight@qq.com> Co-authored-by: zhuxinliang <zhuxinliang@didiglobal.com> Co-authored-by: k-zaku <zaku99@outlook.jp> Co-authored-by: luckylhb90 <luckylhb90@gmail.com> Co-authored-by: hobo.l <hobo.l@binance.com> Co-authored-by: jiangbo721 <365065261@qq.com> Co-authored-by: 刘江波 <jiangbo721@163.com> Co-authored-by: Shun Miyazawa <34241526+miya@users.noreply.github.com> Co-authored-by: EricPan <30651140+Egfly@users.noreply.github.com> Co-authored-by: crazywoola <427733928@qq.com> Co-authored-by: sino <sino2322@gmail.com> Co-authored-by: Jhvcc <37662342+Jhvcc@users.noreply.github.com> Co-authored-by: lowell <lowell.hu@zkteco.in> Co-authored-by: Boris Polonsky <BorisPolonsky@users.noreply.github.com> Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com> Co-authored-by: Ademílson Tonato <ademilson.tonato@refurbed.com> Co-authored-by: IWAI, Masaharu <iwaim.sub@gmail.com> Co-authored-by: Yueh-Po Peng (Yabi) <94939112+y10ab1@users.noreply.github.com> Co-authored-by: Jason <ggbbddjm@gmail.com> Co-authored-by: Xin Zhang <sjhpzx@gmail.com> Co-authored-by: yjc980121 <3898524+yjc980121@users.noreply.github.com> Co-authored-by: heyszt <36215648+hieheihei@users.noreply.github.com> Co-authored-by: Abdullah AlOsaimi <osaimiacc@gmail.com> Co-authored-by: Abdullah AlOsaimi <189027247+osaimi@users.noreply.github.com> Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: Hash Brown <hi@xzd.me> Co-authored-by: zuodongxu <192560071+zuodongxu@users.noreply.github.com> Co-authored-by: Masashi Tomooka <tmokmss@users.noreply.github.com> Co-authored-by: aplio <ryo.091219@gmail.com> Co-authored-by: Obada Khalili <54270856+obadakhalili@users.noreply.github.com> Co-authored-by: Nam Vu <zuzoovn@gmail.com> Co-authored-by: Kei YAMAZAKI <1715090+kei-yamazaki@users.noreply.github.com> Co-authored-by: TechnoHouse <13776377+deephbz@users.noreply.github.com> Co-authored-by: Riddhimaan-Senapati <114703025+Riddhimaan-Senapati@users.noreply.github.com> Co-authored-by: MaFee921 <31881301+2284730142@users.noreply.github.com> Co-authored-by: te-chan <t-nakanome@sakura-is.co.jp> Co-authored-by: HQidea <HQidea@users.noreply.github.com> Co-authored-by: Joshbly <36315710+Joshbly@users.noreply.github.com> Co-authored-by: xhe <xw897002528@gmail.com> Co-authored-by: weiwenyan-dev <154779315+weiwenyan-dev@users.noreply.github.com> Co-authored-by: ex_wenyan.wei <ex_wenyan.wei@tcl.com> Co-authored-by: engchina <12236799+engchina@users.noreply.github.com> Co-authored-by: engchina <atjapan2015@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: 呆萌闷油瓶 <253605712@qq.com> Co-authored-by: Kemal <kemalmeler@outlook.com> Co-authored-by: Lazy_Frog <4590648+lazyFrogLOL@users.noreply.github.com> Co-authored-by: Yi Xiao <54782454+YIXIAO0@users.noreply.github.com> Co-authored-by: Steven sun <98230804+Tuyohai@users.noreply.github.com> Co-authored-by: steven <sunzwj@digitalchina.com> Co-authored-by: Kalo Chin <91766386+fdb02983rhy@users.noreply.github.com> Co-authored-by: Katy Tao <34019945+KatyTao@users.noreply.github.com> Co-authored-by: depy <42985524+h4ckdepy@users.noreply.github.com> Co-authored-by: 胡春东 <gycm520@gmail.com> Co-authored-by: Junjie.M <118170653@qq.com> Co-authored-by: MuYu <mr.muzea@gmail.com> Co-authored-by: Naoki Takashima <39912547+takatea@users.noreply.github.com> Co-authored-by: Summer-Gu <37869445+gubinjie@users.noreply.github.com> Co-authored-by: Fei He <droxer.he@gmail.com> Co-authored-by: ybalbert001 <120714773+ybalbert001@users.noreply.github.com> Co-authored-by: Yuanbo Li <ybalbert@amazon.com> Co-authored-by: douxc <7553076+douxc@users.noreply.github.com> Co-authored-by: liuzhenghua <1090179900@qq.com> Co-authored-by: Wu Jiayang <62842862+Wu-Jiayang@users.noreply.github.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: kimjion <45935338+kimjion@users.noreply.github.com> Co-authored-by: AugNSo <song.tiankai@icloud.com> Co-authored-by: llinvokerl <38915183+llinvokerl@users.noreply.github.com> Co-authored-by: liusurong.lsr <liusurong.lsr@alibaba-inc.com> Co-authored-by: Vasu Negi <vasu-negi@users.noreply.github.com> Co-authored-by: Hundredwz <1808096180@qq.com> Co-authored-by: Xiyuan Chen <52963600+GareArc@users.noreply.github.com>
8 months ago
Introduce Plugins (#13836) Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: -LAN- <laipz8200@outlook.com> Signed-off-by: xhe <xw897002528@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: takatost <takatost@gmail.com> Co-authored-by: kurokobo <kuro664@gmail.com> Co-authored-by: Novice Lee <novicelee@NoviPro.local> Co-authored-by: zxhlyh <jasonapring2015@outlook.com> Co-authored-by: AkaraChen <akarachen@outlook.com> Co-authored-by: Yi <yxiaoisme@gmail.com> Co-authored-by: Joel <iamjoel007@gmail.com> Co-authored-by: JzoNg <jzongcode@gmail.com> Co-authored-by: twwu <twwu@dify.ai> Co-authored-by: Hiroshi Fujita <fujita-h@users.noreply.github.com> Co-authored-by: AkaraChen <85140972+AkaraChen@users.noreply.github.com> Co-authored-by: NFish <douxc512@gmail.com> Co-authored-by: Wu Tianwei <30284043+WTW0313@users.noreply.github.com> Co-authored-by: 非法操作 <hjlarry@163.com> Co-authored-by: Novice <857526207@qq.com> Co-authored-by: Hiroki Nagai <82458324+nagaihiroki-git@users.noreply.github.com> Co-authored-by: Gen Sato <52241300+halogen22@users.noreply.github.com> Co-authored-by: eux <euxuuu@gmail.com> Co-authored-by: huangzhuo1949 <167434202+huangzhuo1949@users.noreply.github.com> Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com> Co-authored-by: lotsik <lotsik@mail.ru> Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com> Co-authored-by: nite-knite <nkCoding@gmail.com> Co-authored-by: Jyong <76649700+JohnJyong@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: gakkiyomi <gakkiyomi@aliyun.com> Co-authored-by: CN-P5 <heibai2006@gmail.com> Co-authored-by: CN-P5 <heibai2006@qq.com> Co-authored-by: Chuehnone <1897025+chuehnone@users.noreply.github.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Kevin9703 <51311316+Kevin9703@users.noreply.github.com> Co-authored-by: -LAN- <laipz8200@outlook.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: mbo <himabo@gmail.com> Co-authored-by: mabo <mabo@aeyes.ai> Co-authored-by: Warren Chen <warren.chen830@gmail.com> Co-authored-by: JzoNgKVO <27049666+JzoNgKVO@users.noreply.github.com> Co-authored-by: jiandanfeng <chenjh3@wangsu.com> Co-authored-by: zhu-an <70234959+xhdd123321@users.noreply.github.com> Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com> Co-authored-by: 海狸大師 <86974027+yenslife@users.noreply.github.com> Co-authored-by: Xu Song <xusong.vip@gmail.com> Co-authored-by: rayshaw001 <396301947@163.com> Co-authored-by: Ding Jiatong <dingjiatong@gmail.com> Co-authored-by: Bowen Liang <liangbowen@gf.com.cn> Co-authored-by: JasonVV <jasonwangiii@outlook.com> Co-authored-by: le0zh <newlight@qq.com> Co-authored-by: zhuxinliang <zhuxinliang@didiglobal.com> Co-authored-by: k-zaku <zaku99@outlook.jp> Co-authored-by: luckylhb90 <luckylhb90@gmail.com> Co-authored-by: hobo.l <hobo.l@binance.com> Co-authored-by: jiangbo721 <365065261@qq.com> Co-authored-by: 刘江波 <jiangbo721@163.com> Co-authored-by: Shun Miyazawa <34241526+miya@users.noreply.github.com> Co-authored-by: EricPan <30651140+Egfly@users.noreply.github.com> Co-authored-by: crazywoola <427733928@qq.com> Co-authored-by: sino <sino2322@gmail.com> Co-authored-by: Jhvcc <37662342+Jhvcc@users.noreply.github.com> Co-authored-by: lowell <lowell.hu@zkteco.in> Co-authored-by: Boris Polonsky <BorisPolonsky@users.noreply.github.com> Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com> Co-authored-by: Ademílson Tonato <ademilson.tonato@refurbed.com> Co-authored-by: IWAI, Masaharu <iwaim.sub@gmail.com> Co-authored-by: Yueh-Po Peng (Yabi) <94939112+y10ab1@users.noreply.github.com> Co-authored-by: Jason <ggbbddjm@gmail.com> Co-authored-by: Xin Zhang <sjhpzx@gmail.com> Co-authored-by: yjc980121 <3898524+yjc980121@users.noreply.github.com> Co-authored-by: heyszt <36215648+hieheihei@users.noreply.github.com> Co-authored-by: Abdullah AlOsaimi <osaimiacc@gmail.com> Co-authored-by: Abdullah AlOsaimi <189027247+osaimi@users.noreply.github.com> Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: Hash Brown <hi@xzd.me> Co-authored-by: zuodongxu <192560071+zuodongxu@users.noreply.github.com> Co-authored-by: Masashi Tomooka <tmokmss@users.noreply.github.com> Co-authored-by: aplio <ryo.091219@gmail.com> Co-authored-by: Obada Khalili <54270856+obadakhalili@users.noreply.github.com> Co-authored-by: Nam Vu <zuzoovn@gmail.com> Co-authored-by: Kei YAMAZAKI <1715090+kei-yamazaki@users.noreply.github.com> Co-authored-by: TechnoHouse <13776377+deephbz@users.noreply.github.com> Co-authored-by: Riddhimaan-Senapati <114703025+Riddhimaan-Senapati@users.noreply.github.com> Co-authored-by: MaFee921 <31881301+2284730142@users.noreply.github.com> Co-authored-by: te-chan <t-nakanome@sakura-is.co.jp> Co-authored-by: HQidea <HQidea@users.noreply.github.com> Co-authored-by: Joshbly <36315710+Joshbly@users.noreply.github.com> Co-authored-by: xhe <xw897002528@gmail.com> Co-authored-by: weiwenyan-dev <154779315+weiwenyan-dev@users.noreply.github.com> Co-authored-by: ex_wenyan.wei <ex_wenyan.wei@tcl.com> Co-authored-by: engchina <12236799+engchina@users.noreply.github.com> Co-authored-by: engchina <atjapan2015@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: 呆萌闷油瓶 <253605712@qq.com> Co-authored-by: Kemal <kemalmeler@outlook.com> Co-authored-by: Lazy_Frog <4590648+lazyFrogLOL@users.noreply.github.com> Co-authored-by: Yi Xiao <54782454+YIXIAO0@users.noreply.github.com> Co-authored-by: Steven sun <98230804+Tuyohai@users.noreply.github.com> Co-authored-by: steven <sunzwj@digitalchina.com> Co-authored-by: Kalo Chin <91766386+fdb02983rhy@users.noreply.github.com> Co-authored-by: Katy Tao <34019945+KatyTao@users.noreply.github.com> Co-authored-by: depy <42985524+h4ckdepy@users.noreply.github.com> Co-authored-by: 胡春东 <gycm520@gmail.com> Co-authored-by: Junjie.M <118170653@qq.com> Co-authored-by: MuYu <mr.muzea@gmail.com> Co-authored-by: Naoki Takashima <39912547+takatea@users.noreply.github.com> Co-authored-by: Summer-Gu <37869445+gubinjie@users.noreply.github.com> Co-authored-by: Fei He <droxer.he@gmail.com> Co-authored-by: ybalbert001 <120714773+ybalbert001@users.noreply.github.com> Co-authored-by: Yuanbo Li <ybalbert@amazon.com> Co-authored-by: douxc <7553076+douxc@users.noreply.github.com> Co-authored-by: liuzhenghua <1090179900@qq.com> Co-authored-by: Wu Jiayang <62842862+Wu-Jiayang@users.noreply.github.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: kimjion <45935338+kimjion@users.noreply.github.com> Co-authored-by: AugNSo <song.tiankai@icloud.com> Co-authored-by: llinvokerl <38915183+llinvokerl@users.noreply.github.com> Co-authored-by: liusurong.lsr <liusurong.lsr@alibaba-inc.com> Co-authored-by: Vasu Negi <vasu-negi@users.noreply.github.com> Co-authored-by: Hundredwz <1808096180@qq.com> Co-authored-by: Xiyuan Chen <52963600+GareArc@users.noreply.github.com>
8 months ago
Introduce Plugins (#13836) Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: -LAN- <laipz8200@outlook.com> Signed-off-by: xhe <xw897002528@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: takatost <takatost@gmail.com> Co-authored-by: kurokobo <kuro664@gmail.com> Co-authored-by: Novice Lee <novicelee@NoviPro.local> Co-authored-by: zxhlyh <jasonapring2015@outlook.com> Co-authored-by: AkaraChen <akarachen@outlook.com> Co-authored-by: Yi <yxiaoisme@gmail.com> Co-authored-by: Joel <iamjoel007@gmail.com> Co-authored-by: JzoNg <jzongcode@gmail.com> Co-authored-by: twwu <twwu@dify.ai> Co-authored-by: Hiroshi Fujita <fujita-h@users.noreply.github.com> Co-authored-by: AkaraChen <85140972+AkaraChen@users.noreply.github.com> Co-authored-by: NFish <douxc512@gmail.com> Co-authored-by: Wu Tianwei <30284043+WTW0313@users.noreply.github.com> Co-authored-by: 非法操作 <hjlarry@163.com> Co-authored-by: Novice <857526207@qq.com> Co-authored-by: Hiroki Nagai <82458324+nagaihiroki-git@users.noreply.github.com> Co-authored-by: Gen Sato <52241300+halogen22@users.noreply.github.com> Co-authored-by: eux <euxuuu@gmail.com> Co-authored-by: huangzhuo1949 <167434202+huangzhuo1949@users.noreply.github.com> Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com> Co-authored-by: lotsik <lotsik@mail.ru> Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com> Co-authored-by: nite-knite <nkCoding@gmail.com> Co-authored-by: Jyong <76649700+JohnJyong@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: gakkiyomi <gakkiyomi@aliyun.com> Co-authored-by: CN-P5 <heibai2006@gmail.com> Co-authored-by: CN-P5 <heibai2006@qq.com> Co-authored-by: Chuehnone <1897025+chuehnone@users.noreply.github.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Kevin9703 <51311316+Kevin9703@users.noreply.github.com> Co-authored-by: -LAN- <laipz8200@outlook.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: mbo <himabo@gmail.com> Co-authored-by: mabo <mabo@aeyes.ai> Co-authored-by: Warren Chen <warren.chen830@gmail.com> Co-authored-by: JzoNgKVO <27049666+JzoNgKVO@users.noreply.github.com> Co-authored-by: jiandanfeng <chenjh3@wangsu.com> Co-authored-by: zhu-an <70234959+xhdd123321@users.noreply.github.com> Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com> Co-authored-by: 海狸大師 <86974027+yenslife@users.noreply.github.com> Co-authored-by: Xu Song <xusong.vip@gmail.com> Co-authored-by: rayshaw001 <396301947@163.com> Co-authored-by: Ding Jiatong <dingjiatong@gmail.com> Co-authored-by: Bowen Liang <liangbowen@gf.com.cn> Co-authored-by: JasonVV <jasonwangiii@outlook.com> Co-authored-by: le0zh <newlight@qq.com> Co-authored-by: zhuxinliang <zhuxinliang@didiglobal.com> Co-authored-by: k-zaku <zaku99@outlook.jp> Co-authored-by: luckylhb90 <luckylhb90@gmail.com> Co-authored-by: hobo.l <hobo.l@binance.com> Co-authored-by: jiangbo721 <365065261@qq.com> Co-authored-by: 刘江波 <jiangbo721@163.com> Co-authored-by: Shun Miyazawa <34241526+miya@users.noreply.github.com> Co-authored-by: EricPan <30651140+Egfly@users.noreply.github.com> Co-authored-by: crazywoola <427733928@qq.com> Co-authored-by: sino <sino2322@gmail.com> Co-authored-by: Jhvcc <37662342+Jhvcc@users.noreply.github.com> Co-authored-by: lowell <lowell.hu@zkteco.in> Co-authored-by: Boris Polonsky <BorisPolonsky@users.noreply.github.com> Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com> Co-authored-by: Ademílson Tonato <ademilson.tonato@refurbed.com> Co-authored-by: IWAI, Masaharu <iwaim.sub@gmail.com> Co-authored-by: Yueh-Po Peng (Yabi) <94939112+y10ab1@users.noreply.github.com> Co-authored-by: Jason <ggbbddjm@gmail.com> Co-authored-by: Xin Zhang <sjhpzx@gmail.com> Co-authored-by: yjc980121 <3898524+yjc980121@users.noreply.github.com> Co-authored-by: heyszt <36215648+hieheihei@users.noreply.github.com> Co-authored-by: Abdullah AlOsaimi <osaimiacc@gmail.com> Co-authored-by: Abdullah AlOsaimi <189027247+osaimi@users.noreply.github.com> Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: Hash Brown <hi@xzd.me> Co-authored-by: zuodongxu <192560071+zuodongxu@users.noreply.github.com> Co-authored-by: Masashi Tomooka <tmokmss@users.noreply.github.com> Co-authored-by: aplio <ryo.091219@gmail.com> Co-authored-by: Obada Khalili <54270856+obadakhalili@users.noreply.github.com> Co-authored-by: Nam Vu <zuzoovn@gmail.com> Co-authored-by: Kei YAMAZAKI <1715090+kei-yamazaki@users.noreply.github.com> Co-authored-by: TechnoHouse <13776377+deephbz@users.noreply.github.com> Co-authored-by: Riddhimaan-Senapati <114703025+Riddhimaan-Senapati@users.noreply.github.com> Co-authored-by: MaFee921 <31881301+2284730142@users.noreply.github.com> Co-authored-by: te-chan <t-nakanome@sakura-is.co.jp> Co-authored-by: HQidea <HQidea@users.noreply.github.com> Co-authored-by: Joshbly <36315710+Joshbly@users.noreply.github.com> Co-authored-by: xhe <xw897002528@gmail.com> Co-authored-by: weiwenyan-dev <154779315+weiwenyan-dev@users.noreply.github.com> Co-authored-by: ex_wenyan.wei <ex_wenyan.wei@tcl.com> Co-authored-by: engchina <12236799+engchina@users.noreply.github.com> Co-authored-by: engchina <atjapan2015@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: 呆萌闷油瓶 <253605712@qq.com> Co-authored-by: Kemal <kemalmeler@outlook.com> Co-authored-by: Lazy_Frog <4590648+lazyFrogLOL@users.noreply.github.com> Co-authored-by: Yi Xiao <54782454+YIXIAO0@users.noreply.github.com> Co-authored-by: Steven sun <98230804+Tuyohai@users.noreply.github.com> Co-authored-by: steven <sunzwj@digitalchina.com> Co-authored-by: Kalo Chin <91766386+fdb02983rhy@users.noreply.github.com> Co-authored-by: Katy Tao <34019945+KatyTao@users.noreply.github.com> Co-authored-by: depy <42985524+h4ckdepy@users.noreply.github.com> Co-authored-by: 胡春东 <gycm520@gmail.com> Co-authored-by: Junjie.M <118170653@qq.com> Co-authored-by: MuYu <mr.muzea@gmail.com> Co-authored-by: Naoki Takashima <39912547+takatea@users.noreply.github.com> Co-authored-by: Summer-Gu <37869445+gubinjie@users.noreply.github.com> Co-authored-by: Fei He <droxer.he@gmail.com> Co-authored-by: ybalbert001 <120714773+ybalbert001@users.noreply.github.com> Co-authored-by: Yuanbo Li <ybalbert@amazon.com> Co-authored-by: douxc <7553076+douxc@users.noreply.github.com> Co-authored-by: liuzhenghua <1090179900@qq.com> Co-authored-by: Wu Jiayang <62842862+Wu-Jiayang@users.noreply.github.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: kimjion <45935338+kimjion@users.noreply.github.com> Co-authored-by: AugNSo <song.tiankai@icloud.com> Co-authored-by: llinvokerl <38915183+llinvokerl@users.noreply.github.com> Co-authored-by: liusurong.lsr <liusurong.lsr@alibaba-inc.com> Co-authored-by: Vasu Negi <vasu-negi@users.noreply.github.com> Co-authored-by: Hundredwz <1808096180@qq.com> Co-authored-by: Xiyuan Chen <52963600+GareArc@users.noreply.github.com>
8 months ago
Introduce Plugins (#13836) Signed-off-by: yihong0618 <zouzou0208@gmail.com> Signed-off-by: -LAN- <laipz8200@outlook.com> Signed-off-by: xhe <xw897002528@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: takatost <takatost@gmail.com> Co-authored-by: kurokobo <kuro664@gmail.com> Co-authored-by: Novice Lee <novicelee@NoviPro.local> Co-authored-by: zxhlyh <jasonapring2015@outlook.com> Co-authored-by: AkaraChen <akarachen@outlook.com> Co-authored-by: Yi <yxiaoisme@gmail.com> Co-authored-by: Joel <iamjoel007@gmail.com> Co-authored-by: JzoNg <jzongcode@gmail.com> Co-authored-by: twwu <twwu@dify.ai> Co-authored-by: Hiroshi Fujita <fujita-h@users.noreply.github.com> Co-authored-by: AkaraChen <85140972+AkaraChen@users.noreply.github.com> Co-authored-by: NFish <douxc512@gmail.com> Co-authored-by: Wu Tianwei <30284043+WTW0313@users.noreply.github.com> Co-authored-by: 非法操作 <hjlarry@163.com> Co-authored-by: Novice <857526207@qq.com> Co-authored-by: Hiroki Nagai <82458324+nagaihiroki-git@users.noreply.github.com> Co-authored-by: Gen Sato <52241300+halogen22@users.noreply.github.com> Co-authored-by: eux <euxuuu@gmail.com> Co-authored-by: huangzhuo1949 <167434202+huangzhuo1949@users.noreply.github.com> Co-authored-by: huangzhuo <huangzhuo1@xiaomi.com> Co-authored-by: lotsik <lotsik@mail.ru> Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com> Co-authored-by: nite-knite <nkCoding@gmail.com> Co-authored-by: Jyong <76649700+JohnJyong@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: gakkiyomi <gakkiyomi@aliyun.com> Co-authored-by: CN-P5 <heibai2006@gmail.com> Co-authored-by: CN-P5 <heibai2006@qq.com> Co-authored-by: Chuehnone <1897025+chuehnone@users.noreply.github.com> Co-authored-by: yihong <zouzou0208@gmail.com> Co-authored-by: Kevin9703 <51311316+Kevin9703@users.noreply.github.com> Co-authored-by: -LAN- <laipz8200@outlook.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: mbo <himabo@gmail.com> Co-authored-by: mabo <mabo@aeyes.ai> Co-authored-by: Warren Chen <warren.chen830@gmail.com> Co-authored-by: JzoNgKVO <27049666+JzoNgKVO@users.noreply.github.com> Co-authored-by: jiandanfeng <chenjh3@wangsu.com> Co-authored-by: zhu-an <70234959+xhdd123321@users.noreply.github.com> Co-authored-by: zhaoqingyu.1075 <zhaoqingyu.1075@bytedance.com> Co-authored-by: 海狸大師 <86974027+yenslife@users.noreply.github.com> Co-authored-by: Xu Song <xusong.vip@gmail.com> Co-authored-by: rayshaw001 <396301947@163.com> Co-authored-by: Ding Jiatong <dingjiatong@gmail.com> Co-authored-by: Bowen Liang <liangbowen@gf.com.cn> Co-authored-by: JasonVV <jasonwangiii@outlook.com> Co-authored-by: le0zh <newlight@qq.com> Co-authored-by: zhuxinliang <zhuxinliang@didiglobal.com> Co-authored-by: k-zaku <zaku99@outlook.jp> Co-authored-by: luckylhb90 <luckylhb90@gmail.com> Co-authored-by: hobo.l <hobo.l@binance.com> Co-authored-by: jiangbo721 <365065261@qq.com> Co-authored-by: 刘江波 <jiangbo721@163.com> Co-authored-by: Shun Miyazawa <34241526+miya@users.noreply.github.com> Co-authored-by: EricPan <30651140+Egfly@users.noreply.github.com> Co-authored-by: crazywoola <427733928@qq.com> Co-authored-by: sino <sino2322@gmail.com> Co-authored-by: Jhvcc <37662342+Jhvcc@users.noreply.github.com> Co-authored-by: lowell <lowell.hu@zkteco.in> Co-authored-by: Boris Polonsky <BorisPolonsky@users.noreply.github.com> Co-authored-by: Ademílson Tonato <ademilsonft@outlook.com> Co-authored-by: Ademílson Tonato <ademilson.tonato@refurbed.com> Co-authored-by: IWAI, Masaharu <iwaim.sub@gmail.com> Co-authored-by: Yueh-Po Peng (Yabi) <94939112+y10ab1@users.noreply.github.com> Co-authored-by: Jason <ggbbddjm@gmail.com> Co-authored-by: Xin Zhang <sjhpzx@gmail.com> Co-authored-by: yjc980121 <3898524+yjc980121@users.noreply.github.com> Co-authored-by: heyszt <36215648+hieheihei@users.noreply.github.com> Co-authored-by: Abdullah AlOsaimi <osaimiacc@gmail.com> Co-authored-by: Abdullah AlOsaimi <189027247+osaimi@users.noreply.github.com> Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: Hash Brown <hi@xzd.me> Co-authored-by: zuodongxu <192560071+zuodongxu@users.noreply.github.com> Co-authored-by: Masashi Tomooka <tmokmss@users.noreply.github.com> Co-authored-by: aplio <ryo.091219@gmail.com> Co-authored-by: Obada Khalili <54270856+obadakhalili@users.noreply.github.com> Co-authored-by: Nam Vu <zuzoovn@gmail.com> Co-authored-by: Kei YAMAZAKI <1715090+kei-yamazaki@users.noreply.github.com> Co-authored-by: TechnoHouse <13776377+deephbz@users.noreply.github.com> Co-authored-by: Riddhimaan-Senapati <114703025+Riddhimaan-Senapati@users.noreply.github.com> Co-authored-by: MaFee921 <31881301+2284730142@users.noreply.github.com> Co-authored-by: te-chan <t-nakanome@sakura-is.co.jp> Co-authored-by: HQidea <HQidea@users.noreply.github.com> Co-authored-by: Joshbly <36315710+Joshbly@users.noreply.github.com> Co-authored-by: xhe <xw897002528@gmail.com> Co-authored-by: weiwenyan-dev <154779315+weiwenyan-dev@users.noreply.github.com> Co-authored-by: ex_wenyan.wei <ex_wenyan.wei@tcl.com> Co-authored-by: engchina <12236799+engchina@users.noreply.github.com> Co-authored-by: engchina <atjapan2015@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: 呆萌闷油瓶 <253605712@qq.com> Co-authored-by: Kemal <kemalmeler@outlook.com> Co-authored-by: Lazy_Frog <4590648+lazyFrogLOL@users.noreply.github.com> Co-authored-by: Yi Xiao <54782454+YIXIAO0@users.noreply.github.com> Co-authored-by: Steven sun <98230804+Tuyohai@users.noreply.github.com> Co-authored-by: steven <sunzwj@digitalchina.com> Co-authored-by: Kalo Chin <91766386+fdb02983rhy@users.noreply.github.com> Co-authored-by: Katy Tao <34019945+KatyTao@users.noreply.github.com> Co-authored-by: depy <42985524+h4ckdepy@users.noreply.github.com> Co-authored-by: 胡春东 <gycm520@gmail.com> Co-authored-by: Junjie.M <118170653@qq.com> Co-authored-by: MuYu <mr.muzea@gmail.com> Co-authored-by: Naoki Takashima <39912547+takatea@users.noreply.github.com> Co-authored-by: Summer-Gu <37869445+gubinjie@users.noreply.github.com> Co-authored-by: Fei He <droxer.he@gmail.com> Co-authored-by: ybalbert001 <120714773+ybalbert001@users.noreply.github.com> Co-authored-by: Yuanbo Li <ybalbert@amazon.com> Co-authored-by: douxc <7553076+douxc@users.noreply.github.com> Co-authored-by: liuzhenghua <1090179900@qq.com> Co-authored-by: Wu Jiayang <62842862+Wu-Jiayang@users.noreply.github.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: kimjion <45935338+kimjion@users.noreply.github.com> Co-authored-by: AugNSo <song.tiankai@icloud.com> Co-authored-by: llinvokerl <38915183+llinvokerl@users.noreply.github.com> Co-authored-by: liusurong.lsr <liusurong.lsr@alibaba-inc.com> Co-authored-by: Vasu Negi <vasu-negi@users.noreply.github.com> Co-authored-by: Hundredwz <1808096180@qq.com> Co-authored-by: Xiyuan Chen <52963600+GareArc@users.noreply.github.com>
8 months ago
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957958959960961962963964965966967968969970971972973974975976977978979980981982983984985986987988989990991992993994995996997998999100010011002100310041005100610071008100910101011101210131014101510161017101810191020102110221023102410251026102710281029103010311032103310341035103610371038103910401041104210431044104510461047104810491050105110521053105410551056105710581059106010611062106310641065106610671068106910701071107210731074107510761077107810791080108110821083108410851086108710881089109010911092109310941095109610971098109911001101110211031104110511061107110811091110111111121113111411151116111711181119112011211122112311241125112611271128112911301131113211331134113511361137113811391140114111421143114411451146114711481149115011511152115311541155115611571158115911601161116211631164116511661167116811691170117111721173117411751176117711781179118011811182
  1. import json
  2. import math
  3. import re
  4. import threading
  5. from collections import Counter, defaultdict
  6. from collections.abc import Generator, Mapping
  7. from typing import Any, Optional, Union, cast
  8. from flask import Flask, current_app
  9. from sqlalchemy import Integer, and_, or_, text
  10. from sqlalchemy import cast as sqlalchemy_cast
  11. from core.app.app_config.entities import (
  12. DatasetEntity,
  13. DatasetRetrieveConfigEntity,
  14. MetadataFilteringCondition,
  15. ModelConfig,
  16. )
  17. from core.app.entities.app_invoke_entities import InvokeFrom, ModelConfigWithCredentialsEntity
  18. from core.callback_handler.index_tool_callback_handler import DatasetIndexToolCallbackHandler
  19. from core.entities.agent_entities import PlanningStrategy
  20. from core.entities.model_entities import ModelStatus
  21. from core.memory.token_buffer_memory import TokenBufferMemory
  22. from core.model_manager import ModelInstance, ModelManager
  23. from core.model_runtime.entities.llm_entities import LLMResult, LLMUsage
  24. from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageRole, PromptMessageTool
  25. from core.model_runtime.entities.model_entities import ModelFeature, ModelType
  26. from core.model_runtime.model_providers.__base.large_language_model import LargeLanguageModel
  27. from core.ops.entities.trace_entity import TraceTaskName
  28. from core.ops.ops_trace_manager import TraceQueueManager, TraceTask
  29. from core.ops.utils import measure_time
  30. from core.prompt.advanced_prompt_transform import AdvancedPromptTransform
  31. from core.prompt.entities.advanced_prompt_entities import ChatModelMessage, CompletionModelPromptTemplate
  32. from core.prompt.simple_prompt_transform import ModelMode
  33. from core.rag.data_post_processor.data_post_processor import DataPostProcessor
  34. from core.rag.datasource.keyword.jieba.jieba_keyword_table_handler import JiebaKeywordTableHandler
  35. from core.rag.datasource.retrieval_service import RetrievalService
  36. from core.rag.entities.context_entities import DocumentContext
  37. from core.rag.entities.metadata_entities import Condition, MetadataCondition
  38. from core.rag.index_processor.constant.index_type import IndexType
  39. from core.rag.models.document import Document
  40. from core.rag.rerank.rerank_type import RerankMode
  41. from core.rag.retrieval.retrieval_methods import RetrievalMethod
  42. from core.rag.retrieval.router.multi_dataset_function_call_router import FunctionCallMultiDatasetRouter
  43. from core.rag.retrieval.router.multi_dataset_react_route import ReactMultiDatasetRouter
  44. from core.rag.retrieval.template_prompts import (
  45. METADATA_FILTER_ASSISTANT_PROMPT_1,
  46. METADATA_FILTER_ASSISTANT_PROMPT_2,
  47. METADATA_FILTER_COMPLETION_PROMPT,
  48. METADATA_FILTER_SYSTEM_PROMPT,
  49. METADATA_FILTER_USER_PROMPT_1,
  50. METADATA_FILTER_USER_PROMPT_2,
  51. METADATA_FILTER_USER_PROMPT_3,
  52. )
  53. from core.tools.utils.dataset_retriever.dataset_retriever_base_tool import DatasetRetrieverBaseTool
  54. from extensions.ext_database import db
  55. from libs.json_in_md_parser import parse_and_check_json_markdown
  56. from models.dataset import ChildChunk, Dataset, DatasetMetadata, DatasetQuery, DocumentSegment
  57. from models.dataset import Document as DatasetDocument
  58. from services.external_knowledge_service import ExternalDatasetService
  59. default_retrieval_model: dict[str, Any] = {
  60. "search_method": RetrievalMethod.SEMANTIC_SEARCH.value,
  61. "reranking_enable": False,
  62. "reranking_model": {"reranking_provider_name": "", "reranking_model_name": ""},
  63. "top_k": 2,
  64. "score_threshold_enabled": False,
  65. }
  66. class DatasetRetrieval:
  67. def __init__(self, application_generate_entity=None):
  68. self.application_generate_entity = application_generate_entity
  69. def retrieve(
  70. self,
  71. app_id: str,
  72. user_id: str,
  73. tenant_id: str,
  74. model_config: ModelConfigWithCredentialsEntity,
  75. config: DatasetEntity,
  76. query: str,
  77. invoke_from: InvokeFrom,
  78. show_retrieve_source: bool,
  79. hit_callback: DatasetIndexToolCallbackHandler,
  80. message_id: str,
  81. memory: Optional[TokenBufferMemory] = None,
  82. inputs: Optional[Mapping[str, Any]] = None,
  83. ) -> Optional[str]:
  84. """
  85. Retrieve dataset.
  86. :param app_id: app_id
  87. :param user_id: user_id
  88. :param tenant_id: tenant id
  89. :param model_config: model config
  90. :param config: dataset config
  91. :param query: query
  92. :param invoke_from: invoke from
  93. :param show_retrieve_source: show retrieve source
  94. :param hit_callback: hit callback
  95. :param message_id: message id
  96. :param memory: memory
  97. :param inputs: inputs
  98. :return:
  99. """
  100. dataset_ids = config.dataset_ids
  101. if len(dataset_ids) == 0:
  102. return None
  103. retrieve_config = config.retrieve_config
  104. # check model is support tool calling
  105. model_type_instance = model_config.provider_model_bundle.model_type_instance
  106. model_type_instance = cast(LargeLanguageModel, model_type_instance)
  107. model_manager = ModelManager()
  108. model_instance = model_manager.get_model_instance(
  109. tenant_id=tenant_id, model_type=ModelType.LLM, provider=model_config.provider, model=model_config.model
  110. )
  111. # get model schema
  112. model_schema = model_type_instance.get_model_schema(
  113. model=model_config.model, credentials=model_config.credentials
  114. )
  115. if not model_schema:
  116. return None
  117. planning_strategy = PlanningStrategy.REACT_ROUTER
  118. features = model_schema.features
  119. if features:
  120. if ModelFeature.TOOL_CALL in features or ModelFeature.MULTI_TOOL_CALL in features:
  121. planning_strategy = PlanningStrategy.ROUTER
  122. available_datasets = []
  123. for dataset_id in dataset_ids:
  124. # get dataset from dataset id
  125. dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
  126. # pass if dataset is not available
  127. if not dataset:
  128. continue
  129. # pass if dataset is not available
  130. if dataset and dataset.available_document_count == 0 and dataset.provider != "external":
  131. continue
  132. available_datasets.append(dataset)
  133. if inputs:
  134. inputs = {key: str(value) for key, value in inputs.items()}
  135. else:
  136. inputs = {}
  137. available_datasets_ids = [dataset.id for dataset in available_datasets]
  138. metadata_filter_document_ids, metadata_condition = self._get_metadata_filter_condition(
  139. available_datasets_ids,
  140. query,
  141. tenant_id,
  142. user_id,
  143. retrieve_config.metadata_filtering_mode, # type: ignore
  144. retrieve_config.metadata_model_config, # type: ignore
  145. retrieve_config.metadata_filtering_conditions,
  146. inputs,
  147. )
  148. all_documents = []
  149. user_from = "account" if invoke_from in {InvokeFrom.EXPLORE, InvokeFrom.DEBUGGER} else "end_user"
  150. if retrieve_config.retrieve_strategy == DatasetRetrieveConfigEntity.RetrieveStrategy.SINGLE:
  151. all_documents = self.single_retrieve(
  152. app_id,
  153. tenant_id,
  154. user_id,
  155. user_from,
  156. available_datasets,
  157. query,
  158. model_instance,
  159. model_config,
  160. planning_strategy,
  161. message_id,
  162. metadata_filter_document_ids,
  163. metadata_condition,
  164. )
  165. elif retrieve_config.retrieve_strategy == DatasetRetrieveConfigEntity.RetrieveStrategy.MULTIPLE:
  166. all_documents = self.multiple_retrieve(
  167. app_id,
  168. tenant_id,
  169. user_id,
  170. user_from,
  171. available_datasets,
  172. query,
  173. retrieve_config.top_k or 0,
  174. retrieve_config.score_threshold or 0,
  175. retrieve_config.rerank_mode or "reranking_model",
  176. retrieve_config.reranking_model,
  177. retrieve_config.weights,
  178. retrieve_config.reranking_enabled or True,
  179. message_id,
  180. metadata_filter_document_ids,
  181. metadata_condition,
  182. )
  183. dify_documents = [item for item in all_documents if item.provider == "dify"]
  184. external_documents = [item for item in all_documents if item.provider == "external"]
  185. document_context_list = []
  186. retrieval_resource_list = []
  187. # deal with external documents
  188. for item in external_documents:
  189. document_context_list.append(DocumentContext(content=item.page_content, score=item.metadata.get("score")))
  190. source = {
  191. "dataset_id": item.metadata.get("dataset_id"),
  192. "dataset_name": item.metadata.get("dataset_name"),
  193. "document_id": item.metadata.get("document_id") or item.metadata.get("title"),
  194. "document_name": item.metadata.get("title"),
  195. "data_source_type": "external",
  196. "retriever_from": invoke_from.to_source(),
  197. "score": item.metadata.get("score"),
  198. "content": item.page_content,
  199. }
  200. retrieval_resource_list.append(source)
  201. # deal with dify documents
  202. if dify_documents:
  203. records = RetrievalService.format_retrieval_documents(dify_documents)
  204. if records:
  205. for record in records:
  206. segment = record.segment
  207. if segment.answer:
  208. document_context_list.append(
  209. DocumentContext(
  210. content=f"question:{segment.get_sign_content()} answer:{segment.answer}",
  211. score=record.score,
  212. )
  213. )
  214. else:
  215. document_context_list.append(
  216. DocumentContext(
  217. content=segment.get_sign_content(),
  218. score=record.score,
  219. )
  220. )
  221. if show_retrieve_source:
  222. for record in records:
  223. segment = record.segment
  224. dataset = Dataset.query.filter_by(id=segment.dataset_id).first()
  225. document = DatasetDocument.query.filter(
  226. DatasetDocument.id == segment.document_id,
  227. DatasetDocument.enabled == True,
  228. DatasetDocument.archived == False,
  229. ).first()
  230. if dataset and document:
  231. source = {
  232. "dataset_id": dataset.id,
  233. "dataset_name": dataset.name,
  234. "document_id": document.id,
  235. "document_name": document.name,
  236. "data_source_type": document.data_source_type,
  237. "segment_id": segment.id,
  238. "retriever_from": invoke_from.to_source(),
  239. "score": record.score or 0.0,
  240. "doc_metadata": document.doc_metadata,
  241. }
  242. if invoke_from.to_source() == "dev":
  243. source["hit_count"] = segment.hit_count
  244. source["word_count"] = segment.word_count
  245. source["segment_position"] = segment.position
  246. source["index_node_hash"] = segment.index_node_hash
  247. if segment.answer:
  248. source["content"] = f"question:{segment.content} \nanswer:{segment.answer}"
  249. else:
  250. source["content"] = segment.content
  251. retrieval_resource_list.append(source)
  252. if hit_callback and retrieval_resource_list:
  253. retrieval_resource_list = sorted(retrieval_resource_list, key=lambda x: x.get("score") or 0.0, reverse=True)
  254. for position, item in enumerate(retrieval_resource_list, start=1):
  255. item["position"] = position
  256. hit_callback.return_retriever_resource_info(retrieval_resource_list)
  257. if document_context_list:
  258. document_context_list = sorted(document_context_list, key=lambda x: x.score or 0.0, reverse=True)
  259. return str("\n".join([document_context.content for document_context in document_context_list]))
  260. return ""
  261. def single_retrieve(
  262. self,
  263. app_id: str,
  264. tenant_id: str,
  265. user_id: str,
  266. user_from: str,
  267. available_datasets: list,
  268. query: str,
  269. model_instance: ModelInstance,
  270. model_config: ModelConfigWithCredentialsEntity,
  271. planning_strategy: PlanningStrategy,
  272. message_id: Optional[str] = None,
  273. metadata_filter_document_ids: Optional[dict[str, list[str]]] = None,
  274. metadata_condition: Optional[MetadataCondition] = None,
  275. ):
  276. tools = []
  277. for dataset in available_datasets:
  278. description = dataset.description
  279. if not description:
  280. description = "useful for when you want to answer queries about the " + dataset.name
  281. description = description.replace("\n", "").replace("\r", "")
  282. message_tool = PromptMessageTool(
  283. name=dataset.id,
  284. description=description,
  285. parameters={
  286. "type": "object",
  287. "properties": {},
  288. "required": [],
  289. },
  290. )
  291. tools.append(message_tool)
  292. dataset_id = None
  293. if planning_strategy == PlanningStrategy.REACT_ROUTER:
  294. react_multi_dataset_router = ReactMultiDatasetRouter()
  295. dataset_id = react_multi_dataset_router.invoke(
  296. query, tools, model_config, model_instance, user_id, tenant_id
  297. )
  298. elif planning_strategy == PlanningStrategy.ROUTER:
  299. function_call_router = FunctionCallMultiDatasetRouter()
  300. dataset_id = function_call_router.invoke(query, tools, model_config, model_instance)
  301. if dataset_id:
  302. # get retrieval model config
  303. dataset = db.session.query(Dataset).filter(Dataset.id == dataset_id).first()
  304. if dataset:
  305. results = []
  306. if dataset.provider == "external":
  307. external_documents = ExternalDatasetService.fetch_external_knowledge_retrieval(
  308. tenant_id=dataset.tenant_id,
  309. dataset_id=dataset_id,
  310. query=query,
  311. external_retrieval_parameters=dataset.retrieval_model,
  312. metadata_condition=metadata_condition,
  313. )
  314. for external_document in external_documents:
  315. document = Document(
  316. page_content=external_document.get("content"),
  317. metadata=external_document.get("metadata"),
  318. provider="external",
  319. )
  320. if document.metadata is not None:
  321. document.metadata["score"] = external_document.get("score")
  322. document.metadata["title"] = external_document.get("title")
  323. document.metadata["dataset_id"] = dataset_id
  324. document.metadata["dataset_name"] = dataset.name
  325. results.append(document)
  326. else:
  327. if metadata_condition and not metadata_filter_document_ids:
  328. return []
  329. document_ids_filter = None
  330. if metadata_filter_document_ids:
  331. document_ids = metadata_filter_document_ids.get(dataset.id, [])
  332. if document_ids:
  333. document_ids_filter = document_ids
  334. else:
  335. return []
  336. retrieval_model_config = dataset.retrieval_model or default_retrieval_model
  337. # get top k
  338. top_k = retrieval_model_config["top_k"]
  339. # get retrieval method
  340. if dataset.indexing_technique == "economy":
  341. retrieval_method = "keyword_search"
  342. else:
  343. retrieval_method = retrieval_model_config["search_method"]
  344. # get reranking model
  345. reranking_model = (
  346. retrieval_model_config["reranking_model"]
  347. if retrieval_model_config["reranking_enable"]
  348. else None
  349. )
  350. # get score threshold
  351. score_threshold = 0.0
  352. score_threshold_enabled = retrieval_model_config.get("score_threshold_enabled")
  353. if score_threshold_enabled:
  354. score_threshold = retrieval_model_config.get("score_threshold", 0.0)
  355. with measure_time() as timer:
  356. results = RetrievalService.retrieve(
  357. retrieval_method=retrieval_method,
  358. dataset_id=dataset.id,
  359. query=query,
  360. top_k=top_k,
  361. score_threshold=score_threshold,
  362. reranking_model=reranking_model,
  363. reranking_mode=retrieval_model_config.get("reranking_mode", "reranking_model"),
  364. weights=retrieval_model_config.get("weights", None),
  365. document_ids_filter=document_ids_filter,
  366. )
  367. self._on_query(query, [dataset_id], app_id, user_from, user_id)
  368. if results:
  369. self._on_retrieval_end(results, message_id, timer)
  370. return results
  371. return []
  372. def multiple_retrieve(
  373. self,
  374. app_id: str,
  375. tenant_id: str,
  376. user_id: str,
  377. user_from: str,
  378. available_datasets: list,
  379. query: str,
  380. top_k: int,
  381. score_threshold: float,
  382. reranking_mode: str,
  383. reranking_model: Optional[dict] = None,
  384. weights: Optional[dict[str, Any]] = None,
  385. reranking_enable: bool = True,
  386. message_id: Optional[str] = None,
  387. metadata_filter_document_ids: Optional[dict[str, list[str]]] = None,
  388. metadata_condition: Optional[MetadataCondition] = None,
  389. ):
  390. if not available_datasets:
  391. return []
  392. threads = []
  393. all_documents: list[Document] = []
  394. dataset_ids = [dataset.id for dataset in available_datasets]
  395. index_type_check = all(
  396. item.indexing_technique == available_datasets[0].indexing_technique for item in available_datasets
  397. )
  398. if not index_type_check and (not reranking_enable or reranking_mode != RerankMode.RERANKING_MODEL):
  399. raise ValueError(
  400. "The configured knowledge base list have different indexing technique, please set reranking model."
  401. )
  402. index_type = available_datasets[0].indexing_technique
  403. if index_type == "high_quality":
  404. embedding_model_check = all(
  405. item.embedding_model == available_datasets[0].embedding_model for item in available_datasets
  406. )
  407. embedding_model_provider_check = all(
  408. item.embedding_model_provider == available_datasets[0].embedding_model_provider
  409. for item in available_datasets
  410. )
  411. if (
  412. reranking_enable
  413. and reranking_mode == "weighted_score"
  414. and (not embedding_model_check or not embedding_model_provider_check)
  415. ):
  416. raise ValueError(
  417. "The configured knowledge base list have different embedding model, please set reranking model."
  418. )
  419. if reranking_enable and reranking_mode == RerankMode.WEIGHTED_SCORE:
  420. if weights is not None:
  421. weights["vector_setting"]["embedding_provider_name"] = available_datasets[
  422. 0
  423. ].embedding_model_provider
  424. weights["vector_setting"]["embedding_model_name"] = available_datasets[0].embedding_model
  425. for dataset in available_datasets:
  426. index_type = dataset.indexing_technique
  427. document_ids_filter = None
  428. if dataset.provider != "external":
  429. if metadata_condition and not metadata_filter_document_ids:
  430. continue
  431. if metadata_filter_document_ids:
  432. document_ids = metadata_filter_document_ids.get(dataset.id, [])
  433. if document_ids:
  434. document_ids_filter = document_ids
  435. else:
  436. continue
  437. retrieval_thread = threading.Thread(
  438. target=self._retriever,
  439. kwargs={
  440. "flask_app": current_app._get_current_object(), # type: ignore
  441. "dataset_id": dataset.id,
  442. "query": query,
  443. "top_k": top_k,
  444. "all_documents": all_documents,
  445. "document_ids_filter": document_ids_filter,
  446. "metadata_condition": metadata_condition,
  447. },
  448. )
  449. threads.append(retrieval_thread)
  450. retrieval_thread.start()
  451. for thread in threads:
  452. thread.join()
  453. with measure_time() as timer:
  454. if reranking_enable:
  455. # do rerank for searched documents
  456. data_post_processor = DataPostProcessor(tenant_id, reranking_mode, reranking_model, weights, False)
  457. all_documents = data_post_processor.invoke(
  458. query=query, documents=all_documents, score_threshold=score_threshold, top_n=top_k
  459. )
  460. else:
  461. if index_type == "economy":
  462. all_documents = self.calculate_keyword_score(query, all_documents, top_k)
  463. elif index_type == "high_quality":
  464. all_documents = self.calculate_vector_score(all_documents, top_k, score_threshold)
  465. self._on_query(query, dataset_ids, app_id, user_from, user_id)
  466. if all_documents:
  467. self._on_retrieval_end(all_documents, message_id, timer)
  468. return all_documents
  469. def _on_retrieval_end(
  470. self, documents: list[Document], message_id: Optional[str] = None, timer: Optional[dict] = None
  471. ) -> None:
  472. """Handle retrieval end."""
  473. dify_documents = [document for document in documents if document.provider == "dify"]
  474. for document in dify_documents:
  475. if document.metadata is not None:
  476. dataset_document = DatasetDocument.query.filter(
  477. DatasetDocument.id == document.metadata["document_id"]
  478. ).first()
  479. if dataset_document:
  480. if dataset_document.doc_form == IndexType.PARENT_CHILD_INDEX:
  481. child_chunk = ChildChunk.query.filter(
  482. ChildChunk.index_node_id == document.metadata["doc_id"],
  483. ChildChunk.dataset_id == dataset_document.dataset_id,
  484. ChildChunk.document_id == dataset_document.id,
  485. ).first()
  486. if child_chunk:
  487. segment = DocumentSegment.query.filter(DocumentSegment.id == child_chunk.segment_id).update(
  488. {DocumentSegment.hit_count: DocumentSegment.hit_count + 1}, synchronize_session=False
  489. )
  490. db.session.commit()
  491. else:
  492. query = db.session.query(DocumentSegment).filter(
  493. DocumentSegment.index_node_id == document.metadata["doc_id"]
  494. )
  495. # if 'dataset_id' in document.metadata:
  496. if "dataset_id" in document.metadata:
  497. query = query.filter(DocumentSegment.dataset_id == document.metadata["dataset_id"])
  498. # add hit count to document segment
  499. query.update(
  500. {DocumentSegment.hit_count: DocumentSegment.hit_count + 1}, synchronize_session=False
  501. )
  502. db.session.commit()
  503. # get tracing instance
  504. trace_manager: TraceQueueManager | None = (
  505. self.application_generate_entity.trace_manager if self.application_generate_entity else None
  506. )
  507. if trace_manager:
  508. trace_manager.add_trace_task(
  509. TraceTask(
  510. TraceTaskName.DATASET_RETRIEVAL_TRACE, message_id=message_id, documents=documents, timer=timer
  511. )
  512. )
  513. def _on_query(self, query: str, dataset_ids: list[str], app_id: str, user_from: str, user_id: str) -> None:
  514. """
  515. Handle query.
  516. """
  517. if not query:
  518. return
  519. dataset_queries = []
  520. for dataset_id in dataset_ids:
  521. dataset_query = DatasetQuery(
  522. dataset_id=dataset_id,
  523. content=query,
  524. source="app",
  525. source_app_id=app_id,
  526. created_by_role=user_from,
  527. created_by=user_id,
  528. )
  529. dataset_queries.append(dataset_query)
  530. if dataset_queries:
  531. db.session.add_all(dataset_queries)
  532. db.session.commit()
  533. def _retriever(
  534. self,
  535. flask_app: Flask,
  536. dataset_id: str,
  537. query: str,
  538. top_k: int,
  539. all_documents: list,
  540. document_ids_filter: Optional[list[str]] = None,
  541. metadata_condition: Optional[MetadataCondition] = None,
  542. ):
  543. with flask_app.app_context():
  544. dataset = db.session.query(Dataset).filter(Dataset.id == dataset_id).first()
  545. if not dataset:
  546. return []
  547. if dataset.provider == "external":
  548. external_documents = ExternalDatasetService.fetch_external_knowledge_retrieval(
  549. tenant_id=dataset.tenant_id,
  550. dataset_id=dataset_id,
  551. query=query,
  552. external_retrieval_parameters=dataset.retrieval_model,
  553. metadata_condition=metadata_condition,
  554. )
  555. for external_document in external_documents:
  556. document = Document(
  557. page_content=external_document.get("content"),
  558. metadata=external_document.get("metadata"),
  559. provider="external",
  560. )
  561. if document.metadata is not None:
  562. document.metadata["score"] = external_document.get("score")
  563. document.metadata["title"] = external_document.get("title")
  564. document.metadata["dataset_id"] = dataset_id
  565. document.metadata["dataset_name"] = dataset.name
  566. all_documents.append(document)
  567. else:
  568. # get retrieval model , if the model is not setting , using default
  569. retrieval_model = dataset.retrieval_model or default_retrieval_model
  570. if dataset.indexing_technique == "economy":
  571. # use keyword table query
  572. documents = RetrievalService.retrieve(
  573. retrieval_method="keyword_search",
  574. dataset_id=dataset.id,
  575. query=query,
  576. top_k=top_k,
  577. document_ids_filter=document_ids_filter,
  578. )
  579. if documents:
  580. all_documents.extend(documents)
  581. else:
  582. if top_k > 0:
  583. # retrieval source
  584. documents = RetrievalService.retrieve(
  585. retrieval_method=retrieval_model["search_method"],
  586. dataset_id=dataset.id,
  587. query=query,
  588. top_k=retrieval_model.get("top_k") or 2,
  589. score_threshold=retrieval_model.get("score_threshold", 0.0)
  590. if retrieval_model["score_threshold_enabled"]
  591. else 0.0,
  592. reranking_model=retrieval_model.get("reranking_model", None)
  593. if retrieval_model["reranking_enable"]
  594. else None,
  595. reranking_mode=retrieval_model.get("reranking_mode") or "reranking_model",
  596. weights=retrieval_model.get("weights", None),
  597. document_ids_filter=document_ids_filter,
  598. )
  599. all_documents.extend(documents)
  600. def to_dataset_retriever_tool(
  601. self,
  602. tenant_id: str,
  603. dataset_ids: list[str],
  604. retrieve_config: DatasetRetrieveConfigEntity,
  605. return_resource: bool,
  606. invoke_from: InvokeFrom,
  607. hit_callback: DatasetIndexToolCallbackHandler,
  608. ) -> Optional[list[DatasetRetrieverBaseTool]]:
  609. """
  610. A dataset tool is a tool that can be used to retrieve information from a dataset
  611. :param tenant_id: tenant id
  612. :param dataset_ids: dataset ids
  613. :param retrieve_config: retrieve config
  614. :param return_resource: return resource
  615. :param invoke_from: invoke from
  616. :param hit_callback: hit callback
  617. """
  618. tools = []
  619. available_datasets = []
  620. for dataset_id in dataset_ids:
  621. # get dataset from dataset id
  622. dataset = db.session.query(Dataset).filter(Dataset.tenant_id == tenant_id, Dataset.id == dataset_id).first()
  623. # pass if dataset is not available
  624. if not dataset:
  625. continue
  626. # pass if dataset is not available
  627. if dataset and dataset.provider != "external" and dataset.available_document_count == 0:
  628. continue
  629. available_datasets.append(dataset)
  630. if retrieve_config.retrieve_strategy == DatasetRetrieveConfigEntity.RetrieveStrategy.SINGLE:
  631. # get retrieval model config
  632. default_retrieval_model = {
  633. "search_method": RetrievalMethod.SEMANTIC_SEARCH.value,
  634. "reranking_enable": False,
  635. "reranking_model": {"reranking_provider_name": "", "reranking_model_name": ""},
  636. "top_k": 2,
  637. "score_threshold_enabled": False,
  638. }
  639. for dataset in available_datasets:
  640. retrieval_model_config = dataset.retrieval_model or default_retrieval_model
  641. # get top k
  642. top_k = retrieval_model_config["top_k"]
  643. # get score threshold
  644. score_threshold = None
  645. score_threshold_enabled = retrieval_model_config.get("score_threshold_enabled")
  646. if score_threshold_enabled:
  647. score_threshold = retrieval_model_config.get("score_threshold")
  648. from core.tools.utils.dataset_retriever.dataset_retriever_tool import DatasetRetrieverTool
  649. tool = DatasetRetrieverTool.from_dataset(
  650. dataset=dataset,
  651. top_k=top_k,
  652. score_threshold=score_threshold,
  653. hit_callbacks=[hit_callback],
  654. return_resource=return_resource,
  655. retriever_from=invoke_from.to_source(),
  656. )
  657. tools.append(tool)
  658. elif retrieve_config.retrieve_strategy == DatasetRetrieveConfigEntity.RetrieveStrategy.MULTIPLE:
  659. from core.tools.utils.dataset_retriever.dataset_multi_retriever_tool import DatasetMultiRetrieverTool
  660. if retrieve_config.reranking_model is None:
  661. raise ValueError("Reranking model is required for multiple retrieval")
  662. tool = DatasetMultiRetrieverTool.from_dataset(
  663. dataset_ids=[dataset.id for dataset in available_datasets],
  664. tenant_id=tenant_id,
  665. top_k=retrieve_config.top_k or 2,
  666. score_threshold=retrieve_config.score_threshold,
  667. hit_callbacks=[hit_callback],
  668. return_resource=return_resource,
  669. retriever_from=invoke_from.to_source(),
  670. reranking_provider_name=retrieve_config.reranking_model.get("reranking_provider_name"),
  671. reranking_model_name=retrieve_config.reranking_model.get("reranking_model_name"),
  672. )
  673. tools.append(tool)
  674. return tools
  675. def calculate_keyword_score(self, query: str, documents: list[Document], top_k: int) -> list[Document]:
  676. """
  677. Calculate keywords scores
  678. :param query: search query
  679. :param documents: documents for reranking
  680. :param top_k: top k
  681. :return:
  682. """
  683. keyword_table_handler = JiebaKeywordTableHandler()
  684. query_keywords = keyword_table_handler.extract_keywords(query, None)
  685. documents_keywords = []
  686. for document in documents:
  687. if document.metadata is not None:
  688. # get the document keywords
  689. document_keywords = keyword_table_handler.extract_keywords(document.page_content, None)
  690. document.metadata["keywords"] = document_keywords
  691. documents_keywords.append(document_keywords)
  692. # Counter query keywords(TF)
  693. query_keyword_counts = Counter(query_keywords)
  694. # total documents
  695. total_documents = len(documents)
  696. # calculate all documents' keywords IDF
  697. all_keywords = set()
  698. for document_keywords in documents_keywords:
  699. all_keywords.update(document_keywords)
  700. keyword_idf = {}
  701. for keyword in all_keywords:
  702. # calculate include query keywords' documents
  703. doc_count_containing_keyword = sum(1 for doc_keywords in documents_keywords if keyword in doc_keywords)
  704. # IDF
  705. keyword_idf[keyword] = math.log((1 + total_documents) / (1 + doc_count_containing_keyword)) + 1
  706. query_tfidf = {}
  707. for keyword, count in query_keyword_counts.items():
  708. tf = count
  709. idf = keyword_idf.get(keyword, 0)
  710. query_tfidf[keyword] = tf * idf
  711. # calculate all documents' TF-IDF
  712. documents_tfidf = []
  713. for document_keywords in documents_keywords:
  714. document_keyword_counts = Counter(document_keywords)
  715. document_tfidf = {}
  716. for keyword, count in document_keyword_counts.items():
  717. tf = count
  718. idf = keyword_idf.get(keyword, 0)
  719. document_tfidf[keyword] = tf * idf
  720. documents_tfidf.append(document_tfidf)
  721. def cosine_similarity(vec1, vec2):
  722. intersection = set(vec1.keys()) & set(vec2.keys())
  723. numerator = sum(vec1[x] * vec2[x] for x in intersection)
  724. sum1 = sum(vec1[x] ** 2 for x in vec1)
  725. sum2 = sum(vec2[x] ** 2 for x in vec2)
  726. denominator = math.sqrt(sum1) * math.sqrt(sum2)
  727. if not denominator:
  728. return 0.0
  729. else:
  730. return float(numerator) / denominator
  731. similarities = []
  732. for document_tfidf in documents_tfidf:
  733. similarity = cosine_similarity(query_tfidf, document_tfidf)
  734. similarities.append(similarity)
  735. for document, score in zip(documents, similarities):
  736. # format document
  737. if document.metadata is not None:
  738. document.metadata["score"] = score
  739. documents = sorted(documents, key=lambda x: x.metadata.get("score", 0) if x.metadata else 0, reverse=True)
  740. return documents[:top_k] if top_k else documents
  741. def calculate_vector_score(
  742. self, all_documents: list[Document], top_k: int, score_threshold: float
  743. ) -> list[Document]:
  744. filter_documents = []
  745. for document in all_documents:
  746. if score_threshold is None or (document.metadata and document.metadata.get("score", 0) >= score_threshold):
  747. filter_documents.append(document)
  748. if not filter_documents:
  749. return []
  750. filter_documents = sorted(
  751. filter_documents, key=lambda x: x.metadata.get("score", 0) if x.metadata else 0, reverse=True
  752. )
  753. return filter_documents[:top_k] if top_k else filter_documents
  754. def _get_metadata_filter_condition(
  755. self,
  756. dataset_ids: list,
  757. query: str,
  758. tenant_id: str,
  759. user_id: str,
  760. metadata_filtering_mode: str,
  761. metadata_model_config: ModelConfig,
  762. metadata_filtering_conditions: Optional[MetadataFilteringCondition],
  763. inputs: dict,
  764. ) -> tuple[Optional[dict[str, list[str]]], Optional[MetadataCondition]]:
  765. document_query = db.session.query(DatasetDocument).filter(
  766. DatasetDocument.dataset_id.in_(dataset_ids),
  767. DatasetDocument.indexing_status == "completed",
  768. DatasetDocument.enabled == True,
  769. DatasetDocument.archived == False,
  770. )
  771. filters = [] # type: ignore
  772. metadata_condition = None
  773. if metadata_filtering_mode == "disabled":
  774. return None, None
  775. elif metadata_filtering_mode == "automatic":
  776. automatic_metadata_filters = self._automatic_metadata_filter_func(
  777. dataset_ids, query, tenant_id, user_id, metadata_model_config
  778. )
  779. if automatic_metadata_filters:
  780. conditions = []
  781. for sequence, filter in enumerate(automatic_metadata_filters):
  782. self._process_metadata_filter_func(
  783. sequence,
  784. filter.get("condition"), # type: ignore
  785. filter.get("metadata_name"), # type: ignore
  786. filter.get("value"),
  787. filters, # type: ignore
  788. )
  789. conditions.append(
  790. Condition(
  791. name=filter.get("metadata_name"), # type: ignore
  792. comparison_operator=filter.get("condition"), # type: ignore
  793. value=filter.get("value"),
  794. )
  795. )
  796. metadata_condition = MetadataCondition(
  797. logical_operator=metadata_filtering_conditions.logical_operator
  798. if metadata_filtering_conditions
  799. else "or", # type: ignore
  800. conditions=conditions,
  801. )
  802. elif metadata_filtering_mode == "manual":
  803. if metadata_filtering_conditions:
  804. metadata_condition = MetadataCondition(**metadata_filtering_conditions.model_dump())
  805. for sequence, condition in enumerate(metadata_filtering_conditions.conditions): # type: ignore
  806. metadata_name = condition.name
  807. expected_value = condition.value
  808. if expected_value is not None or condition.comparison_operator in ("empty", "not empty"):
  809. if isinstance(expected_value, str):
  810. expected_value = self._replace_metadata_filter_value(expected_value, inputs)
  811. filters = self._process_metadata_filter_func(
  812. sequence,
  813. condition.comparison_operator,
  814. metadata_name,
  815. expected_value,
  816. filters,
  817. )
  818. else:
  819. raise ValueError("Invalid metadata filtering mode")
  820. if filters:
  821. if metadata_filtering_conditions and metadata_filtering_conditions.logical_operator == "and": # type: ignore
  822. document_query = document_query.filter(and_(*filters))
  823. else:
  824. document_query = document_query.filter(or_(*filters))
  825. documents = document_query.all()
  826. # group by dataset_id
  827. metadata_filter_document_ids = defaultdict(list) if documents else None # type: ignore
  828. for document in documents:
  829. metadata_filter_document_ids[document.dataset_id].append(document.id) # type: ignore
  830. return metadata_filter_document_ids, metadata_condition
  831. def _replace_metadata_filter_value(self, text: str, inputs: dict) -> str:
  832. def replacer(match):
  833. key = match.group(1)
  834. return str(inputs.get(key, f"{{{{{key}}}}}"))
  835. pattern = re.compile(r"\{\{(\w+)\}\}")
  836. output = pattern.sub(replacer, text)
  837. if isinstance(output, str):
  838. output = re.sub(r"[\r\n\t]+", " ", output).strip()
  839. return output
  840. def _automatic_metadata_filter_func(
  841. self, dataset_ids: list, query: str, tenant_id: str, user_id: str, metadata_model_config: ModelConfig
  842. ) -> Optional[list[dict[str, Any]]]:
  843. # get all metadata field
  844. metadata_fields = db.session.query(DatasetMetadata).filter(DatasetMetadata.dataset_id.in_(dataset_ids)).all()
  845. all_metadata_fields = [metadata_field.name for metadata_field in metadata_fields]
  846. # get metadata model config
  847. if metadata_model_config is None:
  848. raise ValueError("metadata_model_config is required")
  849. # get metadata model instance
  850. # fetch model config
  851. model_instance, model_config = self._fetch_model_config(tenant_id, metadata_model_config)
  852. # fetch prompt messages
  853. prompt_messages, stop = self._get_prompt_template(
  854. model_config=model_config,
  855. mode=metadata_model_config.mode,
  856. metadata_fields=all_metadata_fields,
  857. query=query or "",
  858. )
  859. result_text = ""
  860. try:
  861. # handle invoke result
  862. invoke_result = cast(
  863. Generator[LLMResult, None, None],
  864. model_instance.invoke_llm(
  865. prompt_messages=prompt_messages,
  866. model_parameters=model_config.parameters,
  867. stop=stop,
  868. stream=True,
  869. user=user_id,
  870. ),
  871. )
  872. # handle invoke result
  873. result_text, usage = self._handle_invoke_result(invoke_result=invoke_result)
  874. result_text_json = parse_and_check_json_markdown(result_text, [])
  875. automatic_metadata_filters = []
  876. if "metadata_map" in result_text_json:
  877. metadata_map = result_text_json["metadata_map"]
  878. for item in metadata_map:
  879. if item.get("metadata_field_name") in all_metadata_fields:
  880. automatic_metadata_filters.append(
  881. {
  882. "metadata_name": item.get("metadata_field_name"),
  883. "value": item.get("metadata_field_value"),
  884. "condition": item.get("comparison_operator"),
  885. }
  886. )
  887. except Exception as e:
  888. return None
  889. return automatic_metadata_filters
  890. def _process_metadata_filter_func(
  891. self, sequence: int, condition: str, metadata_name: str, value: Optional[Any], filters: list
  892. ):
  893. key = f"{metadata_name}_{sequence}"
  894. key_value = f"{metadata_name}_{sequence}_value"
  895. match condition:
  896. case "contains":
  897. filters.append(
  898. (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
  899. **{key: metadata_name, key_value: f"%{value}%"}
  900. )
  901. )
  902. case "not contains":
  903. filters.append(
  904. (text(f"documents.doc_metadata ->> :{key} NOT LIKE :{key_value}")).params(
  905. **{key: metadata_name, key_value: f"%{value}%"}
  906. )
  907. )
  908. case "start with":
  909. filters.append(
  910. (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
  911. **{key: metadata_name, key_value: f"{value}%"}
  912. )
  913. )
  914. case "end with":
  915. filters.append(
  916. (text(f"documents.doc_metadata ->> :{key} LIKE :{key_value}")).params(
  917. **{key: metadata_name, key_value: f"%{value}"}
  918. )
  919. )
  920. case "is" | "=":
  921. if isinstance(value, str):
  922. filters.append(DatasetDocument.doc_metadata[metadata_name] == f'"{value}"')
  923. else:
  924. filters.append(
  925. sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) == value
  926. )
  927. case "is not" | "≠":
  928. if isinstance(value, str):
  929. filters.append(DatasetDocument.doc_metadata[metadata_name] != f'"{value}"')
  930. else:
  931. filters.append(
  932. sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) != value
  933. )
  934. case "empty":
  935. filters.append(DatasetDocument.doc_metadata[metadata_name].is_(None))
  936. case "not empty":
  937. filters.append(DatasetDocument.doc_metadata[metadata_name].isnot(None))
  938. case "before" | "<":
  939. filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) < value)
  940. case "after" | ">":
  941. filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) > value)
  942. case "≤" | "<=":
  943. filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) <= value)
  944. case "≥" | ">=":
  945. filters.append(sqlalchemy_cast(DatasetDocument.doc_metadata[metadata_name].astext, Integer) >= value)
  946. case _:
  947. pass
  948. return filters
  949. def _fetch_model_config(
  950. self, tenant_id: str, model: ModelConfig
  951. ) -> tuple[ModelInstance, ModelConfigWithCredentialsEntity]:
  952. """
  953. Fetch model config
  954. """
  955. if model is None:
  956. raise ValueError("single_retrieval_config is required")
  957. model_name = model.name
  958. provider_name = model.provider
  959. model_manager = ModelManager()
  960. model_instance = model_manager.get_model_instance(
  961. tenant_id=tenant_id, model_type=ModelType.LLM, provider=provider_name, model=model_name
  962. )
  963. provider_model_bundle = model_instance.provider_model_bundle
  964. model_type_instance = model_instance.model_type_instance
  965. model_type_instance = cast(LargeLanguageModel, model_type_instance)
  966. model_credentials = model_instance.credentials
  967. # check model
  968. provider_model = provider_model_bundle.configuration.get_provider_model(
  969. model=model_name, model_type=ModelType.LLM
  970. )
  971. if provider_model is None:
  972. raise ValueError(f"Model {model_name} not exist.")
  973. if provider_model.status == ModelStatus.NO_CONFIGURE:
  974. raise ValueError(f"Model {model_name} credentials is not initialized.")
  975. elif provider_model.status == ModelStatus.NO_PERMISSION:
  976. raise ValueError(f"Dify Hosted OpenAI {model_name} currently not support.")
  977. elif provider_model.status == ModelStatus.QUOTA_EXCEEDED:
  978. raise ValueError(f"Model provider {provider_name} quota exceeded.")
  979. # model config
  980. completion_params = model.completion_params
  981. stop = []
  982. if "stop" in completion_params:
  983. stop = completion_params["stop"]
  984. del completion_params["stop"]
  985. # get model mode
  986. model_mode = model.mode
  987. if not model_mode:
  988. raise ValueError("LLM mode is required.")
  989. model_schema = model_type_instance.get_model_schema(model_name, model_credentials)
  990. if not model_schema:
  991. raise ValueError(f"Model {model_name} not exist.")
  992. return model_instance, ModelConfigWithCredentialsEntity(
  993. provider=provider_name,
  994. model=model_name,
  995. model_schema=model_schema,
  996. mode=model_mode,
  997. provider_model_bundle=provider_model_bundle,
  998. credentials=model_credentials,
  999. parameters=completion_params,
  1000. stop=stop,
  1001. )
  1002. def _get_prompt_template(
  1003. self, model_config: ModelConfigWithCredentialsEntity, mode: str, metadata_fields: list, query: str
  1004. ):
  1005. model_mode = ModelMode.value_of(mode)
  1006. input_text = query
  1007. prompt_template: Union[CompletionModelPromptTemplate, list[ChatModelMessage]]
  1008. if model_mode == ModelMode.CHAT:
  1009. prompt_template = []
  1010. system_prompt_messages = ChatModelMessage(role=PromptMessageRole.SYSTEM, text=METADATA_FILTER_SYSTEM_PROMPT)
  1011. prompt_template.append(system_prompt_messages)
  1012. user_prompt_message_1 = ChatModelMessage(role=PromptMessageRole.USER, text=METADATA_FILTER_USER_PROMPT_1)
  1013. prompt_template.append(user_prompt_message_1)
  1014. assistant_prompt_message_1 = ChatModelMessage(
  1015. role=PromptMessageRole.ASSISTANT, text=METADATA_FILTER_ASSISTANT_PROMPT_1
  1016. )
  1017. prompt_template.append(assistant_prompt_message_1)
  1018. user_prompt_message_2 = ChatModelMessage(role=PromptMessageRole.USER, text=METADATA_FILTER_USER_PROMPT_2)
  1019. prompt_template.append(user_prompt_message_2)
  1020. assistant_prompt_message_2 = ChatModelMessage(
  1021. role=PromptMessageRole.ASSISTANT, text=METADATA_FILTER_ASSISTANT_PROMPT_2
  1022. )
  1023. prompt_template.append(assistant_prompt_message_2)
  1024. user_prompt_message_3 = ChatModelMessage(
  1025. role=PromptMessageRole.USER,
  1026. text=METADATA_FILTER_USER_PROMPT_3.format(
  1027. input_text=input_text,
  1028. metadata_fields=json.dumps(metadata_fields, ensure_ascii=False),
  1029. ),
  1030. )
  1031. prompt_template.append(user_prompt_message_3)
  1032. elif model_mode == ModelMode.COMPLETION:
  1033. prompt_template = CompletionModelPromptTemplate(
  1034. text=METADATA_FILTER_COMPLETION_PROMPT.format(
  1035. input_text=input_text,
  1036. metadata_fields=json.dumps(metadata_fields, ensure_ascii=False),
  1037. )
  1038. )
  1039. else:
  1040. raise ValueError(f"Model mode {model_mode} not support.")
  1041. prompt_transform = AdvancedPromptTransform()
  1042. prompt_messages = prompt_transform.get_prompt(
  1043. prompt_template=prompt_template,
  1044. inputs={},
  1045. query=query or "",
  1046. files=[],
  1047. context=None,
  1048. memory_config=None,
  1049. memory=None,
  1050. model_config=model_config,
  1051. )
  1052. stop = model_config.stop
  1053. return prompt_messages, stop
  1054. def _handle_invoke_result(self, invoke_result: Generator) -> tuple[str, LLMUsage]:
  1055. """
  1056. Handle invoke result
  1057. :param invoke_result: invoke result
  1058. :return:
  1059. """
  1060. model = None
  1061. prompt_messages: list[PromptMessage] = []
  1062. full_text = ""
  1063. usage = None
  1064. for result in invoke_result:
  1065. text = result.delta.message.content
  1066. full_text += text
  1067. if not model:
  1068. model = result.model
  1069. if not prompt_messages:
  1070. prompt_messages = result.prompt_messages
  1071. if not usage and result.delta.usage:
  1072. usage = result.delta.usage
  1073. if not usage:
  1074. usage = LLMUsage.empty_usage()
  1075. return full_text, usage